* [PATCH v2 00/10] block: bug fixes in preparation of AioContext removal
@ 2022-03-14 13:18 Emanuele Giuseppe Esposito
  2022-03-14 13:18 ` [PATCH v2 01/10] drains: create bh only when polling Emanuele Giuseppe Esposito
                   ` (9 more replies)
  0 siblings, 10 replies; 13+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-03-14 13:18 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Emanuele Giuseppe Esposito, qemu-devel, Hanna Reitz,
	Stefan Hajnoczi, Paolo Bonzini, John Snow

This series aims to fix some bugs that came up when trying
to replace the AioContext lock while still protecting the
BlockDriverState fields.

These patches were part of the series "Removal of AioContext lock
through drains: protect bdrv_replace_child_noperm", but since
that series is still a work in progress and these fixes are
pretty much independent, I split it into two separate series.

---
v2:
* changed the comments in patches 2 and 3 (.attach() and .detach() callbacks)
* removed the job_sleep_ns patch, as it was causing random deadlocks in test 030

Emanuele Giuseppe Esposito (10):
  drains: create bh only when polling
  bdrv_parent_drained_begin_single: handle calls from coroutine context
  block/io.c: fix bdrv_child_cb_drained_begin invocations from a
    coroutine
  block.c: bdrv_replace_child_noperm: first remove the child, and then
    call ->detach()
  block.c: bdrv_replace_child_noperm: first call ->attach(), and then
    add child
  test-bdrv-drain.c: adapt test to support additional subtree drains
  test-bdrv-drain.c: remove test_detach_by_parent_cb()
  tests/unit/test-bdrv-drain.c: graph setup functions can't run in
    coroutines
  child_job_drained_poll: override polling condition only when in home
    thread
  tests/qemu-iotests/030: test_stream_parallel should use
    auto_finalize=False

 block.c                      |  41 ++++++---
 block/io.c                   | 137 +++++++++++++++++++++++++---
 blockjob.c                   |  13 ++-
 include/block/block-io.h     |  20 +++--
 tests/qemu-iotests/030       |  12 +--
 tests/unit/test-bdrv-drain.c | 169 +++++++++++++++++++----------------
 6 files changed, 267 insertions(+), 125 deletions(-)

-- 
2.31.1




* [PATCH v2 01/10] drains: create bh only when polling
  2022-03-14 13:18 [PATCH v2 00/10] block: bug fixes in preparation of AioContext removal Emanuele Giuseppe Esposito
@ 2022-03-14 13:18 ` Emanuele Giuseppe Esposito
  2022-03-14 13:18 ` [PATCH v2 02/10] bdrv_parent_drained_begin_single: handle calls from coroutine context Emanuele Giuseppe Esposito
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-03-14 13:18 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Emanuele Giuseppe Esposito, qemu-devel, Hanna Reitz,
	Stefan Hajnoczi, Paolo Bonzini, John Snow

We need to prevent coroutines from calling BDRV_POLL_WHILE, because
it can create deadlocks. This is done by first creating a bottom half
and then yielding. The bh is scheduled in the main loop, performs the
drain and the polling, and then resumes the coroutine.

The problem is that we currently create the bh and yield regardless
of whether we eventually poll or not. There is no need to do so if
no poll takes place.

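For context, the yield-to-drain helper relied upon here works roughly
like this (a simplified sketch of bdrv_co_yield_to_drain(), with the
AioContext locking and the recursion arguments omitted):

static void coroutine_fn
bdrv_co_yield_to_drain(BlockDriverState *bs, bool begin, bool poll)
{
    BdrvCoDrainData data = {
        .co = qemu_coroutine_self(),
        .bs = bs,
        .done = false,
        .begin = begin,
        .poll = poll,
    };

    bdrv_inc_in_flight(bs);
    /* run the drain in the main loop, then wake this coroutine up */
    replay_bh_schedule_oneshot_event(bdrv_get_aio_context(bs),
                                     bdrv_co_drain_bh_cb, &data);
    qemu_coroutine_yield();
    /* only the bh callback is allowed to resume us */
    assert(data.done);
}
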
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 block/io.c | 31 +++++++++++++++++++------------
 1 file changed, 19 insertions(+), 12 deletions(-)

diff --git a/block/io.c b/block/io.c
index efc011ce65..4a3e8d037d 100644
--- a/block/io.c
+++ b/block/io.c
@@ -447,7 +447,7 @@ static void bdrv_do_drained_begin(BlockDriverState *bs, bool recursive,
 {
     BdrvChild *child, *next;
 
-    if (qemu_in_coroutine()) {
+    if (poll && qemu_in_coroutine()) {
         bdrv_co_yield_to_drain(bs, true, recursive, parent, ignore_bds_parents,
                                poll, NULL);
         return;
@@ -513,12 +513,6 @@ static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
     int old_quiesce_counter;
 
     assert(drained_end_counter != NULL);
-
-    if (qemu_in_coroutine()) {
-        bdrv_co_yield_to_drain(bs, false, recursive, parent, ignore_bds_parents,
-                               false, drained_end_counter);
-        return;
-    }
     assert(bs->quiesce_counter > 0);
 
     /* Re-enable things in child-to-parent order */
@@ -541,11 +535,24 @@ static void bdrv_do_drained_end(BlockDriverState *bs, bool recursive,
     }
 }
 
+static void bdrv_do_drained_end_co(BlockDriverState *bs, bool recursive,
+                                   BdrvChild *parent, bool ignore_bds_parents,
+                                   int *drained_end_counter)
+{
+    if (qemu_in_coroutine()) {
+        bdrv_co_yield_to_drain(bs, false, recursive, parent, ignore_bds_parents,
+                               false, drained_end_counter);
+        return;
+    }
+
+    bdrv_do_drained_end(bs, recursive, parent, ignore_bds_parents, drained_end_counter);
+}
+
 void bdrv_drained_end(BlockDriverState *bs)
 {
     int drained_end_counter = 0;
     IO_OR_GS_CODE();
-    bdrv_do_drained_end(bs, false, NULL, false, &drained_end_counter);
+    bdrv_do_drained_end_co(bs, false, NULL, false, &drained_end_counter);
     BDRV_POLL_WHILE(bs, qatomic_read(&drained_end_counter) > 0);
 }
 
@@ -559,7 +566,7 @@ void bdrv_subtree_drained_end(BlockDriverState *bs)
 {
     int drained_end_counter = 0;
     IO_OR_GS_CODE();
-    bdrv_do_drained_end(bs, true, NULL, false, &drained_end_counter);
+    bdrv_do_drained_end_co(bs, true, NULL, false, &drained_end_counter);
     BDRV_POLL_WHILE(bs, qatomic_read(&drained_end_counter) > 0);
 }
 
@@ -580,7 +587,7 @@ void bdrv_unapply_subtree_drain(BdrvChild *child, BlockDriverState *old_parent)
     IO_OR_GS_CODE();
 
     for (i = 0; i < old_parent->recursive_quiesce_counter; i++) {
-        bdrv_do_drained_end(child->bs, true, child, false,
+        bdrv_do_drained_end_co(child->bs, true, child, false,
                             &drained_end_counter);
     }
 
@@ -703,7 +710,7 @@ void bdrv_drain_all_end_quiesce(BlockDriverState *bs)
     g_assert(!bs->refcnt);
 
     while (bs->quiesce_counter) {
-        bdrv_do_drained_end(bs, false, NULL, true, &drained_end_counter);
+        bdrv_do_drained_end_co(bs, false, NULL, true, &drained_end_counter);
     }
     BDRV_POLL_WHILE(bs, qatomic_read(&drained_end_counter) > 0);
 }
@@ -727,7 +734,7 @@ void bdrv_drain_all_end(void)
         AioContext *aio_context = bdrv_get_aio_context(bs);
 
         aio_context_acquire(aio_context);
-        bdrv_do_drained_end(bs, false, NULL, true, &drained_end_counter);
+        bdrv_do_drained_end_co(bs, false, NULL, true, &drained_end_counter);
         aio_context_release(aio_context);
     }
 
-- 
2.31.1




* [PATCH v2 02/10] bdrv_parent_drained_begin_single: handle calls from coroutine context
  2022-03-14 13:18 [PATCH v2 00/10] block: bug fixes in preparation of AioContext removal Emanuele Giuseppe Esposito
  2022-03-14 13:18 ` [PATCH v2 01/10] drains: create bh only when polling Emanuele Giuseppe Esposito
@ 2022-03-14 13:18 ` Emanuele Giuseppe Esposito
  2022-03-14 13:18 ` [PATCH v2 03/10] block/io.c: fix bdrv_child_cb_drained_begin invocations from a coroutine Emanuele Giuseppe Esposito
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-03-14 13:18 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Emanuele Giuseppe Esposito, qemu-devel, Hanna Reitz,
	Stefan Hajnoczi, Paolo Bonzini, John Snow

bdrv_parent_drained_begin_single() is also called by
bdrv_replace_child_noperm(). The latter is often called
from coroutines, for example in bdrv_co_create_opts() callbacks.

This can potentially create deadlocks, because if the drain_saldo
in bdrv_replace_child_noperm() is > 0, the coroutine will start
polling using BDRV_POLL_WHILE. Right now this does not seem
to happen, but if additional drains are added in the future,
it will become much more likely.

Fix the problem by doing something very similar to
bdrv_do_drained_begin(): if we are in a coroutine, schedule a bh
to execute the drain in the main loop, and re-enter the coroutine
only once it is done.

Just as with the other drains, check for the coroutine case only
when effectively polling.

As a consequence, remove the coroutine assertion in
bdrv_do_drained_begin_quiesce(): we never poll in that case.

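For reference, the polling caller this guards against is roughly the
following loop in bdrv_replace_child_noperm() (simplified excerpt,
not part of this patch):

    /* flush outstanding requests to the old child node */
    while (drain_saldo > 0 && child->klass->drained_begin) {
        bdrv_parent_drained_begin_single(child, true); /* may poll */
        drain_saldo--;
    }
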
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 block/io.c | 98 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 97 insertions(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index 4a3e8d037d..e446782ae0 100644
--- a/block/io.c
+++ b/block/io.c
@@ -67,10 +67,101 @@ static void bdrv_parent_drained_end_single_no_poll(BdrvChild *c,
     }
 }
 
+typedef struct {
+    Coroutine *co;
+    BdrvChild *child;
+    bool done;
+    bool begin;
+    bool poll;
+} BdrvCoDrainParentData;
+
+static void bdrv_co_drain_parent_bh_cb(void *opaque)
+{
+    BdrvCoDrainParentData *data = opaque;
+    Coroutine *co = data->co;
+    BdrvChild *child = data->child;
+    BlockDriverState *bs = child->bs;
+    AioContext *ctx = bdrv_get_aio_context(bs);
+
+    if (bs) {
+        aio_context_acquire(ctx);
+        bdrv_dec_in_flight(bs);
+    }
+
+    if (data->begin) {
+        bdrv_parent_drained_begin_single(child, data->poll);
+    } else {
+        assert(!data->poll);
+        bdrv_parent_drained_end_single(child);
+    }
+
+    if (bs) {
+        aio_context_release(ctx);
+    }
+
+    data->done = true;
+    aio_co_wake(co);
+}
+
+static void coroutine_fn bdrv_co_yield_to_drain_parent(BdrvChild *c,
+                                                       bool begin, bool poll)
+{
+    BdrvCoDrainParentData data;
+    Coroutine *self = qemu_coroutine_self();
+    BlockDriverState *bs = c->bs;
+    AioContext *ctx = bdrv_get_aio_context(bs);
+    AioContext *co_ctx = qemu_coroutine_get_aio_context(self);
+
+    /* Calling bdrv_drain() from a BH ensures the current coroutine yields and
+     * other coroutines run if they were queued by aio_co_enter(). */
+
+    assert(qemu_in_coroutine());
+    data = (BdrvCoDrainParentData) {
+        .co = self,
+        .child = c,
+        .done = false,
+        .begin = begin,
+        .poll = poll,
+    };
+
+    if (bs) {
+        bdrv_inc_in_flight(bs);
+    }
+
+    /*
+     * Temporarily drop the lock across yield or we would get deadlocks.
+     * bdrv_co_yield_to_drain_parent() reacquires the lock as needed.
+     *
+     * When we yield below, the lock for the current context will be
+     * released, so if this is actually the lock that protects bs, don't drop
+     * it a second time.
+     */
+    if (ctx != co_ctx) {
+        aio_context_release(ctx);
+    }
+    replay_bh_schedule_oneshot_event(ctx, bdrv_co_drain_parent_bh_cb, &data);
+
+    qemu_coroutine_yield();
+    /* If we are resumed from some other event (such as an aio completion or a
+     * timer callback), it is a bug in the caller that should be fixed. */
+    assert(data.done);
+
+    /* Reacquire the AioContext of bs if we dropped it */
+    if (ctx != co_ctx) {
+        aio_context_acquire(ctx);
+    }
+}
+
 void bdrv_parent_drained_end_single(BdrvChild *c)
 {
     int drained_end_counter = 0;
     IO_OR_GS_CODE();
+
+    if (qemu_in_coroutine()) {
+        bdrv_co_yield_to_drain_parent(c, false, false);
+        return;
+    }
+
     bdrv_parent_drained_end_single_no_poll(c, &drained_end_counter);
     BDRV_POLL_WHILE(c->bs, qatomic_read(&drained_end_counter) > 0);
 }
@@ -116,6 +207,12 @@ static bool bdrv_parent_drained_poll(BlockDriverState *bs, BdrvChild *ignore,
 void bdrv_parent_drained_begin_single(BdrvChild *c, bool poll)
 {
     IO_OR_GS_CODE();
+
+    if (poll && qemu_in_coroutine()) {
+        bdrv_co_yield_to_drain_parent(c, true, poll);
+        return;
+    }
+
     c->parent_quiesce_counter++;
     if (c->klass->drained_begin) {
         c->klass->drained_begin(c);
@@ -430,7 +527,6 @@ void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
                                    BdrvChild *parent, bool ignore_bds_parents)
 {
     IO_OR_GS_CODE();
-    assert(!qemu_in_coroutine());
 
     /* Stop things in parent-to-child order */
     if (qatomic_fetch_inc(&bs->quiesce_counter) == 0) {
-- 
2.31.1




* [PATCH v2 03/10] block/io.c: fix bdrv_child_cb_drained_begin invocations from a coroutine
  2022-03-14 13:18 [PATCH v2 00/10] block: bug fixes in preparation of AioContext removal Emanuele Giuseppe Esposito
  2022-03-14 13:18 ` [PATCH v2 01/10] drains: create bh only when polling Emanuele Giuseppe Esposito
  2022-03-14 13:18 ` [PATCH v2 02/10] bdrv_parent_drained_begin_single: handle calls from coroutine context Emanuele Giuseppe Esposito
@ 2022-03-14 13:18 ` Emanuele Giuseppe Esposito
  2022-03-14 13:18 ` [PATCH v2 04/10] block.c: bdrv_replace_child_noperm: first remove the child, and then call ->detach() Emanuele Giuseppe Esposito
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-03-14 13:18 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Emanuele Giuseppe Esposito, qemu-devel, Hanna Reitz,
	Stefan Hajnoczi, Paolo Bonzini, John Snow

Using bdrv_do_drained_begin_quiesce() in bdrv_child_cb_drained_begin()
is not a good idea: the callback might be called when running a drain
in a coroutine, and bdrv_do_drained_begin_quiesce() does not handle
that case, resulting in an assertion failure.

Instead, bdrv_do_drained_begin() with no recursion and no polling
accomplishes the same thing (it invokes bdrv_do_drained_begin_quiesce),
but it first checks whether we are already in a coroutine, and if so
exits from it via bdrv_co_yield_to_drain().

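The failing path is roughly the following (hypothetical trace):

coroutine: drain of a node with a child_of_bds parent
  -> bdrv_parent_drained_begin_single(c, poll)
    -> c->klass->drained_begin(c)        /* bdrv_child_cb_drained_begin */
      -> bdrv_do_drained_begin_quiesce() /* asserted !qemu_in_coroutine() */
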
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 block.c                  |  2 +-
 block/io.c               |  8 +++++++-
 include/block/block-io.h | 20 +++++++++++---------
 3 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/block.c b/block.c
index 718e4cae8b..372f16f4a0 100644
--- a/block.c
+++ b/block.c
@@ -1211,7 +1211,7 @@ static char *bdrv_child_get_parent_desc(BdrvChild *c)
 static void bdrv_child_cb_drained_begin(BdrvChild *child)
 {
     BlockDriverState *bs = child->opaque;
-    bdrv_do_drained_begin_quiesce(bs, NULL, false);
+    bdrv_drained_begin_no_poll(bs);
 }
 
 static bool bdrv_child_cb_drained_poll(BdrvChild *child)
diff --git a/block/io.c b/block/io.c
index e446782ae0..e77861c464 100644
--- a/block/io.c
+++ b/block/io.c
@@ -523,7 +523,7 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
     }
 }
 
-void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
+static void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
                                    BdrvChild *parent, bool ignore_bds_parents)
 {
     IO_OR_GS_CODE();
@@ -587,6 +587,12 @@ void bdrv_subtree_drained_begin(BlockDriverState *bs)
     bdrv_do_drained_begin(bs, true, NULL, false, true);
 }
 
+void bdrv_drained_begin_no_poll(BlockDriverState *bs)
+{
+    IO_CODE();
+    bdrv_do_drained_begin(bs, false, NULL, false, false);
+}
+
 /**
  * This function does not poll, nor must any of its recursively called
  * functions.  The *drained_end_counter pointee will be incremented
diff --git a/include/block/block-io.h b/include/block/block-io.h
index 5e3f346806..9135b648bf 100644
--- a/include/block/block-io.h
+++ b/include/block/block-io.h
@@ -226,6 +226,17 @@ int coroutine_fn bdrv_co_copy_range(BdrvChild *src, int64_t src_offset,
                                     int64_t bytes, BdrvRequestFlags read_flags,
                                     BdrvRequestFlags write_flags);
 
+/**
+ * bdrv_drained_begin_no_poll:
+ *
+ * Quiesces a BDS like bdrv_drained_begin(), but does not wait for
+ * already running requests to complete, and does not poll for the
+ * subgraph to actually become quiesced. Therefore, no graph changes
+ * will occur with this function.
+ */
+void bdrv_drained_begin_no_poll(BlockDriverState *bs);
+
 /**
  * bdrv_drained_end_no_poll:
  *
@@ -332,15 +343,6 @@ bool bdrv_drain_poll(BlockDriverState *bs, bool recursive,
  */
 void bdrv_drained_begin(BlockDriverState *bs);
 
-/**
- * bdrv_do_drained_begin_quiesce:
- *
- * Quiesces a BDS like bdrv_drained_begin(), but does not wait for already
- * running requests to complete.
- */
-void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,
-                                   BdrvChild *parent, bool ignore_bds_parents);
-
 /**
  * Like bdrv_drained_begin, but recursively begins a quiesced section for
  * exclusive access to all child nodes as well.
-- 
2.31.1




* [PATCH v2 04/10] block.c: bdrv_replace_child_noperm: first remove the child, and then call ->detach()
  2022-03-14 13:18 [PATCH v2 00/10] block: bug fixes in preparation of AioContext removal Emanuele Giuseppe Esposito
                   ` (2 preceding siblings ...)
  2022-03-14 13:18 ` [PATCH v2 03/10] block/io.c: fix bdrv_child_cb_drained_begin invocations from a coroutine Emanuele Giuseppe Esposito
@ 2022-03-14 13:18 ` Emanuele Giuseppe Esposito
  2022-03-16  9:13   ` Emanuele Giuseppe Esposito
  2022-03-14 13:18 ` [PATCH v2 05/10] block.c: bdrv_replace_child_noperm: first call ->attach(), and then add child Emanuele Giuseppe Esposito
                   ` (5 subsequent siblings)
  9 siblings, 1 reply; 13+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-03-14 13:18 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Emanuele Giuseppe Esposito, qemu-devel, Hanna Reitz,
	Stefan Hajnoczi, Paolo Bonzini, John Snow

Doing the opposite can make ->detach() (more precisely
bdrv_unapply_subtree_drain() in bdrv_child_cb_detach) undo the subtree_drain
just performed to protect the removal of the child from the graph,
thus making the fully-enabled assert_bdrv_graph_writable fail.

Note that assert_bdrv_graph_writable is not yet fully enabled.

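For reference, the current, not-yet-fully-enabled helper is roughly:

void assert_bdrv_graph_writable(BlockDriverState *bs)
{
    /*
     * TODO: once the necessary subtree drains are in place, also
     * assert that qatomic_read(&bs->quiesce_counter) > 0.
     */
    assert(qemu_in_main_thread());
}
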
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 block.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/block.c b/block.c
index 372f16f4a0..d870ba5393 100644
--- a/block.c
+++ b/block.c
@@ -1448,6 +1448,11 @@ static void bdrv_child_cb_attach(BdrvChild *child)
     bdrv_apply_subtree_drain(child, bs);
 }
 
+/*
+ * Unapply the drain in the whole child subtree, as
+ * it has been already detached, and then remove from
+ * child->opaque->children.
+ */
 static void bdrv_child_cb_detach(BdrvChild *child)
 {
     BlockDriverState *bs = child->opaque;
@@ -2864,14 +2869,18 @@ static void bdrv_replace_child_noperm(BdrvChild **childp,
     }
 
     if (old_bs) {
-        /* Detach first so that the recursive drain sections coming from @child
-         * are already gone and we only end the drain sections that came from
-         * elsewhere. */
+        /* First remove child from child->bs->parents list */
+        assert_bdrv_graph_writable(old_bs);
+        QLIST_REMOVE(child, next_parent);
+        /*
+         * Then call ->detach() cb.
+         * In child_of_bds case, update the child parent
+         * (child->opaque) ->children list and
+         * remove any drain left in the child subtree.
+         */
         if (child->klass->detach) {
             child->klass->detach(child);
         }
-        assert_bdrv_graph_writable(old_bs);
-        QLIST_REMOVE(child, next_parent);
     }
 
     child->bs = new_bs;
-- 
2.31.1




* [PATCH v2 05/10] block.c: bdrv_replace_child_noperm: first call ->attach(), and then add child
  2022-03-14 13:18 [PATCH v2 00/10] block: bug fixes in preparation of AioContext removal Emanuele Giuseppe Esposito
                   ` (3 preceding siblings ...)
  2022-03-14 13:18 ` [PATCH v2 04/10] block.c: bdrv_replace_child_noperm: first remove the child, and then call ->detach() Emanuele Giuseppe Esposito
@ 2022-03-14 13:18 ` Emanuele Giuseppe Esposito
  2022-03-16  9:16   ` Emanuele Giuseppe Esposito
  2022-03-14 13:18 ` [PATCH v2 06/10] test-bdrv-drain.c: adapt test to support additional subtree drains Emanuele Giuseppe Esposito
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 13+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-03-14 13:18 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Emanuele Giuseppe Esposito, qemu-devel, Hanna Reitz,
	Stefan Hajnoczi, Paolo Bonzini, John Snow

Doing the opposite can result in the child node being added to a
non-drained node, since apply_subtree_drain is only performed in
->attach(), thus making assert_bdrv_graph_writable fail.

This can happen for example during a transaction rollback (test 245,
test_io_with_graph_changes):
1. a node is removed from the graph, and is therefore undrained
2. then something goes wrong, and we need to roll back the transactions
   through tran_abort()
3. at this point, the current code would first attach the undrained node
   to the graph via QLIST_INSERT_HEAD, and only then call ->attach(),
   which takes care of restoring the drain with apply_subtree_drain(),
   leaving the node undrained between the two operations.

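The undrained window described in point 3 looks roughly like this
(hypothetical trace):

tran_abort()
  -> bdrv_replace_child_noperm(child, new_bs)
       QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
       /* the node is reachable here, but still undrained */
       child->klass->attach(child)      /* bdrv_child_cb_attach */
         -> bdrv_apply_subtree_drain()  /* drain restored only now */
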
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 block.c | 20 +++++++++++++++-----
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/block.c b/block.c
index d870ba5393..c6a550f9c6 100644
--- a/block.c
+++ b/block.c
@@ -1434,6 +1434,11 @@ static void bdrv_inherited_options(BdrvChildRole role, bool parent_is_format,
     *child_flags = flags;
 }
 
+/*
+ * Add the child node to child->opaque->children list,
+ * and then apply the drain to the whole child subtree,
+ * so that the drain count matches with the parent.
+ */
 static void bdrv_child_cb_attach(BdrvChild *child)
 {
     BlockDriverState *bs = child->opaque;
@@ -2889,8 +2894,6 @@ static void bdrv_replace_child_noperm(BdrvChild **childp,
     }
 
     if (new_bs) {
-        assert_bdrv_graph_writable(new_bs);
-        QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
 
         /*
          * Detaching the old node may have led to the new node's
@@ -2901,12 +2904,19 @@ static void bdrv_replace_child_noperm(BdrvChild **childp,
         assert(new_bs->quiesce_counter <= new_bs_quiesce_counter);
         drain_saldo += new_bs->quiesce_counter - new_bs_quiesce_counter;
 
-        /* Attach only after starting new drained sections, so that recursive
-         * drain sections coming from @child don't get an extra .drained_begin
-         * callback. */
+        /*
+         * First call ->attach() cb.
+         * In child_of_bds case, add child to the parent
+         * (child->opaque) ->children list and if
+         * necessary add missing drains in the child subtree.
+         */
         if (child->klass->attach) {
             child->klass->attach(child);
         }
+
+        /* Then add child to new_bs->parents list */
+        assert_bdrv_graph_writable(new_bs);
+        QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
     }
 
     /*
-- 
2.31.1




* [PATCH v2 06/10] test-bdrv-drain.c: adapt test to support additional subtree drains
  2022-03-14 13:18 [PATCH v2 00/10] block: bug fixes in preparation of AioContext removal Emanuele Giuseppe Esposito
                   ` (4 preceding siblings ...)
  2022-03-14 13:18 ` [PATCH v2 05/10] block.c: bdrv_replace_child_noperm: first call ->attach(), and then add child Emanuele Giuseppe Esposito
@ 2022-03-14 13:18 ` Emanuele Giuseppe Esposito
  2022-03-14 13:18 ` [PATCH v2 07/10] test-bdrv-drain.c: remove test_detach_by_parent_cb() Emanuele Giuseppe Esposito
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-03-14 13:18 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Emanuele Giuseppe Esposito, qemu-devel, Hanna Reitz,
	Stefan Hajnoczi, Paolo Bonzini, John Snow

This test will have a problem once we add subtree drains in
bdrv_replace_child_noperm():

test_detach_indirect is only interested in observing the first
call to .drained_begin. In the original test, there was only a single
subtree drain; however, with additional drains introduced in
bdrv_replace_child_noperm(), the test callback would be called too early
and/or multiple times.
Override the callback only when we actually want to use it, and put back
the original after it's been invoked.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 tests/unit/test-bdrv-drain.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 36be84ae55..f750ddfc4e 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -1320,15 +1320,18 @@ static void detach_by_parent_aio_cb(void *opaque, int ret)
     }
 }
 
+static BdrvChildClass detach_by_driver_cb_class;
+
 static void detach_by_driver_cb_drained_begin(BdrvChild *child)
 {
+    /* restore .drained_begin cb, we don't need it anymore. */
+    detach_by_driver_cb_class.drained_begin = child_of_bds.drained_begin;
+
     aio_bh_schedule_oneshot(qemu_get_current_aio_context(),
                             detach_indirect_bh, &detach_by_parent_data);
     child_of_bds.drained_begin(child);
 }
 
-static BdrvChildClass detach_by_driver_cb_class;
-
 /*
  * Initial graph:
  *
@@ -1360,8 +1363,6 @@ static void test_detach_indirect(bool by_parent_cb)
 
     if (!by_parent_cb) {
         detach_by_driver_cb_class = child_of_bds;
-        detach_by_driver_cb_class.drained_begin =
-            detach_by_driver_cb_drained_begin;
     }
 
     /* Create all involved nodes */
@@ -1419,6 +1420,12 @@ static void test_detach_indirect(bool by_parent_cb)
     acb = blk_aio_preadv(blk, 0, &qiov, 0, detach_by_parent_aio_cb, NULL);
     g_assert(acb != NULL);
 
+    if (!by_parent_cb) {
+        /* set .drained_begin cb to run only in the following drain. */
+        detach_by_driver_cb_class.drained_begin =
+            detach_by_driver_cb_drained_begin;
+    }
+
     /* Drain and check the expected result */
     bdrv_subtree_drained_begin(parent_b);
 
-- 
2.31.1




* [PATCH v2 07/10] test-bdrv-drain.c: remove test_detach_by_parent_cb()
  2022-03-14 13:18 [PATCH v2 00/10] block: bug fixes in preparation of AioContext removal Emanuele Giuseppe Esposito
                   ` (5 preceding siblings ...)
  2022-03-14 13:18 ` [PATCH v2 06/10] test-bdrv-drain.c: adapt test to support additional subtree drains Emanuele Giuseppe Esposito
@ 2022-03-14 13:18 ` Emanuele Giuseppe Esposito
  2022-03-14 13:18 ` [PATCH v2 08/10] tests/unit/test-bdrv-drain.c: graph setup functions can't run in coroutines Emanuele Giuseppe Esposito
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-03-14 13:18 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Emanuele Giuseppe Esposito, qemu-devel, Hanna Reitz,
	Stefan Hajnoczi, Paolo Bonzini, John Snow

This test uses the callback of an I/O function (blk_aio_preadv)
to modify the graph using bdrv_attach_child.
This is simply not allowed anymore: I/O cannot change the graph.

Before "block/io.c: make bdrv_do_drained_begin_quiesce static
and introduce bdrv_drained_begin_no_poll", the test was merely
at risk of failure: if bdrv_replace_child_noperm()
(called to modify the graph) invoked a drain,
one of the .drained_begin() callbacks would be
bdrv_do_drained_begin_quiesce, which specifically asserts that
we are not in a coroutine.

Now that we have fixed that behavior, the drain invokes a bh in the
main loop, so we no longer have that problem. However, this test is
still illegal and fails because we forbid graph changes from I/O paths.

Once we add the required subtree_drains to protect
bdrv_replace_child_noperm(), the key problem in this test is in:

acb = blk_aio_preadv(blk, 0, &qiov, 0, detach_by_parent_aio_cb, NULL);
/* Drain and check the expected result */
bdrv_subtree_drained_begin(parent_b);

because detach_by_parent_aio_cb() calls detach_indirect_bh(), which
modifies the graph and is invoked during bdrv_subtree_drained_begin().
The call stack is the following:
1. blk_aio_preadv() creates a coroutine, increments the in_flight
   counter and enters the coroutine running blk_aio_read_entry()
2. blk_aio_read_entry() performs the read and then schedules a bh to
   complete (blk_aio_complete)
3. at this point, subtree_drained_begin() kicks in and waits for all
   in_flight requests, polling
4. polling allows the bh to be scheduled, so blk_aio_complete runs
5. blk_aio_complete *first* invokes the callback
   (detach_by_parent_aio_cb) and then decrements the in_flight counter
6. Here we have the problem: detach_by_parent_aio_cb modifies the graph,
   so both bdrv_unref_child() and bdrv_attach_child() will have
   subtree_drains inside. This causes a deadlock, because the nested
   drain waits for the in_flight counter to go to zero, which only
   happens once the drain itself finishes (see the trace below).

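The deadlock looks roughly like this (hypothetical trace):

bdrv_subtree_drained_begin(parent_b)
  -> BDRV_POLL_WHILE(...)              /* waits for in_flight == 0 */
    -> blk_aio_complete()
      -> detach_by_parent_aio_cb()
        -> detach_indirect_bh()
          -> bdrv_unref_child()/bdrv_attach_child()
            -> nested subtree drain
              -> BDRV_POLL_WHILE(...)  /* also waits for in_flight == 0,
                                          but in_flight is decremented
                                          only after the callback
                                          returns: deadlock */
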
test_detach_by_driver_cb() is a different story: in this case,
detach_by_parent_aio_cb() does not call detach_indirect_bh();
the latter is instead invoked as a bh running in the main loop by
detach_by_driver_cb_drained_begin(), the callback for
.drained_begin().

This test was added in 231281ab42 as part of the series
"Drain fixes and cleanups, part 3"
https://lists.nongnu.org/archive/html/qemu-block/2018-05/msg01132.html
but, as explained above, I believe it is no longer valid and
can be discarded.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tests/unit/test-bdrv-drain.c | 46 +++++++++---------------------------
 1 file changed, 11 insertions(+), 35 deletions(-)

diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index f750ddfc4e..fa0d86209a 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -1294,7 +1294,6 @@ struct detach_by_parent_data {
     BdrvChild *child_b;
     BlockDriverState *c;
     BdrvChild *child_c;
-    bool by_parent_cb;
 };
 static struct detach_by_parent_data detach_by_parent_data;
 
@@ -1312,12 +1311,7 @@ static void detach_indirect_bh(void *opaque)
 
 static void detach_by_parent_aio_cb(void *opaque, int ret)
 {
-    struct detach_by_parent_data *data = &detach_by_parent_data;
-
     g_assert_cmpint(ret, ==, 0);
-    if (data->by_parent_cb) {
-        detach_indirect_bh(data);
-    }
 }
 
 static BdrvChildClass detach_by_driver_cb_class;
@@ -1339,31 +1333,24 @@ static void detach_by_driver_cb_drained_begin(BdrvChild *child)
  *    \ /   \
  *     A     B     C
  *
- * by_parent_cb == true:  Test that parent callbacks don't poll
- *
- *     PA has a pending write request whose callback changes the child nodes of
- *     PB: It removes B and adds C instead. The subtree of PB is drained, which
- *     will indirectly drain the write request, too.
- *
- * by_parent_cb == false: Test that bdrv_drain_invoke() doesn't poll
+ * Test that bdrv_drain_invoke() doesn't poll
  *
  *     PA's BdrvChildClass has a .drained_begin callback that schedules a BH
  *     that does the same graph change. If bdrv_drain_invoke() calls it, the
  *     state is messed up, but if it is only polled in the single
  *     BDRV_POLL_WHILE() at the end of the drain, this should work fine.
  */
-static void test_detach_indirect(bool by_parent_cb)
+static void test_detach_indirect(void)
 {
     BlockBackend *blk;
     BlockDriverState *parent_a, *parent_b, *a, *b, *c;
     BdrvChild *child_a, *child_b;
     BlockAIOCB *acb;
+    BDRVTestState *s;
 
     QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, NULL, 0);
 
-    if (!by_parent_cb) {
-        detach_by_driver_cb_class = child_of_bds;
-    }
+    detach_by_driver_cb_class = child_of_bds;
 
     /* Create all involved nodes */
     parent_a = bdrv_new_open_driver(&bdrv_test, "parent-a", BDRV_O_RDWR,
@@ -1382,10 +1369,8 @@ static void test_detach_indirect(bool by_parent_cb)
 
     /* If we want to get bdrv_drain_invoke() to call aio_poll(), the driver
      * callback must not return immediately. */
-    if (!by_parent_cb) {
-        BDRVTestState *s = parent_a->opaque;
-        s->sleep_in_drain_begin = true;
-    }
+    s = parent_a->opaque;
+    s->sleep_in_drain_begin = true;
 
     /* Set child relationships */
     bdrv_ref(b);
@@ -1397,7 +1382,7 @@ static void test_detach_indirect(bool by_parent_cb)
 
     bdrv_ref(a);
     bdrv_attach_child(parent_a, a, "PA-A",
-                      by_parent_cb ? &child_of_bds : &detach_by_driver_cb_class,
+                      &detach_by_driver_cb_class,
                       BDRV_CHILD_DATA, &error_abort);
 
     g_assert_cmpint(parent_a->refcnt, ==, 1);
@@ -1415,16 +1400,13 @@ static void test_detach_indirect(bool by_parent_cb)
         .parent_b = parent_b,
         .child_b = child_b,
         .c = c,
-        .by_parent_cb = by_parent_cb,
     };
     acb = blk_aio_preadv(blk, 0, &qiov, 0, detach_by_parent_aio_cb, NULL);
     g_assert(acb != NULL);
 
-    if (!by_parent_cb) {
-        /* set .drained_begin cb to run only in the following drain. */
-        detach_by_driver_cb_class.drained_begin =
-            detach_by_driver_cb_drained_begin;
-    }
+    /* set .drained_begin cb to run only in the following drain. */
+    detach_by_driver_cb_class.drained_begin =
+        detach_by_driver_cb_drained_begin;
 
     /* Drain and check the expected result */
     bdrv_subtree_drained_begin(parent_b);
@@ -1460,14 +1442,9 @@ static void test_detach_indirect(bool by_parent_cb)
     bdrv_unref(c);
 }
 
-static void test_detach_by_parent_cb(void)
-{
-    test_detach_indirect(true);
-}
-
 static void test_detach_by_driver_cb(void)
 {
-    test_detach_indirect(false);
+    test_detach_indirect();
 }
 
 static void test_append_to_drained(void)
@@ -2222,7 +2199,6 @@ int main(int argc, char **argv)
     g_test_add_func("/bdrv-drain/detach/drain_all", test_detach_by_drain_all);
     g_test_add_func("/bdrv-drain/detach/drain", test_detach_by_drain);
     g_test_add_func("/bdrv-drain/detach/drain_subtree", test_detach_by_drain_subtree);
-    g_test_add_func("/bdrv-drain/detach/parent_cb", test_detach_by_parent_cb);
     g_test_add_func("/bdrv-drain/detach/driver_cb", test_detach_by_driver_cb);
 
     g_test_add_func("/bdrv-drain/attach/drain", test_append_to_drained);
-- 
2.31.1




* [PATCH v2 08/10] tests/unit/test-bdrv-drain.c: graph setup functions can't run in coroutines
  2022-03-14 13:18 [PATCH v2 00/10] block: bug fixes in preparation of AioContext removal Emanuele Giuseppe Esposito
                   ` (6 preceding siblings ...)
  2022-03-14 13:18 ` [PATCH v2 07/10] test-bdrv-drain.c: remove test_detach_by_parent_cb() Emanuele Giuseppe Esposito
@ 2022-03-14 13:18 ` Emanuele Giuseppe Esposito
  2022-03-14 13:18 ` [PATCH v2 09/10] child_job_drained_poll: override polling condition only when in home thread Emanuele Giuseppe Esposito
  2022-03-14 13:18 ` [PATCH v2 10/10] tests/qemu-iotests/030: test_stream_parallel should use auto_finalize=False Emanuele Giuseppe Esposito
  9 siblings, 0 replies; 13+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-03-14 13:18 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Emanuele Giuseppe Esposito, qemu-devel, Hanna Reitz,
	Stefan Hajnoczi, Paolo Bonzini, John Snow

Graph initialization functions like blk_new(), bdrv_new() and so on
should not run in a coroutine. In fact, they might invoke a drain
(for example, blk_insert_bs eventually calls bdrv_replace_child_noperm)
that in turn can invoke callbacks like bdrv_do_drained_begin_quiesce(),
which asserts exactly that we are not in a coroutine.

Move the initialization phase of test_drv_cb and test_quiesce_common
outside the coroutine logic.

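A hypothetical trace of the problematic path (a sketch, assuming the
usual attach path below blk_insert_bs()):

coroutine: blk_insert_bs(blk, bs, ...)
  -> bdrv_root_attach_child()
    -> bdrv_replace_child_noperm()
      -> drain of the parents
        -> bdrv_do_drained_begin_quiesce() /* asserts !qemu_in_coroutine() */
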
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Message-Id: <20211213104014.69858-2-eesposit@redhat.com>
---
 tests/unit/test-bdrv-drain.c | 118 ++++++++++++++++++++++-------------
 1 file changed, 73 insertions(+), 45 deletions(-)

diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index fa0d86209a..a3bc19965d 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -116,7 +116,8 @@ static void aio_ret_cb(void *opaque, int ret)
 }
 
 typedef struct CallInCoroutineData {
-    void (*entry)(void);
+    void (*entry)(void *);
+    void *arg;
     bool done;
 } CallInCoroutineData;
 
@@ -124,15 +125,16 @@ static coroutine_fn void call_in_coroutine_entry(void *opaque)
 {
     CallInCoroutineData *data = opaque;
 
-    data->entry();
+    data->entry(data->arg);
     data->done = true;
 }
 
-static void call_in_coroutine(void (*entry)(void))
+static void call_in_coroutine(void (*entry)(void *), void *arg)
 {
     Coroutine *co;
     CallInCoroutineData data = {
         .entry  = entry,
+        .arg    = arg,
         .done   = false,
     };
 
@@ -192,26 +194,28 @@ static void do_drain_end_unlocked(enum drain_type drain_type, BlockDriverState *
     }
 }
 
-static void test_drv_cb_common(enum drain_type drain_type, bool recursive)
-{
+typedef struct TestDriverCBData {
+    enum drain_type drain_type;
+    bool recursive;
     BlockBackend *blk;
     BlockDriverState *bs, *backing;
-    BDRVTestState *s, *backing_s;
+} TestDriverCBData;
+
+static void test_drv_cb_common(void *arg)
+{
+    TestDriverCBData *data = arg;
+    BlockBackend *blk = data->blk;
+    BlockDriverState *bs = data->bs;
+    BlockDriverState *backing = data->backing;
+    enum drain_type drain_type = data->drain_type;
+    bool recursive = data->recursive;
+    BDRVTestState *s = bs->opaque;
+    BDRVTestState *backing_s = backing->opaque;
     BlockAIOCB *acb;
     int aio_ret;
 
     QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, NULL, 0);
 
-    blk = blk_new(qemu_get_aio_context(), BLK_PERM_ALL, BLK_PERM_ALL);
-    bs = bdrv_new_open_driver(&bdrv_test, "test-node", BDRV_O_RDWR,
-                              &error_abort);
-    s = bs->opaque;
-    blk_insert_bs(blk, bs, &error_abort);
-
-    backing = bdrv_new_open_driver(&bdrv_test, "backing", 0, &error_abort);
-    backing_s = backing->opaque;
-    bdrv_set_backing_hd(bs, backing, &error_abort);
-
     /* Simple bdrv_drain_all_begin/end pair, check that CBs are called */
     g_assert_cmpint(s->drain_count, ==, 0);
     g_assert_cmpint(backing_s->drain_count, ==, 0);
@@ -245,54 +249,77 @@ static void test_drv_cb_common(enum drain_type drain_type, bool recursive)
 
     g_assert_cmpint(s->drain_count, ==, 0);
     g_assert_cmpint(backing_s->drain_count, ==, 0);
+}
 
-    bdrv_unref(backing);
-    bdrv_unref(bs);
-    blk_unref(blk);
+static void test_common_cb(enum drain_type drain_type, bool in_coroutine,
+                           void (*cb)(void *))
+{
+    TestDriverCBData data;
+
+    data.drain_type = drain_type;
+    data.recursive = (drain_type != BDRV_DRAIN);
+
+    data.blk = blk_new(qemu_get_aio_context(), BLK_PERM_ALL, BLK_PERM_ALL);
+    data.bs = bdrv_new_open_driver(&bdrv_test, "test-node", BDRV_O_RDWR,
+                              &error_abort);
+    blk_insert_bs(data.blk, data.bs, &error_abort);
+
+    data.backing = bdrv_new_open_driver(&bdrv_test, "backing", 0, &error_abort);
+    bdrv_set_backing_hd(data.bs, data.backing, &error_abort);
+
+    if (in_coroutine) {
+        call_in_coroutine(cb, &data);
+    } else {
+        cb(&data);
+    }
+
+    bdrv_unref(data.backing);
+    bdrv_unref(data.bs);
+    blk_unref(data.blk);
+}
+
+static void test_drv_cb(enum drain_type drain_type, bool in_coroutine)
+{
+    test_common_cb(drain_type, in_coroutine, test_drv_cb_common);
 }
 
 static void test_drv_cb_drain_all(void)
 {
-    test_drv_cb_common(BDRV_DRAIN_ALL, true);
+    test_drv_cb(BDRV_DRAIN_ALL, false);
 }
 
 static void test_drv_cb_drain(void)
 {
-    test_drv_cb_common(BDRV_DRAIN, false);
+    test_drv_cb(BDRV_DRAIN, false);
 }
 
 static void test_drv_cb_drain_subtree(void)
 {
-    test_drv_cb_common(BDRV_SUBTREE_DRAIN, true);
+    test_drv_cb(BDRV_SUBTREE_DRAIN, false);
 }
 
 static void test_drv_cb_co_drain_all(void)
 {
-    call_in_coroutine(test_drv_cb_drain_all);
+    test_drv_cb(BDRV_DRAIN_ALL, true);
 }
 
 static void test_drv_cb_co_drain(void)
 {
-    call_in_coroutine(test_drv_cb_drain);
+    test_drv_cb(BDRV_DRAIN, true);
 }
 
 static void test_drv_cb_co_drain_subtree(void)
 {
-    call_in_coroutine(test_drv_cb_drain_subtree);
+    test_drv_cb(BDRV_SUBTREE_DRAIN, true);
 }
 
-static void test_quiesce_common(enum drain_type drain_type, bool recursive)
+static void test_quiesce_common(void *arg)
 {
-    BlockBackend *blk;
-    BlockDriverState *bs, *backing;
-
-    blk = blk_new(qemu_get_aio_context(), BLK_PERM_ALL, BLK_PERM_ALL);
-    bs = bdrv_new_open_driver(&bdrv_test, "test-node", BDRV_O_RDWR,
-                              &error_abort);
-    blk_insert_bs(blk, bs, &error_abort);
-
-    backing = bdrv_new_open_driver(&bdrv_test, "backing", 0, &error_abort);
-    bdrv_set_backing_hd(bs, backing, &error_abort);
+    TestDriverCBData *data = arg;
+    BlockDriverState *bs = data->bs;
+    BlockDriverState *backing = data->backing;
+    enum drain_type drain_type = data->drain_type;
+    bool recursive = data->recursive;
 
     g_assert_cmpint(bs->quiesce_counter, ==, 0);
     g_assert_cmpint(backing->quiesce_counter, ==, 0);
@@ -306,40 +333,41 @@ static void test_quiesce_common(enum drain_type drain_type, bool recursive)
 
     g_assert_cmpint(bs->quiesce_counter, ==, 0);
     g_assert_cmpint(backing->quiesce_counter, ==, 0);
+}
 
-    bdrv_unref(backing);
-    bdrv_unref(bs);
-    blk_unref(blk);
+static void test_quiesce(enum drain_type drain_type, bool in_coroutine)
+{
+    test_common_cb(drain_type, in_coroutine, test_quiesce_common);
 }
 
 static void test_quiesce_drain_all(void)
 {
-    test_quiesce_common(BDRV_DRAIN_ALL, true);
+    test_quiesce(BDRV_DRAIN_ALL, false);
 }
 
 static void test_quiesce_drain(void)
 {
-    test_quiesce_common(BDRV_DRAIN, false);
+    test_quiesce(BDRV_DRAIN, false);
 }
 
 static void test_quiesce_drain_subtree(void)
 {
-    test_quiesce_common(BDRV_SUBTREE_DRAIN, true);
+    test_quiesce(BDRV_SUBTREE_DRAIN, false);
 }
 
 static void test_quiesce_co_drain_all(void)
 {
-    call_in_coroutine(test_quiesce_drain_all);
+    test_quiesce(BDRV_DRAIN_ALL, true);
 }
 
 static void test_quiesce_co_drain(void)
 {
-    call_in_coroutine(test_quiesce_drain);
+    test_quiesce(BDRV_DRAIN, true);
 }
 
 static void test_quiesce_co_drain_subtree(void)
 {
-    call_in_coroutine(test_quiesce_drain_subtree);
+    test_quiesce(BDRV_SUBTREE_DRAIN, true);
 }
 
 static void test_nested(void)
-- 
2.31.1




* [PATCH v2 09/10] child_job_drained_poll: override polling condition only when in home thread
  2022-03-14 13:18 [PATCH v2 00/10] block: bug fixes in preparation of AioContext removal Emanuele Giuseppe Esposito
                   ` (7 preceding siblings ...)
  2022-03-14 13:18 ` [PATCH v2 08/10] tests/unit/test-bdrv-drain.c: graph setup functions can't run in coroutines Emanuele Giuseppe Esposito
@ 2022-03-14 13:18 ` Emanuele Giuseppe Esposito
  2022-03-14 13:18 ` [PATCH v2 10/10] tests/qemu-iotests/030: test_stream_parallel should use auto_finalize=False Emanuele Giuseppe Esposito
  9 siblings, 0 replies; 13+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-03-14 13:18 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Emanuele Giuseppe Esposito, qemu-devel, Hanna Reitz,
	Stefan Hajnoczi, Paolo Bonzini, John Snow

drv->drained_poll() is only implemented by mirror, and allows it
to drain from within the coroutine. The mirror implementation uses
the in_drain flag to recognize when it is draining from a coroutine,
and consequently avoids deadlocking (i.e. having the poll condition
in child_job_drained_poll() wait for itself).

The problem is that this flag is dangerous, because it breaks the
bdrv_drained_begin() invariant: once drained_begin ends, all jobs,
in_flight requests, and anything running in the iothread are blocked.

This invariant can be broken in the following way:
iothread(mirror): s->in_drain = true; // mirror.c:1112
main loop: bdrv_drained_begin(mirror_bs);
/*
 * drained_begin waits for the bdrv_drain_poll_top_level() condition,
 * which translates to child_job_drained_poll() for jobs; but
 * mirror implements drv->drained_poll(), so it returns
 * !!in_flight_requests, which is 0 (assertion in mirror.c:1105).
 */
main loop: thinks iothread is stopped and is modifying the graph...
iothread(mirror): *continues*, as nothing is stopping it
iothread(mirror): bdrv_drained_begin(bs);
/* draining reads the graph while it is modified!! */
main loop: done modifying the graph...

In order to fix this, we can simply allow drv->drained_poll()
to be called only from the iothread, and not the main loop.
We distinguish the two cases using in_aio_context_home_thread(),
which returns false if @ctx does not belong to the thread running it.

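For reference, in_aio_context_home_thread() is roughly the following
(sketch of the helper in include/block/aio.h):

static inline bool in_aio_context_home_thread(AioContext *ctx)
{
    if (ctx == qemu_get_current_aio_context()) {
        return true;
    }
    if (ctx == qemu_get_aio_context()) {
        /* the main context's home thread is the main loop thread */
        return qemu_mutex_iothread_locked();
    }
    return false;
}
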
Co-Developed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
---
 blockjob.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 4868453d74..14a919b3cc 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -110,7 +110,9 @@ static bool child_job_drained_poll(BdrvChild *c)
     BlockJob *bjob = c->opaque;
     Job *job = &bjob->job;
     const BlockJobDriver *drv = block_job_driver(bjob);
+    AioContext *ctx;
 
+    ctx = job->aio_context;
     /* An inactive or completed job doesn't have any pending requests. Jobs
      * with !job->busy are either already paused or have a pause point after
      * being reentered, so no job driver code will run before they pause. */
@@ -118,9 +120,14 @@ static bool child_job_drained_poll(BdrvChild *c)
         return false;
     }
 
-    /* Otherwise, assume that it isn't fully stopped yet, but allow the job to
-     * override this assumption. */
-    if (drv->drained_poll) {
+    /*
+     * Otherwise, assume that it isn't fully stopped yet, but allow the job to
+     * override this assumption, if the drain is being performed in the
+     * iothread. We need to check that the caller is the home thread because
+     * it could otherwise lead the main loop to exit polling while the job
+     * has not paused yet.
+     */
+    if (in_aio_context_home_thread(ctx) && drv->drained_poll) {
         return drv->drained_poll(bjob);
     } else {
         return true;
-- 
2.31.1




* [PATCH v2 10/10] tests/qemu-iotests/030: test_stream_parallel should use auto_finalize=False
  2022-03-14 13:18 [PATCH v2 00/10] block: bug fixes in preparation of AioContext removal Emanuele Giuseppe Esposito
                   ` (8 preceding siblings ...)
  2022-03-14 13:18 ` [PATCH v2 09/10] child_job_drained_poll: override polling condition only when in home thread Emanuele Giuseppe Esposito
@ 2022-03-14 13:18 ` Emanuele Giuseppe Esposito
  9 siblings, 0 replies; 13+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-03-14 13:18 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy,
	Emanuele Giuseppe Esposito, qemu-devel, Hanna Reitz,
	Stefan Hajnoczi, Paolo Bonzini, John Snow

First, use run_job() instead of the current logic to run the
stream job. Then, use auto_finalize=False to make sure that
the job is not automatically deleted once it is done.

In this way, if a job finishes earlier than we want, it is not
finalized yet, so the other commands can still execute without
failing. run_job() will then take care of calling job-finalize.

Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
Suggested-by: Kevin Wolf <kwolf@redhat.com>
---
 tests/qemu-iotests/030 | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index 567bf1da67..951c28878c 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -248,7 +248,7 @@ class TestParallelOps(iotests.QMPTestCase):
             pending_jobs.append(job_id)
             result = self.vm.qmp('block-stream', device=node_name,
                                  job_id=job_id, bottom=f'node{i-1}',
-                                 speed=1024)
+                                 speed=1024, auto_finalize=False)
             self.assert_qmp(result, 'return', {})
 
         # Do this in reverse: After unthrottling them, some jobs may finish
@@ -264,14 +264,8 @@ class TestParallelOps(iotests.QMPTestCase):
             result = self.vm.qmp('block-job-set-speed', device=job, speed=0)
             self.assert_qmp(result, 'return', {})
 
-        # Wait for all jobs to be finished.
-        while len(pending_jobs) > 0:
-            for event in self.vm.get_qmp_events(wait=True):
-                if event['event'] == 'BLOCK_JOB_COMPLETED':
-                    job_id = self.dictpath(event, 'data/device')
-                    self.assertTrue(job_id in pending_jobs)
-                    self.assert_qmp_absent(event, 'data/error')
-                    pending_jobs.remove(job_id)
+        for job in pending_jobs:
+            self.vm.run_job(job=job, auto_finalize=False)
 
         self.assert_no_active_block_jobs()
         self.vm.shutdown()
-- 
2.31.1




* Re: [PATCH v2 04/10] block.c: bdrv_replace_child_noperm: first remove the child, and then call ->detach()
  2022-03-14 13:18 ` [PATCH v2 04/10] block.c: bdrv_replace_child_noperm: first remove the child, and then call ->detach() Emanuele Giuseppe Esposito
@ 2022-03-16  9:13   ` Emanuele Giuseppe Esposito
  0 siblings, 0 replies; 13+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-03-16  9:13 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, Paolo Bonzini, John Snow

Unfortunately this patch is not safe: theoretically ->detach() can call
bdrv_unapply_subtree_drain, and if it polls, it can run a bh that, for
example, reads the graph and finds it in an inconsistent state, since
we are between the two writes QLIST_REMOVE(child, next_parent); and
QLIST_REMOVE(child, next);

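Sketch of the window described above (hypothetical trace):

/* in bdrv_replace_child_noperm() with this patch applied: */
QLIST_REMOVE(child, next_parent);  /* child gone from old_bs->parents */
child->klass->detach(child)
  -> bdrv_unapply_subtree_drain()  /* may poll and run a bh that reads
                                      the graph: child is still in the
                                      parent's ->children list here */
  -> QLIST_REMOVE(child, next);    /* only now removed from ->children */
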
Please ignore it.
This patch could eventually go into the subtree_drain series, if we
decide to go in that direction.

Emanuele

Am 14/03/2022 um 14:18 schrieb Emanuele Giuseppe Esposito:
> Doing the opposite can make ->detach() (more precisely
> bdrv_unapply_subtree_drain() in bdrv_child_cb_detach) undo the subtree_drain
> just performed to protect the removal of the child from the graph,
> thus making the fully-enabled assert_bdrv_graph_writable fail.
> 
> Note that assert_bdrv_graph_writable is not yet fully enabled.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
>  block.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 372f16f4a0..d870ba5393 100644
> --- a/block.c
> +++ b/block.c
> @@ -1448,6 +1448,11 @@ static void bdrv_child_cb_attach(BdrvChild *child)
>      bdrv_apply_subtree_drain(child, bs);
>  }
>  
> +/*
> + * Unapply the drain in the whole child subtree, as
> + * it has been already detached, and then remove from
> + * child->opaque->children.
> + */
>  static void bdrv_child_cb_detach(BdrvChild *child)
>  {
>      BlockDriverState *bs = child->opaque;
> @@ -2864,14 +2869,18 @@ static void bdrv_replace_child_noperm(BdrvChild **childp,
>      }
>  
>      if (old_bs) {
> -        /* Detach first so that the recursive drain sections coming from @child
> -         * are already gone and we only end the drain sections that came from
> -         * elsewhere. */
> +        /* First remove child from child->bs->parents list */
> +        assert_bdrv_graph_writable(old_bs);
> +        QLIST_REMOVE(child, next_parent);
> +        /*
> +         * Then call ->detach() cb.
> +         * In child_of_bds case, update the child parent
> +         * (child->opaque) ->children list and
> +         * remove any drain left in the child subtree.
> +         */
>          if (child->klass->detach) {
>              child->klass->detach(child);
>          }
> -        assert_bdrv_graph_writable(old_bs);
> -        QLIST_REMOVE(child, next_parent);
>      }
>  
>      child->bs = new_bs;
> 




* Re: [PATCH v2 05/10] block.c: bdrv_replace_child_noperm: first call ->attach(), and then add child
  2022-03-14 13:18 ` [PATCH v2 05/10] block.c: bdrv_replace_child_noperm: first call ->attach(), and then add child Emanuele Giuseppe Esposito
@ 2022-03-16  9:16   ` Emanuele Giuseppe Esposito
  0 siblings, 0 replies; 13+ messages in thread
From: Emanuele Giuseppe Esposito @ 2022-03-16  9:16 UTC
  To: qemu-block
  Cc: Kevin Wolf, Fam Zheng, Vladimir Sementsov-Ogievskiy, qemu-devel,
	Hanna Reitz, Stefan Hajnoczi, Paolo Bonzini, John Snow

Unfortunately this patch is not safe either: theoretically ->attach()
can call bdrv_apply_subtree_drain, and if it polls, it can run a bh
that, for example, reads the graph and finds it in an inconsistent
state, since we are between the two writes
QLIST_INSERT_HEAD(&bs->children, child, next); and
QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);

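Sketch of the window described above (hypothetical trace):

/* in bdrv_replace_child_noperm() with this patch applied: */
child->klass->attach(child)
  -> QLIST_INSERT_HEAD(&bs->children, child, next);
  -> bdrv_apply_subtree_drain()    /* may poll and run a bh that reads
                                      the graph: child is not yet in
                                      new_bs->parents here */
QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
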
Please ignore it.
This patch could eventually go into the subtree_drain series, if we
decide to go in that direction.

Emanuele


Am 14/03/2022 um 14:18 schrieb Emanuele Giuseppe Esposito:
> Doing the opposite can result in the child node being added to a
> non-drained node, since apply_subtree_drain is only performed in
> ->attach(), thus making assert_bdrv_graph_writable fail.
> 
> This can happen for example during a transaction rollback (test 245,
> test_io_with_graph_changes):
> 1. a node is removed from the graph, and is therefore undrained
> 2. then something goes wrong, and we need to roll back the transactions
>    through tran_abort()
> 3. at this point, the current code would first attach the undrained node
>    to the graph via QLIST_INSERT_HEAD, and only then call ->attach(),
>    which takes care of restoring the drain with apply_subtree_drain(),
>    leaving the node undrained between the two operations.
> 
> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
> ---
>  block.c | 20 +++++++++++++++-----
>  1 file changed, 15 insertions(+), 5 deletions(-)
> 
> diff --git a/block.c b/block.c
> index d870ba5393..c6a550f9c6 100644
> --- a/block.c
> +++ b/block.c
> @@ -1434,6 +1434,11 @@ static void bdrv_inherited_options(BdrvChildRole role, bool parent_is_format,
>      *child_flags = flags;
>  }
>  
> +/*
> + * Add the child node to child->opaque->children list,
> + * and then apply the drain to the whole child subtree,
> + * so that the drain count matches with the parent.
> + */
>  static void bdrv_child_cb_attach(BdrvChild *child)
>  {
>      BlockDriverState *bs = child->opaque;
> @@ -2889,8 +2894,6 @@ static void bdrv_replace_child_noperm(BdrvChild **childp,
>      }
>  
>      if (new_bs) {
> -        assert_bdrv_graph_writable(new_bs);
> -        QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
>  
>          /*
>           * Detaching the old node may have led to the new node's
> @@ -2901,12 +2904,19 @@ static void bdrv_replace_child_noperm(BdrvChild **childp,
>          assert(new_bs->quiesce_counter <= new_bs_quiesce_counter);
>          drain_saldo += new_bs->quiesce_counter - new_bs_quiesce_counter;
>  
> -        /* Attach only after starting new drained sections, so that recursive
> -         * drain sections coming from @child don't get an extra .drained_begin
> -         * callback. */
> +        /*
> +         * First call ->attach() cb.
> +         * In child_of_bds case, add child to the parent
> +         * (child->opaque) ->children list and if
> +         * necessary add missing drains in the child subtree.
> +         */
>          if (child->klass->attach) {
>              child->klass->attach(child);
>          }
> +
> +        /* Then add child to new_bs->parents list */
> +        assert_bdrv_graph_writable(new_bs);
> +        QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
>      }
>  
>      /*
> 



