qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: qemu-block@nongnu.org
Cc: kwolf@redhat.com, peter.maydell@linaro.org, qemu-devel@nongnu.org
Subject: [Qemu-devel] [PULL 02/13] block: Introduce BdrvChild.parent_quiesce_counter
Date: Fri, 19 Jul 2019 15:43:34 +0200	[thread overview]
Message-ID: <20190719134345.23526-3-kwolf@redhat.com> (raw)
In-Reply-To: <20190719134345.23526-1-kwolf@redhat.com>

From: Max Reitz <mreitz@redhat.com>

Commit 5cb2737e925042e6c7cd3fb0b01313950b03cddf laid out why
bdrv_do_drained_end() must decrement the quiesce_counter after
bdrv_drain_invoke().  It did not give a very good reason why it has to
happen after bdrv_parent_drained_end(), instead only claiming symmetry
to bdrv_do_drained_begin().

It turns out that delaying it for so long is wrong.

Situation: We have an active commit job (i.e. a mirror job) from top to
base for the following graph:

                  filter
                    |
                  [file]
                    |
                    v
top --[backing]--> base

Now the VM is closed, which results in the job being cancelled and a
bdrv_drain_all() happening pretty much simultaneously.

Beginning the drain means the job is paused once whenever one of its
nodes is quiesced.  This is reversed when the drain ends.

With how the code currently is, after base's drain ends (which means
that it will have unpaused the job once), its quiesce_counter remains at
1 while it goes to undrain its parents (bdrv_parent_drained_end()).  For
some reason or another, undraining filter causes the job to be kicked
and enter mirror_exit_common(), where it proceeds to invoke
block_job_remove_all_bdrv().

Now base will be detached from the job.  Because its quiesce_counter is
still 1, it will unpause the job once more.  So in total, undraining
base will unpause the job twice.  Eventually, this will lead to the
job's pause_count going negative -- well, it would, were there not an
assertion against this, which crashes qemu.

The general problem is that if in bdrv_parent_drained_end() we undrain
parent A, and then undrain parent B, which then leads to A detaching the
child, bdrv_replace_child_noperm() will undrain A as if we had not done
so yet; that is, one time too many.

It follows that we cannot decrement the quiesce_counter after invoking
bdrv_parent_drained_end().

Unfortunately, decrementing it before bdrv_parent_drained_end() would be
wrong, too.  Imagine the above situation in reverse: Undraining A leads
to B detaching the child.  If we had already decremented the
quiesce_counter by that point, bdrv_replace_child_noperm() would undrain
B one time too little; because it expects bdrv_parent_drained_end() to
issue this undrain.  But bdrv_parent_drained_end() won't do that,
because B is no longer a parent.

Therefore, we have to do something else.  This patch opts for
introducing a second quiesce_counter that counts how many times a
child's parent has been quiesced (though c->role->drained_*).  With
that, bdrv_replace_child_noperm() just has to undrain the parent exactly
that many times when removing a child, and it will always be right.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 include/block/block.h     |  7 +++++++
 include/block/block_int.h |  9 +++++++++
 block.c                   | 15 +++++----------
 block/io.c                | 14 +++++++++++---
 4 files changed, 32 insertions(+), 13 deletions(-)

diff --git a/include/block/block.h b/include/block/block.h
index 734c9d2f76..bff3317696 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -617,6 +617,13 @@ void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore,
  */
 void bdrv_parent_drained_begin_single(BdrvChild *c, bool poll);
 
+/**
+ * bdrv_parent_drained_end_single:
+ *
+ * End a quiesced section for the parent of @c.
+ */
+void bdrv_parent_drained_end_single(BdrvChild *c);
+
 /**
  * bdrv_parent_drained_end:
  *
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 50902531b7..f5b044fcdb 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -729,6 +729,15 @@ struct BdrvChild {
      */
     bool frozen;
 
+    /*
+     * How many times the parent of this child has been drained
+     * (through role->drained_*).
+     * Usually, this is equal to bs->quiesce_counter (potentially
+     * reduced by bdrv_drain_all_count).  It may differ while the
+     * child is entering or leaving a drained section.
+     */
+    int parent_quiesce_counter;
+
     QLIST_ENTRY(BdrvChild) next;
     QLIST_ENTRY(BdrvChild) next_parent;
 };
diff --git a/block.c b/block.c
index 29e931e217..8440712ca0 100644
--- a/block.c
+++ b/block.c
@@ -2251,24 +2251,19 @@ static void bdrv_replace_child_noperm(BdrvChild *child,
         if (child->role->detach) {
             child->role->detach(child);
         }
-        if (old_bs->quiesce_counter && child->role->drained_end) {
-            int num = old_bs->quiesce_counter;
-            if (child->role->parent_is_bds) {
-                num -= bdrv_drain_all_count;
-            }
-            assert(num >= 0);
-            for (i = 0; i < num; i++) {
-                child->role->drained_end(child);
-            }
+        while (child->parent_quiesce_counter) {
+            bdrv_parent_drained_end_single(child);
         }
         QLIST_REMOVE(child, next_parent);
+    } else {
+        assert(child->parent_quiesce_counter == 0);
     }
 
     child->bs = new_bs;
 
     if (new_bs) {
         QLIST_INSERT_HEAD(&new_bs->parents, child, next_parent);
-        if (new_bs->quiesce_counter && child->role->drained_begin) {
+        if (new_bs->quiesce_counter) {
             int num = new_bs->quiesce_counter;
             if (child->role->parent_is_bds) {
                 num -= bdrv_drain_all_count;
diff --git a/block/io.c b/block/io.c
index 24a18759fd..1e618f9a37 100644
--- a/block/io.c
+++ b/block/io.c
@@ -55,6 +55,15 @@ void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore,
     }
 }
 
+void bdrv_parent_drained_end_single(BdrvChild *c)
+{
+    assert(c->parent_quiesce_counter > 0);
+    c->parent_quiesce_counter--;
+    if (c->role->drained_end) {
+        c->role->drained_end(c);
+    }
+}
+
 void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore,
                              bool ignore_bds_parents)
 {
@@ -64,9 +73,7 @@ void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore,
         if (c == ignore || (ignore_bds_parents && c->role->parent_is_bds)) {
             continue;
         }
-        if (c->role->drained_end) {
-            c->role->drained_end(c);
-        }
+        bdrv_parent_drained_end_single(c);
     }
 }
 
@@ -96,6 +103,7 @@ static bool bdrv_parent_drained_poll(BlockDriverState *bs, BdrvChild *ignore,
 
 void bdrv_parent_drained_begin_single(BdrvChild *c, bool poll)
 {
+    c->parent_quiesce_counter++;
     if (c->role->drained_begin) {
         c->role->drained_begin(c);
     }
-- 
2.20.1



  parent reply	other threads:[~2019-07-19 13:44 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-07-19 13:43 [Qemu-devel] [PULL 00/13] Block layer patches Kevin Wolf
2019-07-19 13:43 ` [Qemu-devel] [PULL 01/13] iotests: Set read-zeroes on in null block driver for Valgrind Kevin Wolf
2019-07-24  7:18   ` Christian Borntraeger
2019-07-24  7:30     ` Andrey Shinkevich
2019-07-24  7:33       ` Christian Borntraeger
2019-07-24  7:37         ` Andrey Shinkevich
2019-07-24  7:38       ` Kevin Wolf
2019-07-24  7:57         ` Andrey Shinkevich
2019-07-24  8:05           ` Kevin Wolf
2019-07-24  8:23             ` Andrey Shinkevich
2019-07-19 13:43 ` Kevin Wolf [this message]
2019-07-19 13:43 ` [Qemu-devel] [PULL 03/13] tests: Add job commit by drained_end test Kevin Wolf
2019-07-19 13:43 ` [Qemu-devel] [PULL 04/13] block: Add @drained_end_counter Kevin Wolf
2019-07-19 13:43 ` [Qemu-devel] [PULL 05/13] block: Make bdrv_parent_drained_[^_]*() static Kevin Wolf
2019-07-19 13:43 ` [Qemu-devel] [PULL 06/13] tests: Lock AioContexts in test-block-iothread Kevin Wolf
2019-07-19 13:43 ` [Qemu-devel] [PULL 07/13] block: Do not poll in bdrv_do_drained_end() Kevin Wolf
2019-07-19 13:43 ` [Qemu-devel] [PULL 08/13] tests: Extend commit by drained_end test Kevin Wolf
2019-07-19 13:43 ` [Qemu-devel] [PULL 09/13] block: Loop unsafely in bdrv*drained_end() Kevin Wolf
2019-07-19 13:43 ` [Qemu-devel] [PULL 10/13] iotests: Add @has_quit to vm.shutdown() Kevin Wolf
2019-07-19 13:43 ` [Qemu-devel] [PULL 11/13] iotests: Test commit with a filter on the chain Kevin Wolf
2019-07-19 13:43 ` [Qemu-devel] [PULL 12/13] vl: Drain before (block) job cancel when quitting Kevin Wolf
2019-07-19 13:43 ` [Qemu-devel] [PULL 13/13] iotests: Test quitting with job on throttled node Kevin Wolf
2019-07-22  9:11 ` [Qemu-devel] [PULL 00/13] Block layer patches Peter Maydell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190719134345.23526-3-kwolf@redhat.com \
    --to=kwolf@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).