From: Kevin Wolf <kwolf@redhat.com>
To: qemu-devel@nongnu.org
Cc: kwolf@redhat.com
Subject: [Qemu-devel] [RFC PATCH 11/16] qcow2: Add error handling to the l2meta coroutine
Date: Tue, 18 Sep 2012 13:40:37 +0200
Message-ID: <1347968442-8860-12-git-send-email-kwolf@redhat.com>
In-Reply-To: <1347968442-8860-1-git-send-email-kwolf@redhat.com>

Not exactly bisectable, but one large patch isn't much better either :-(

m->error allows bdrv_drain() to stop while an l2meta is in the error
state, rather than going into an endless loop.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/qcow2.c |   44 ++++++++++++++++++++++++++++++++++++++++----
 block/qcow2.h |    3 +++
 2 files changed, 43 insertions(+), 4 deletions(-)
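
For readers who want to see the drain change in isolation, here is a
minimal sketch, not the QEMU code itself. It models only the new return
value of qcow2_drain(): a request that is parked in a sleep (which, after
the first loop, can only mean it failed) no longer counts as busy. The
names toy_l2meta, toy_drain_busy and toy_drain_busy_old are hypothetical
stand-ins, and the coroutine re-entry done by the first loop is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for QCowL2Meta: only the two flags matter here. */
struct toy_l2meta {
    bool sleeping;  /* coroutine has yielded and waits to be re-entered */
    bool error;     /* it yielded because linking the L2 entry failed */
};

/*
 * Models the second loop of the patched qcow2_drain(): report "busy" only
 * if some in-flight request is not parked in a sleep.
 */
static bool toy_drain_busy(const struct toy_l2meta *reqs, size_t n)
{
    assert(reqs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (!reqs[i].sleeping) {
            return true;
        }
    }
    return false;
}

/* Models the pre-patch return value: any in-flight request means busy. */
static bool toy_drain_busy_old(size_t n)
{
    return n > 0;
}
```

With a single request in the state { .sleeping = true, .error = true },
toy_drain_busy_old() keeps reporting busy on every call, which is the
endless loop described above, while toy_drain_busy() reports not busy.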

diff --git a/block/qcow2.c b/block/qcow2.c
index 2e220c7..e001436 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -771,11 +771,33 @@ static void coroutine_fn process_l2meta(void *opaque)
         m->sleeping = false;
     }
 
+again:
     qemu_co_mutex_lock(&s->lock);
 
     ret = qcow2_alloc_cluster_link_l2(bs, m);
     if (ret < 0) {
-        /* FIXME */
+        /*
+         * This is a nasty situation: We have already completed the allocation
+         * write request and returned success, so just failing it isn't
+         * possible. We need to make sure to return an error during the next
+         * flush.
+         *
+         * However, we still can't drop the l2meta because we want I/O errors
+         * to be recoverable e.g. after the block device has been grown or the
+         * network connection restored. Sleep until the next flush comes and
+         * then retry.
+         */
+        s->flush_error = ret;
+
+        qemu_co_mutex_unlock(&s->lock);
+        qemu_co_rwlock_unlock(&s->l2meta_flush);
+        m->sleeping = true;
+        m->error = true;
+        qemu_coroutine_yield();
+        m->error = false;
+        m->sleeping = false;
+        qemu_co_rwlock_rdlock(&s->l2meta_flush);
+        goto again;
     }
 
     run_dependent_requests(s, m);
@@ -812,14 +834,27 @@ static bool qcow2_drain(BlockDriverState *bs)
 {
     BDRVQcowState *s = bs->opaque;
     QCowL2Meta *m;
+    bool busy = false;
 
     QLIST_FOREACH(m, &s->cluster_allocs, next_in_flight) {
-        if (m->sleeping) {
+        if (m->sleeping && !m->error) {
             qemu_coroutine_enter(m->co, NULL);
         }
     }
 
-    return !QLIST_EMPTY(&s->cluster_allocs);
+    /*
+     * If there's still a sleeping l2meta, then an error must have occurred.
+     * Don't consider l2metas in this state as busy, they only get active on
+     * flushes.
+     */
+    QLIST_FOREACH(m, &s->cluster_allocs, next_in_flight) {
+        if (!m->sleeping) {
+            busy = true;
+            break;
+        }
+    }
+
+    return busy;
 }
 
 static coroutine_fn int qcow2_co_writev(BlockDriverState *bs,
@@ -1648,7 +1683,8 @@ static coroutine_fn int qcow2_co_flush_to_os(BlockDriverState *bs)
         }
     }
 
-    ret = 0;
+    ret = s->flush_error;
+    s->flush_error = 0;
 fail:
     qemu_co_mutex_unlock(&s->lock);
     resume_l2meta(s);
diff --git a/block/qcow2.h b/block/qcow2.h
index 8bf145c..1c4dc0e 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -171,6 +171,8 @@ typedef struct BDRVQcowState {
     CoRwlock l2meta_flush;
     bool in_l2meta_flush;
 
+    int flush_error;
+
     uint32_t crypt_method; /* current crypt method, 0 if no key yet */
     uint32_t crypt_method_header;
     AES_KEY aes_encrypt_key;
@@ -250,6 +252,7 @@ typedef struct QCowL2Meta
      * be reentered in order to cancel the timer.
      */
     bool sleeping;
+    bool error;
 
     /** Coroutine that handles delayed COW and updates L2 entry */
     Coroutine *co;
-- 
1.7.6.5
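
The deferred error reporting in the flush hunk can also be modeled on its
own. The following is a sketch under stated assumptions, not the actual
implementation: toy_state, toy_record_failure and toy_flush are
hypothetical names, and locking, coroutines and the retry loop are all
omitted. It only shows the latch: a failure is stored when the L2 update
fails, and the next flush returns it once and clears it.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the one BDRVQcowState field this patch adds. */
struct toy_state {
    int flush_error;   /* 0, or the negative errno of a deferred failure */
};

/* Models the error path in process_l2meta(): remember the failure. */
static void toy_record_failure(struct toy_state *s, int ret)
{
    assert(ret < 0);
    s->flush_error = ret;
}

/*
 * Models the tail of qcow2_co_flush_to_os(): return the latched error
 * and reset it so that a later flush can succeed.
 */
static int toy_flush(struct toy_state *s)
{
    int ret = s->flush_error;

    s->flush_error = 0;
    return ret;
}
```

As in the patch, only the most recent error is kept: a second call to
toy_record_failure() before the next flush overwrites the first value.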
