qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH v2 0/2] block: allow commit active as top
@ 2013-07-30  7:17 Fam Zheng
  2013-07-30  7:17 ` [Qemu-devel] [PATCH v2 1/2] block: allow live commit of active image Fam Zheng
  2013-07-30  7:17 ` [Qemu-devel] [PATCH v2 2/2] qemu-iotests: update test cases for commit active Fam Zheng
  0 siblings, 2 replies; 6+ messages in thread
From: Fam Zheng @ 2013-07-30  7:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, famz, jcody, stefanha, pbonzini

Previously live commit of active block device is not supported, this series
implements it and updates corresponding qemu-iotests cases.

Please see commit messages for implementation details.

v2: report ready when all blocks commited for the first time, leave it to the
    user to complete the job (upon user's complete command, also commit writes
    since ready and flushes before reporting completion)

Fam Zheng (2):
  block: allow live commit of active image
  qemu-iotests: update test cases for commit active

 block.c                | 102 ++++++++------------------
 block/commit.c         | 190 +++++++++++++++++++++++++++++--------------------
 include/block/block.h  |   5 +-
 tests/qemu-iotests/040 |  73 ++++++++-----------
 4 files changed, 177 insertions(+), 193 deletions(-)

-- 
1.8.3.4

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [Qemu-devel] [PATCH v2 1/2] block: allow live commit of active image
  2013-07-30  7:17 [Qemu-devel] [PATCH v2 0/2] block: allow commit active as top Fam Zheng
@ 2013-07-30  7:17 ` Fam Zheng
  2013-07-30 12:53   ` Stefan Hajnoczi
  2013-07-30  7:17 ` [Qemu-devel] [PATCH v2 2/2] qemu-iotests: update test cases for commit active Fam Zheng
  1 sibling, 1 reply; 6+ messages in thread
From: Fam Zheng @ 2013-07-30  7:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, famz, jcody, stefanha, pbonzini

This patch eliminates limitation of committing the active device.

bdrv_drop_intermediate is reimplemented to take pointers to
(BlockDriverState *), so it can modify the caller's local pointers to
preserve their semantics, while updating active BDS in-place by
bdrv_swap active and base: we need data in 'base' as it's the only
remaining after commit, but we can't delete 'active' as it's referenced
everywhere in the program.

Guest writes to active device during the commit are tracked by dirty map
and committed like block-mirror.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 block.c               | 102 ++++++++-------------------
 block/commit.c        | 190 ++++++++++++++++++++++++++++++--------------------
 include/block/block.h |   5 +-
 3 files changed, 146 insertions(+), 151 deletions(-)

diff --git a/block.c b/block.c
index c77cfd1..c344fd7 100644
--- a/block.c
+++ b/block.c
@@ -2025,18 +2025,11 @@ BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
     return overlay;
 }
 
-typedef struct BlkIntermediateStates {
-    BlockDriverState *bs;
-    QSIMPLEQ_ENTRY(BlkIntermediateStates) entry;
-} BlkIntermediateStates;
-
-
 /*
- * Drops images above 'base' up to and including 'top', and sets the image
- * above 'top' to have base as its backing file.
- *
- * Requires that the overlay to 'top' is opened r/w, so that the backing file
- * information in 'bs' can be properly updated.
+ * Drops images above '*base' up to and including '*top', and sets new '*base'
+ * as backing_hd of top_overlay (the image orignally has 'top' as backing
+ * file). top_overlay may be NULL if '*top' is active, no such update needed.
+ * Requires that the top_overlay to 'top' is opened r/w.
  *
  * E.g., this will convert the following chain:
  * bottom <- base <- intermediate <- top <- active
@@ -2053,82 +2046,47 @@ typedef struct BlkIntermediateStates {
  *
  * base <- active
  *
- * Error conditions:
- *  if active == top, that is considered an error
+ * It also allows active==top, in which case it converts:
+ *
+ * base <- intermediate <- active (also top)
+ *
+ * to
+ *
+ * base == active == top, i.e. only base remains: *top == *base when return.
  *
  */
-int bdrv_drop_intermediate(BlockDriverState *active, BlockDriverState *top,
-                           BlockDriverState *base)
+int bdrv_drop_intermediate(BlockDriverState *top_overlay,
+                           BlockDriverState **top,
+                           BlockDriverState **base)
 {
-    BlockDriverState *intermediate;
+    BlockDriverState *pbs;
     BlockDriverState *base_bs = NULL;
-    BlockDriverState *new_top_bs = NULL;
-    BlkIntermediateStates *intermediate_state, *next;
     int ret = -EIO;
 
-    QSIMPLEQ_HEAD(states_to_delete, BlkIntermediateStates) states_to_delete;
-    QSIMPLEQ_INIT(&states_to_delete);
-
-    if (!top->drv || !base->drv) {
+    if (!(*top)->drv || !(*base)->drv) {
         goto exit;
     }
 
-    new_top_bs = bdrv_find_overlay(active, top);
-
-    if (new_top_bs == NULL) {
-        /* we could not find the image above 'top', this is an error */
-        goto exit;
+    for (pbs = (*top)->backing_hd; pbs != *base; pbs = base_bs) {
+        assert(pbs);
+        base_bs = pbs->backing_hd;
+        pbs->backing_hd = NULL;
+        bdrv_delete(pbs);
     }
 
-    /* special case of new_top_bs->backing_hd already pointing to base - nothing
-     * to do, no intermediate images */
-    if (new_top_bs->backing_hd == base) {
-        ret = 0;
-        goto exit;
-    }
+    bdrv_swap(*base, *top);
 
-    intermediate = top;
+    (*base)->backing_hd = NULL;
+    bdrv_delete(*base);
+    *base = *top;
 
-    /* now we will go down through the list, and add each BDS we find
-     * into our deletion queue, until we hit the 'base'
-     */
-    while (intermediate) {
-        intermediate_state = g_malloc0(sizeof(BlkIntermediateStates));
-        intermediate_state->bs = intermediate;
-        QSIMPLEQ_INSERT_TAIL(&states_to_delete, intermediate_state, entry);
-
-        if (intermediate->backing_hd == base) {
-            base_bs = intermediate->backing_hd;
-            break;
-        }
-        intermediate = intermediate->backing_hd;
-    }
-    if (base_bs == NULL) {
-        /* something went wrong, we did not end at the base. safely
-         * unravel everything, and exit with error */
-        goto exit;
-    }
-
-    /* success - we can delete the intermediate states, and link top->base */
-    ret = bdrv_change_backing_file(new_top_bs, base_bs->filename,
-                                   base_bs->drv ? base_bs->drv->format_name : "");
-    if (ret) {
-        goto exit;
-    }
-    new_top_bs->backing_hd = base_bs;
-
-
-    QSIMPLEQ_FOREACH_SAFE(intermediate_state, &states_to_delete, entry, next) {
-        /* so that bdrv_close() does not recursively close the chain */
-        intermediate_state->bs->backing_hd = NULL;
-        bdrv_delete(intermediate_state->bs);
+    /* overlay exists when active != top, need to change backing file for it */
+    if (top_overlay) {
+        ret = bdrv_change_backing_file(top_overlay, (*base)->filename,
+                                       (*base)->drv ?
+                                            (*base)->drv->format_name : "");
     }
-    ret = 0;
-
 exit:
-    QSIMPLEQ_FOREACH_SAFE(intermediate_state, &states_to_delete, entry, next) {
-        g_free(intermediate_state);
-    }
     return ret;
 }
 
diff --git a/block/commit.c b/block/commit.c
index 2227fc2..5bbd955 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -17,14 +17,13 @@
 #include "block/blockjob.h"
 #include "qemu/ratelimit.h"
 
-enum {
-    /*
-     * Size of data buffer for populating the image file.  This should be large
-     * enough to process multiple clusters in a single call, so that populating
-     * contiguous regions of the image is efficient.
-     */
-    COMMIT_BUFFER_SIZE = 512 * 1024, /* in bytes */
-};
+/*
+ * Size of data buffer for populating the image file.  This should be large
+ * enough to process multiple clusters in a single call, so that populating
+ * contiguous regions of the image is efficient.
+ */
+#define COMMIT_BUFFER_SECTORS 128
+#define COMMIT_BUFFER_BYTES (COMMIT_BUFFER_SECTORS * BDRV_SECTOR_SIZE)
 
 #define SLICE_TIME 100000000ULL /* ns */
 
@@ -34,11 +33,27 @@ typedef struct CommitBlockJob {
     BlockDriverState *active;
     BlockDriverState *top;
     BlockDriverState *base;
+    BlockDriverState *overlay;
     BlockdevOnError on_error;
     int base_flags;
     int orig_overlay_flags;
+    bool should_complete;
+    bool ready;
 } CommitBlockJob;
 
+static void commit_complete(BlockJob *job, Error **errp)
+{
+    CommitBlockJob *s = container_of(job, CommitBlockJob, common);
+
+    if (!s->ready) {
+        error_set(errp, QERR_BLOCK_JOB_NOT_READY, job->bs->device_name);
+        return;
+    }
+
+    s->should_complete = true;
+    block_job_resume(job);
+}
+
 static int coroutine_fn commit_populate(BlockDriverState *bs,
                                         BlockDriverState *base,
                                         int64_t sector_num, int nb_sectors,
@@ -65,100 +80,125 @@ static void coroutine_fn commit_run(void *opaque)
     BlockDriverState *active = s->active;
     BlockDriverState *top = s->top;
     BlockDriverState *base = s->base;
-    BlockDriverState *overlay_bs;
     int64_t sector_num, end;
     int ret = 0;
     int n = 0;
     void *buf;
-    int bytes_written = 0;
     int64_t base_len;
+    int64_t next_dirty;
+    HBitmapIter hbi;
 
+    buf = qemu_blockalign(top, COMMIT_BUFFER_BYTES);
     ret = s->common.len = bdrv_getlength(top);
 
-
     if (s->common.len < 0) {
-        goto exit_restore_reopen;
+        goto exit;
     }
 
     ret = base_len = bdrv_getlength(base);
     if (base_len < 0) {
-        goto exit_restore_reopen;
+        goto exit;
     }
 
     if (base_len < s->common.len) {
         ret = bdrv_truncate(base, s->common.len);
         if (ret) {
-            goto exit_restore_reopen;
+            goto exit;
         }
     }
 
     end = s->common.len >> BDRV_SECTOR_BITS;
-    buf = qemu_blockalign(top, COMMIT_BUFFER_SIZE);
 
     for (sector_num = 0; sector_num < end; sector_num += n) {
-        uint64_t delay_ns = 0;
-        bool copy;
 
-wait:
-        /* Note that even when no rate limit is applied we need to yield
-         * with no pending I/O here so that bdrv_drain_all() returns.
-         */
-        block_job_sleep_ns(&s->common, rt_clock, delay_ns);
-        if (block_job_is_cancelled(&s->common)) {
-            break;
-        }
         /* Copy if allocated above the base */
         ret = bdrv_co_is_allocated_above(top, base, sector_num,
-                                         COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
+                                         COMMIT_BUFFER_SECTORS,
                                          &n);
-        copy = (ret == 1);
-        trace_commit_one_iteration(s, sector_num, n, ret);
-        if (copy) {
-            if (s->common.speed) {
-                delay_ns = ratelimit_calculate_delay(&s->limit, n);
-                if (delay_ns > 0) {
-                    goto wait;
-                }
+        if (ret) {
+            bdrv_set_dirty(top, sector_num, n);
+        }
+    }
+
+    for (;;) {
+        uint64_t delay_ns = 0;
+        int64_t cnt = bdrv_get_dirty_count(s->top);
+        if (cnt == 0) {
+            if (!s->overlay && !s->ready) {
+                s->ready = true;
+                block_job_ready(&s->common);
             }
-            ret = commit_populate(top, base, sector_num, n, buf);
-            bytes_written += n * BDRV_SECTOR_SIZE;
+            /* We can complete if user called complete job or the job is
+             * committing non-active image */
+            if (s->should_complete || s->overlay) {
+                break;
+            }
+            block_job_sleep_ns(&s->common, rt_clock, delay_ns);
         }
-        if (ret < 0) {
-            if (s->on_error == BLOCKDEV_ON_ERROR_STOP ||
-                s->on_error == BLOCKDEV_ON_ERROR_REPORT||
-                (s->on_error == BLOCKDEV_ON_ERROR_ENOSPC && ret == -ENOSPC)) {
-                goto exit_free_buf;
-            } else {
-                n = 0;
-                continue;
+
+        if (block_job_is_cancelled(&s->common)) {
+            goto exit;
+        }
+
+        bdrv_dirty_iter_init(s->top, &hbi);
+        for (next_dirty = hbitmap_iter_next(&hbi);
+                next_dirty >= 0;
+                next_dirty = hbitmap_iter_next(&hbi)) {
+            sector_num = next_dirty;
+            if (block_job_is_cancelled(&s->common)) {
+                goto exit;
+            }
+            delay_ns = ratelimit_calculate_delay(&s->limit,
+                                                 COMMIT_BUFFER_SECTORS);
+            /* Note that even when no rate limit is applied we need to yield
+             * with no pending I/O here so that bdrv_drain_all() returns.
+             */
+            block_job_sleep_ns(&s->common, rt_clock, delay_ns);
+            trace_commit_one_iteration(s, sector_num,
+                                       COMMIT_BUFFER_SECTORS, ret);
+            ret = commit_populate(top, base, sector_num,
+                                  COMMIT_BUFFER_SECTORS, buf);
+            if (ret < 0) {
+                if (s->on_error == BLOCKDEV_ON_ERROR_STOP ||
+                    s->on_error == BLOCKDEV_ON_ERROR_REPORT ||
+                    (s->on_error == BLOCKDEV_ON_ERROR_ENOSPC &&
+                         ret == -ENOSPC)) {
+                    goto exit;
+                } else {
+                    continue;
+                }
             }
+            /* Publish progress */
+            s->common.offset += COMMIT_BUFFER_BYTES;
+            bdrv_reset_dirty(top, sector_num, COMMIT_BUFFER_SECTORS);
         }
-        /* Publish progress */
-        s->common.offset += n * BDRV_SECTOR_SIZE;
     }
+    s->common.offset = end;
 
-    ret = 0;
-
-    if (!block_job_is_cancelled(&s->common) && sector_num == end) {
-        /* success */
-        ret = bdrv_drop_intermediate(active, top, base);
+    bdrv_flush(base);
+    if (!block_job_is_cancelled(&s->common)) {
+        /* Drop intermediate: [top, base) */
+        ret = bdrv_drop_intermediate(s->overlay, &top, &base);
+        s->common.offset = s->common.len;
     }
 
-exit_free_buf:
-    qemu_vfree(buf);
+    ret = 0;
+
+exit:
+    bdrv_set_dirty_tracking(active, 0);
 
-exit_restore_reopen:
     /* restore base open flags here if appropriate (e.g., change the base back
      * to r/o). These reopens do not need to be atomic, since we won't abort
      * even on failure here */
-    if (s->base_flags != bdrv_get_flags(base)) {
+    if (s->overlay && s->base_flags != bdrv_get_flags(base)) {
         bdrv_reopen(base, s->base_flags, NULL);
     }
-    overlay_bs = bdrv_find_overlay(active, top);
-    if (overlay_bs && s->orig_overlay_flags != bdrv_get_flags(overlay_bs)) {
-        bdrv_reopen(overlay_bs, s->orig_overlay_flags, NULL);
+
+    if (s->overlay && s->orig_overlay_flags != bdrv_get_flags(s->overlay)) {
+        bdrv_reopen(s->overlay, s->orig_overlay_flags, NULL);
     }
 
+    qemu_vfree(buf);
     block_job_completed(&s->common, ret);
 }
 
@@ -177,6 +217,7 @@ static const BlockJobType commit_job_type = {
     .instance_size = sizeof(CommitBlockJob),
     .job_type      = "commit",
     .set_speed     = commit_set_speed,
+    .complete      = commit_complete,
 };
 
 void commit_start(BlockDriverState *bs, BlockDriverState *base,
@@ -198,13 +239,6 @@ void commit_start(BlockDriverState *bs, BlockDriverState *base,
         return;
     }
 
-    /* Once we support top == active layer, remove this check */
-    if (top == bs) {
-        error_setg(errp,
-                   "Top image as the active layer is currently unsupported");
-        return;
-    }
-
     if (top == base) {
         error_setg(errp, "Invalid files for merge: top and base are the same");
         return;
@@ -212,23 +246,20 @@ void commit_start(BlockDriverState *bs, BlockDriverState *base,
 
     overlay_bs = bdrv_find_overlay(bs, top);
 
-    if (overlay_bs == NULL) {
-        error_setg(errp, "Could not find overlay image for %s:", top->filename);
-        return;
-    }
-
     orig_base_flags    = bdrv_get_flags(base);
-    orig_overlay_flags = bdrv_get_flags(overlay_bs);
+    if (overlay_bs) {
+        orig_overlay_flags = bdrv_get_flags(overlay_bs);
+        if (!(orig_overlay_flags & BDRV_O_RDWR)) {
+            reopen_queue = bdrv_reopen_queue(reopen_queue, overlay_bs,
+                    orig_overlay_flags | BDRV_O_RDWR);
+        }
+    }
 
     /* convert base & overlay_bs to r/w, if necessary */
     if (!(orig_base_flags & BDRV_O_RDWR)) {
         reopen_queue = bdrv_reopen_queue(reopen_queue, base,
                                          orig_base_flags | BDRV_O_RDWR);
     }
-    if (!(orig_overlay_flags & BDRV_O_RDWR)) {
-        reopen_queue = bdrv_reopen_queue(reopen_queue, overlay_bs,
-                                         orig_overlay_flags | BDRV_O_RDWR);
-    }
     if (reopen_queue) {
         bdrv_reopen_multiple(reopen_queue, &local_err);
         if (local_err != NULL) {
@@ -237,7 +268,6 @@ void commit_start(BlockDriverState *bs, BlockDriverState *base,
         }
     }
 
-
     s = block_job_create(&commit_job_type, bs, speed, cb, opaque, errp);
     if (!s) {
         return;
@@ -246,13 +276,19 @@ void commit_start(BlockDriverState *bs, BlockDriverState *base,
     s->base   = base;
     s->top    = top;
     s->active = bs;
+    s->overlay = overlay_bs;
 
     s->base_flags          = orig_base_flags;
-    s->orig_overlay_flags  = orig_overlay_flags;
+    if (overlay_bs) {
+        s->orig_overlay_flags  = orig_overlay_flags;
+    }
 
     s->on_error = on_error;
     s->common.co = qemu_coroutine_create(commit_run);
 
     trace_commit_start(bs, base, top, s, s->common.co, opaque);
+
+    bdrv_set_dirty_tracking(top, COMMIT_BUFFER_BYTES);
+
     qemu_coroutine_enter(s->common.co, s);
 }
diff --git a/include/block/block.h b/include/block/block.h
index 742fce5..a5d05ab 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -199,8 +199,9 @@ int bdrv_commit_all(void);
 int bdrv_change_backing_file(BlockDriverState *bs,
     const char *backing_file, const char *backing_fmt);
 void bdrv_register(BlockDriver *bdrv);
-int bdrv_drop_intermediate(BlockDriverState *active, BlockDriverState *top,
-                           BlockDriverState *base);
+int bdrv_drop_intermediate(BlockDriverState *top_overlay,
+                           BlockDriverState **top,
+                           BlockDriverState **base);
 BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
                                     BlockDriverState *bs);
 BlockDriverState *bdrv_find_base(BlockDriverState *bs);
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [Qemu-devel] [PATCH v2 2/2] qemu-iotests: update test cases for commit active
  2013-07-30  7:17 [Qemu-devel] [PATCH v2 0/2] block: allow commit active as top Fam Zheng
  2013-07-30  7:17 ` [Qemu-devel] [PATCH v2 1/2] block: allow live commit of active image Fam Zheng
@ 2013-07-30  7:17 ` Fam Zheng
  1 sibling, 0 replies; 6+ messages in thread
From: Fam Zheng @ 2013-07-30  7:17 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, famz, jcody, stefanha, pbonzini

Factor out commit test common logic into super class, and update test
of committing the active image.

Signed-off-by: Fam Zheng <famz@redhat.com>
---
 tests/qemu-iotests/040 | 73 +++++++++++++++++++++-----------------------------
 1 file changed, 31 insertions(+), 42 deletions(-)

diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
index aad535a..902df0d 100755
--- a/tests/qemu-iotests/040
+++ b/tests/qemu-iotests/040
@@ -63,6 +63,28 @@ class ImageCommitTestCase(iotests.QMPTestCase):
             i = i + 512
         file.close()
 
+    def run_commit_test(self, top, base, is_active=False):
+        self.assert_no_active_commit()
+        result = self.vm.qmp('block-commit', device='drive0', top=top, base=base)
+        self.assert_qmp(result, 'return', {})
+
+        completed = False
+        while not completed:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_COMPLETED':
+                    self.assert_qmp(event, 'data/type', 'commit')
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/offset', self.image_len)
+                    self.assert_qmp(event, 'data/len', self.image_len)
+                    completed = True
+                elif event['event'] == 'BLOCK_JOB_READY':
+                    self.assert_qmp(event, 'data/type', 'commit')
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/len', self.image_len)
+                    self.vm.qmp('block-job-complete', device='drive0')
+
+        self.assert_no_active_commit()
+        self.vm.shutdown()
 
 class TestSingleDrive(ImageCommitTestCase):
     image_len = 1 * 1024 * 1024
@@ -84,23 +106,7 @@ class TestSingleDrive(ImageCommitTestCase):
         os.remove(backing_img)
 
     def test_commit(self):
-        self.assert_no_active_commit()
-        result = self.vm.qmp('block-commit', device='drive0', top='%s' % mid_img)
-        self.assert_qmp(result, 'return', {})
-
-        completed = False
-        while not completed:
-            for event in self.vm.get_qmp_events(wait=True):
-                if event['event'] == 'BLOCK_JOB_COMPLETED':
-                    self.assert_qmp(event, 'data/type', 'commit')
-                    self.assert_qmp(event, 'data/device', 'drive0')
-                    self.assert_qmp(event, 'data/offset', self.image_len)
-                    self.assert_qmp(event, 'data/len', self.image_len)
-                    completed = True
-
-        self.assert_no_active_commit()
-        self.vm.shutdown()
-
+        self.run_commit_test(mid_img, backing_img)
         self.assertEqual(-1, qemu_io('-c', 'read -P 0xab 0 524288', backing_img).find("verification failed"))
         self.assertEqual(-1, qemu_io('-c', 'read -P 0xef 524288 524288', backing_img).find("verification failed"))
 
@@ -127,10 +133,9 @@ class TestSingleDrive(ImageCommitTestCase):
         self.assert_qmp(result, 'error/desc', 'Base \'badfile\' not found')
 
     def test_top_is_active(self):
-        self.assert_no_active_commit()
-        result = self.vm.qmp('block-commit', device='drive0', top='%s' % test_img, base='%s' % backing_img)
-        self.assert_qmp(result, 'error/class', 'GenericError')
-        self.assert_qmp(result, 'error/desc', 'Top image as the active layer is currently unsupported')
+        self.run_commit_test(test_img, backing_img, True)
+        self.assertEqual(-1, qemu_io('-c', 'read -P 0xab 0 524288', backing_img).find("verification failed"))
+        self.assertEqual(-1, qemu_io('-c', 'read -P 0xef 524288 524288', backing_img).find("verification failed"))
 
     def test_top_and_base_reversed(self):
         self.assert_no_active_commit()
@@ -191,23 +196,7 @@ class TestRelativePaths(ImageCommitTestCase):
                 raise
 
     def test_commit(self):
-        self.assert_no_active_commit()
-        result = self.vm.qmp('block-commit', device='drive0', top='%s' % self.mid_img)
-        self.assert_qmp(result, 'return', {})
-
-        completed = False
-        while not completed:
-            for event in self.vm.get_qmp_events(wait=True):
-                if event['event'] == 'BLOCK_JOB_COMPLETED':
-                    self.assert_qmp(event, 'data/type', 'commit')
-                    self.assert_qmp(event, 'data/device', 'drive0')
-                    self.assert_qmp(event, 'data/offset', self.image_len)
-                    self.assert_qmp(event, 'data/len', self.image_len)
-                    completed = True
-
-        self.assert_no_active_commit()
-        self.vm.shutdown()
-
+        self.run_commit_test(self.mid_img, self.backing_img)
         self.assertEqual(-1, qemu_io('-c', 'read -P 0xab 0 524288', self.backing_img_abs).find("verification failed"))
         self.assertEqual(-1, qemu_io('-c', 'read -P 0xef 524288 524288', self.backing_img_abs).find("verification failed"))
 
@@ -234,10 +223,9 @@ class TestRelativePaths(ImageCommitTestCase):
         self.assert_qmp(result, 'error/desc', 'Base \'badfile\' not found')
 
     def test_top_is_active(self):
-        self.assert_no_active_commit()
-        result = self.vm.qmp('block-commit', device='drive0', top='%s' % self.test_img, base='%s' % self.backing_img)
-        self.assert_qmp(result, 'error/class', 'GenericError')
-        self.assert_qmp(result, 'error/desc', 'Top image as the active layer is currently unsupported')
+        self.run_commit_test(self.test_img, self.backing_img, True)
+        self.assertEqual(-1, qemu_io('-c', 'read -P 0xab 0 524288', self.backing_img_abs).find("verification failed"))
+        self.assertEqual(-1, qemu_io('-c', 'read -P 0xef 524288 524288', self.backing_img_abs).find("verification failed"))
 
     def test_top_and_base_reversed(self):
         self.assert_no_active_commit()
@@ -253,6 +241,7 @@ class TestSetSpeed(ImageCommitTestCase):
         qemu_img('create', backing_img, str(TestSetSpeed.image_len))
         qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % backing_img, mid_img)
         qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % mid_img, test_img)
+        qemu_io('-c', 'write -P 0xef 524288 524288', mid_img)
         self.vm = iotests.VM().add_drive(test_img)
         self.vm.launch()
 
-- 
1.8.3.4

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/2] block: allow live commit of active image
  2013-07-30  7:17 ` [Qemu-devel] [PATCH v2 1/2] block: allow live commit of active image Fam Zheng
@ 2013-07-30 12:53   ` Stefan Hajnoczi
  2013-07-30 13:17     ` Paolo Bonzini
  2013-08-15  7:46     ` Fam Zheng
  0 siblings, 2 replies; 6+ messages in thread
From: Stefan Hajnoczi @ 2013-07-30 12:53 UTC (permalink / raw)
  To: Fam Zheng; +Cc: kwolf, pbonzini, jcody, qemu-devel, stefanha

On Tue, Jul 30, 2013 at 03:17:47PM +0800, Fam Zheng wrote:
>      for (sector_num = 0; sector_num < end; sector_num += n) {
> -        uint64_t delay_ns = 0;
> -        bool copy;
>  
> -wait:
> -        /* Note that even when no rate limit is applied we need to yield
> -         * with no pending I/O here so that bdrv_drain_all() returns.
> -         */
> -        block_job_sleep_ns(&s->common, rt_clock, delay_ns);
> -        if (block_job_is_cancelled(&s->common)) {
> -            break;
> -        }
>          /* Copy if allocated above the base */
>          ret = bdrv_co_is_allocated_above(top, base, sector_num,
> -                                         COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
> +                                         COMMIT_BUFFER_SECTORS,
>                                           &n);
> -        copy = (ret == 1);
> -        trace_commit_one_iteration(s, sector_num, n, ret);
> -        if (copy) {
> -            if (s->common.speed) {
> -                delay_ns = ratelimit_calculate_delay(&s->limit, n);
> -                if (delay_ns > 0) {
> -                    goto wait;
> -                }
> +        if (ret) {
> +            bdrv_set_dirty(top, sector_num, n);
> +        }

This could take a while on a big image.  You need sleep/cancel here like
the other blockjob loops have.

I think error handling isn't sufficient here.  If
bdrv_co_is_allocated_above() fails you need to exit (if n becomes 0 on
error, then this is an infinite loop!).

> +        bdrv_dirty_iter_init(s->top, &hbi);
> +        for (next_dirty = hbitmap_iter_next(&hbi);
> +                next_dirty >= 0;
> +                next_dirty = hbitmap_iter_next(&hbi)) {
> +            sector_num = next_dirty;
> +            if (block_job_is_cancelled(&s->common)) {
> +                goto exit;
> +            }
> +            delay_ns = ratelimit_calculate_delay(&s->limit,
> +                                                 COMMIT_BUFFER_SECTORS);
> +            /* Note that even when no rate limit is applied we need to yield
> +             * with no pending I/O here so that bdrv_drain_all() returns.
> +             */
> +            block_job_sleep_ns(&s->common, rt_clock, delay_ns);
> +            trace_commit_one_iteration(s, sector_num,
> +                                       COMMIT_BUFFER_SECTORS, ret);
> +            ret = commit_populate(top, base, sector_num,
> +                                  COMMIT_BUFFER_SECTORS, buf);

Can we be sure that a guest write during commit_populate()...

> +            if (ret < 0) {
> +                if (s->on_error == BLOCKDEV_ON_ERROR_STOP ||
> +                    s->on_error == BLOCKDEV_ON_ERROR_REPORT ||
> +                    (s->on_error == BLOCKDEV_ON_ERROR_ENOSPC &&
> +                         ret == -ENOSPC)) {
> +                    goto exit;
> +                } else {
> +                    continue;
> +                }
>              }
> +            /* Publish progress */
> +            s->common.offset += COMMIT_BUFFER_BYTES;
> +            bdrv_reset_dirty(top, sector_num, COMMIT_BUFFER_SECTORS);

...sets the dirty but *after* us?  Otherwise there is a race condition
where guest writes fail to be copied into the base image.

I think the answer is "no" since commit_populate() performs two separate
blocking operations: bdrv_read(top) followed by bdrv_write(base).  Now
imagine the guest does bdrv_aio_writev(top) after we complete
bdrv_read(top) but before we do bdrv_reset_dirty().

>          }
> -        /* Publish progress */
> -        s->common.offset += n * BDRV_SECTOR_SIZE;
>      }
> +    s->common.offset = end;
>  
> -    ret = 0;
> -
> -    if (!block_job_is_cancelled(&s->common) && sector_num == end) {
> -        /* success */
> -        ret = bdrv_drop_intermediate(active, top, base);
> +    bdrv_flush(base);

bdrv_co_flush() is clearer since we're in a coroutine.

> +    if (!block_job_is_cancelled(&s->common)) {
> +        /* Drop intermediate: [top, base) */
> +        ret = bdrv_drop_intermediate(s->overlay, &top, &base);
> +        s->common.offset = s->common.len;
>      }
>  
> -exit_free_buf:
> -    qemu_vfree(buf);
> +    ret = 0;
> +
> +exit:
> +    bdrv_set_dirty_tracking(active, 0);
>  
> -exit_restore_reopen:
>      /* restore base open flags here if appropriate (e.g., change the base back
>       * to r/o). These reopens do not need to be atomic, since we won't abort
>       * even on failure here */
> -    if (s->base_flags != bdrv_get_flags(base)) {
> +    if (s->overlay && s->base_flags != bdrv_get_flags(base)) {

Why check s->overlay, this only concerns base?

> @@ -212,23 +246,20 @@ void commit_start(BlockDriverState *bs, BlockDriverState *base,
>  
>      overlay_bs = bdrv_find_overlay(bs, top);
>  
> -    if (overlay_bs == NULL) {
> -        error_setg(errp, "Could not find overlay image for %s:", top->filename);
> -        return;
> -    }
> -
>      orig_base_flags    = bdrv_get_flags(base);
> -    orig_overlay_flags = bdrv_get_flags(overlay_bs);
> +    if (overlay_bs) {
> +        orig_overlay_flags = bdrv_get_flags(overlay_bs);
> +        if (!(orig_overlay_flags & BDRV_O_RDWR)) {
> +            reopen_queue = bdrv_reopen_queue(reopen_queue, overlay_bs,
> +                    orig_overlay_flags | BDRV_O_RDWR);
> +        }
> +    }
>  
>      /* convert base & overlay_bs to r/w, if necessary */
>      if (!(orig_base_flags & BDRV_O_RDWR)) {
>          reopen_queue = bdrv_reopen_queue(reopen_queue, base,
>                                           orig_base_flags | BDRV_O_RDWR);
>      }
> -    if (!(orig_overlay_flags & BDRV_O_RDWR)) {
> -        reopen_queue = bdrv_reopen_queue(reopen_queue, overlay_bs,
> -                                         orig_overlay_flags | BDRV_O_RDWR);
> -    }

IMO it is clearer to put the two orig_base_flags stanzas together rather
than interleaving them:

/* convert base & overlay_bs to r/w, if necessary */
orig_base_flags = bdrv_get_flags(base);
if (!(orig_base_flags & BDRV_O_RDWR)) {
    reopen_queue = bdrv_reopen_queue(reopen_queue, base,
                                     orig_base_flags |
                                     BDRV_O_RDWR);
}

overlay_bs = bdrv_find_overlay(bs, top);
if (overlay_bs) {
    orig_overlay_flags = bdrv_get_flags(overlay_bs);
    if (!(orig_overlay_flags & BDRV_O_RDWR)) {
        reopen_queue = bdrv_reopen_queue(reopen_queue, overlay_bs,
                orig_overlay_flags | BDRV_O_RDWR); 
    }
}

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/2] block: allow live commit of active image
  2013-07-30 12:53   ` Stefan Hajnoczi
@ 2013-07-30 13:17     ` Paolo Bonzini
  2013-08-15  7:46     ` Fam Zheng
  1 sibling, 0 replies; 6+ messages in thread
From: Paolo Bonzini @ 2013-07-30 13:17 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwolf, jcody, Fam Zheng, qemu-devel, stefanha

Il 30/07/2013 14:53, Stefan Hajnoczi ha scritto:
> On Tue, Jul 30, 2013 at 03:17:47PM +0800, Fam Zheng wrote:
>>      for (sector_num = 0; sector_num < end; sector_num += n) {
>> -        uint64_t delay_ns = 0;
>> -        bool copy;
>>  
>> -wait:
>> -        /* Note that even when no rate limit is applied we need to yield
>> -         * with no pending I/O here so that bdrv_drain_all() returns.
>> -         */
>> -        block_job_sleep_ns(&s->common, rt_clock, delay_ns);
>> -        if (block_job_is_cancelled(&s->common)) {
>> -            break;
>> -        }
>>          /* Copy if allocated above the base */
>>          ret = bdrv_co_is_allocated_above(top, base, sector_num,
>> -                                         COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
>> +                                         COMMIT_BUFFER_SECTORS,
>>                                           &n);
>> -        copy = (ret == 1);
>> -        trace_commit_one_iteration(s, sector_num, n, ret);
>> -        if (copy) {
>> -            if (s->common.speed) {
>> -                delay_ns = ratelimit_calculate_delay(&s->limit, n);
>> -                if (delay_ns > 0) {
>> -                    goto wait;
>> -                }
>> +        if (ret) {
>> +            bdrv_set_dirty(top, sector_num, n);
>> +        }
> 
> This could take a while on a big image.  You need sleep/cancel here like
> the other blockjob loops have.
> 
> I think error handling isn't sufficient here.  If
> bdrv_co_is_allocated_above() fails you need to exit (if n becomes 0 on
> error, then this is an infinite loop!).
> 
>> +        bdrv_dirty_iter_init(s->top, &hbi);
>> +        for (next_dirty = hbitmap_iter_next(&hbi);
>> +                next_dirty >= 0;
>> +                next_dirty = hbitmap_iter_next(&hbi)) {
>> +            sector_num = next_dirty;
>> +            if (block_job_is_cancelled(&s->common)) {
>> +                goto exit;
>> +            }
>> +            delay_ns = ratelimit_calculate_delay(&s->limit,
>> +                                                 COMMIT_BUFFER_SECTORS);
>> +            /* Note that even when no rate limit is applied we need to yield
>> +             * with no pending I/O here so that bdrv_drain_all() returns.
>> +             */
>> +            block_job_sleep_ns(&s->common, rt_clock, delay_ns);
>> +            trace_commit_one_iteration(s, sector_num,
>> +                                       COMMIT_BUFFER_SECTORS, ret);
>> +            ret = commit_populate(top, base, sector_num,
>> +                                  COMMIT_BUFFER_SECTORS, buf);
> 
> Can we be sure that a guest write during commit_populate()...
> 
>> +            if (ret < 0) {
>> +                if (s->on_error == BLOCKDEV_ON_ERROR_STOP ||
>> +                    s->on_error == BLOCKDEV_ON_ERROR_REPORT ||
>> +                    (s->on_error == BLOCKDEV_ON_ERROR_ENOSPC &&
>> +                         ret == -ENOSPC)) {
>> +                    goto exit;
>> +                } else {
>> +                    continue;
>> +                }
>>              }
>> +            /* Publish progress */
>> +            s->common.offset += COMMIT_BUFFER_BYTES;
>> +            bdrv_reset_dirty(top, sector_num, COMMIT_BUFFER_SECTORS);
> 
> ...sets the dirty but *after* us?  Otherwise there is a race condition
> where guest writes fail to be copied into the base image.
> 
> I think the answer is "no" since commit_populate() performs two separate
> blocking operations: bdrv_read(top) followed by bdrv_write(base).  Now
> imagine the guest does bdrv_aio_writev(top) after we complete
> bdrv_read(top) but before we do bdrv_reset_dirty().

Indeed, reset must always come _before_ reads (and set comes always
after writes).

Paolo

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/2] block: allow live commit of active image
  2013-07-30 12:53   ` Stefan Hajnoczi
  2013-07-30 13:17     ` Paolo Bonzini
@ 2013-08-15  7:46     ` Fam Zheng
  1 sibling, 0 replies; 6+ messages in thread
From: Fam Zheng @ 2013-08-15  7:46 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kwolf, pbonzini, jcody, qemu-devel, stefanha

On Tue, 07/30 14:53, Stefan Hajnoczi wrote:
> On Tue, Jul 30, 2013 at 03:17:47PM +0800, Fam Zheng wrote:
> >      for (sector_num = 0; sector_num < end; sector_num += n) {
> > -        uint64_t delay_ns = 0;
> > -        bool copy;
> >  
> > -wait:
> > -        /* Note that even when no rate limit is applied we need to yield
> > -         * with no pending I/O here so that bdrv_drain_all() returns.
> > -         */
> > -        block_job_sleep_ns(&s->common, rt_clock, delay_ns);
> > -        if (block_job_is_cancelled(&s->common)) {
> > -            break;
> > -        }
> >          /* Copy if allocated above the base */
> >          ret = bdrv_co_is_allocated_above(top, base, sector_num,
> > -                                         COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
> > +                                         COMMIT_BUFFER_SECTORS,
> >                                           &n);
> > -        copy = (ret == 1);
> > -        trace_commit_one_iteration(s, sector_num, n, ret);
> > -        if (copy) {
> > -            if (s->common.speed) {
> > -                delay_ns = ratelimit_calculate_delay(&s->limit, n);
> > -                if (delay_ns > 0) {
> > -                    goto wait;
> > -                }
> > +        if (ret) {
> > +            bdrv_set_dirty(top, sector_num, n);
> > +        }
> 
> This could take a while on a big image.  You need sleep/cancel here like
> the other blockjob loops have.
> 
> I think error handling isn't sufficient here.  If
> bdrv_co_is_allocated_above() fails you need to exit (if n becomes 0 on
> error, then this is an infinite loop!).
> 
Yes, you are right.

> > +        bdrv_dirty_iter_init(s->top, &hbi);
> > +        for (next_dirty = hbitmap_iter_next(&hbi);
> > +                next_dirty >= 0;
> > +                next_dirty = hbitmap_iter_next(&hbi)) {
> > +            sector_num = next_dirty;
> > +            if (block_job_is_cancelled(&s->common)) {
> > +                goto exit;
> > +            }
> > +            delay_ns = ratelimit_calculate_delay(&s->limit,
> > +                                                 COMMIT_BUFFER_SECTORS);
> > +            /* Note that even when no rate limit is applied we need to yield
> > +             * with no pending I/O here so that bdrv_drain_all() returns.
> > +             */
> > +            block_job_sleep_ns(&s->common, rt_clock, delay_ns);
> > +            trace_commit_one_iteration(s, sector_num,
> > +                                       COMMIT_BUFFER_SECTORS, ret);
> > +            ret = commit_populate(top, base, sector_num,
> > +                                  COMMIT_BUFFER_SECTORS, buf);
> 
> Can we be sure that a guest write during commit_populate()...
> 
> > +            if (ret < 0) {
> > +                if (s->on_error == BLOCKDEV_ON_ERROR_STOP ||
> > +                    s->on_error == BLOCKDEV_ON_ERROR_REPORT ||
> > +                    (s->on_error == BLOCKDEV_ON_ERROR_ENOSPC &&
> > +                         ret == -ENOSPC)) {
> > +                    goto exit;
> > +                } else {
> > +                    continue;
> > +                }
> >              }
> > +            /* Publish progress */
> > +            s->common.offset += COMMIT_BUFFER_BYTES;
> > +            bdrv_reset_dirty(top, sector_num, COMMIT_BUFFER_SECTORS);
> 
> ...sets the dirty but *after* us?  Otherwise there is a race condition
> where guest writes fail to be copied into the base image.
> 
> I think the answer is "no" since commit_populate() performs two separate
> blocking operations: bdrv_read(top) followed by bdrv_write(base).  Now
> imagine the guest does bdrv_aio_writev(top) after we complete
> bdrv_read(top) but before we do bdrv_reset_dirty().
> 
I see, good explaination, thanks and will fix.

> >          }
> > -        /* Publish progress */
> > -        s->common.offset += n * BDRV_SECTOR_SIZE;
> >      }
> > +    s->common.offset = end;
> >  
> > -    ret = 0;
> > -
> > -    if (!block_job_is_cancelled(&s->common) && sector_num == end) {
> > -        /* success */
> > -        ret = bdrv_drop_intermediate(active, top, base);
> > +    bdrv_flush(base);
> 
> bdrv_co_flush() is clearer since we're in a coroutine.
> 
OK

> > +    if (!block_job_is_cancelled(&s->common)) {
> > +        /* Drop intermediate: [top, base) */
> > +        ret = bdrv_drop_intermediate(s->overlay, &top, &base);
> > +        s->common.offset = s->common.len;
> >      }
> >  
> > -exit_free_buf:
> > -    qemu_vfree(buf);
> > +    ret = 0;
> > +
> > +exit:
> > +    bdrv_set_dirty_tracking(active, 0);
> >  
> > -exit_restore_reopen:
> >      /* restore base open flags here if appropriate (e.g., change the base back
> >       * to r/o). These reopens do not need to be atomic, since we won't abort
> >       * even on failure here */
> > -    if (s->base_flags != bdrv_get_flags(base)) {
> > +    if (s->overlay && s->base_flags != bdrv_get_flags(base)) {
> 
> Why check s->overlay, this only concerns base?
> 
With s->overlay, we are committing nonactive, the usual case.

s->overlay == NULL means we are committing active layer, so base will be
dropped, don't reopen it.

Will add a comment here.

> > @@ -212,23 +246,20 @@ void commit_start(BlockDriverState *bs, BlockDriverState *base,
> >  
> >      overlay_bs = bdrv_find_overlay(bs, top);
> >  
> > -    if (overlay_bs == NULL) {
> > -        error_setg(errp, "Could not find overlay image for %s:", top->filename);
> > -        return;
> > -    }
> > -
> >      orig_base_flags    = bdrv_get_flags(base);
> > -    orig_overlay_flags = bdrv_get_flags(overlay_bs);
> > +    if (overlay_bs) {
> > +        orig_overlay_flags = bdrv_get_flags(overlay_bs);
> > +        if (!(orig_overlay_flags & BDRV_O_RDWR)) {
> > +            reopen_queue = bdrv_reopen_queue(reopen_queue, overlay_bs,
> > +                    orig_overlay_flags | BDRV_O_RDWR);
> > +        }
> > +    }
> >  
> >      /* convert base & overlay_bs to r/w, if necessary */
> >      if (!(orig_base_flags & BDRV_O_RDWR)) {
> >          reopen_queue = bdrv_reopen_queue(reopen_queue, base,
> >                                           orig_base_flags | BDRV_O_RDWR);
> >      }
> > -    if (!(orig_overlay_flags & BDRV_O_RDWR)) {
> > -        reopen_queue = bdrv_reopen_queue(reopen_queue, overlay_bs,
> > -                                         orig_overlay_flags | BDRV_O_RDWR);
> > -    }
> 
> IMO it is clearer to put the two orig_base_flags stanzas together rather
> than interleaving them:
> 
> /* convert base & overlay_bs to r/w, if necessary */
> orig_base_flags = bdrv_get_flags(base);
> if (!(orig_base_flags & BDRV_O_RDWR)) {
>     reopen_queue = bdrv_reopen_queue(reopen_queue, base,
>                                      orig_base_flags |
>                                      BDRV_O_RDWR);
> }
> 
> overlay_bs = bdrv_find_overlay(bs, top);
> if (overlay_bs) {
>     orig_overlay_flags = bdrv_get_flags(overlay_bs);
>     if (!(orig_overlay_flags & BDRV_O_RDWR)) {
>         reopen_queue = bdrv_reopen_queue(reopen_queue, overlay_bs,
>                 orig_overlay_flags | BDRV_O_RDWR); 
>     }
> }

OK

Fam

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2013-08-15  7:46 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-07-30  7:17 [Qemu-devel] [PATCH v2 0/2] block: allow commit active as top Fam Zheng
2013-07-30  7:17 ` [Qemu-devel] [PATCH v2 1/2] block: allow live commit of active image Fam Zheng
2013-07-30 12:53   ` Stefan Hajnoczi
2013-07-30 13:17     ` Paolo Bonzini
2013-08-15  7:46     ` Fam Zheng
2013-07-30  7:17 ` [Qemu-devel] [PATCH v2 2/2] qemu-iotests: update test cases for commit active Fam Zheng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).