qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/8] Live block commit
@ 2012-09-14 13:41 Jeff Cody
  2012-09-14 13:41 ` [Qemu-devel] [PATCH 1/8] block: add support functions for live commit, to find and delete images Jeff Cody
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: Jeff Cody @ 2012-09-14 13:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, eblake, supriyak

This series adds the basic case, of a live commit between two
images below the active layer, e.g.:

[base] <--- [snp-1] <--- [snp-2] <--- [snp-3] <--- [active]

can be collapsed down via commit, into:

[base] <--- [active]

or,

[base] <--- [snp-1] <--- [active],

[base] <--- [snp-3] <--- [active],

etc..

TODO: * 'stage-2' of live commit functionality, to be able to push down the
        active layer. 


Changes from RFC:

* Feedback incorporated (see below)
* qemu-iotests added
* live snapshots have active->backing_hd reopened read-only.

Patch 1/8: [Kevin] -  Rename bdrv_delete_intermediate() to
                      bdrv_drop_intermediate()
                   -  Change terminology from 'child' to 'overlay'
                   -  Do not allow top == active, and return NULL for error,
                      simplified some logic.

Patch 2/8: [Kevin]       - Preserve errors in commit_populate()
           [Kevin, Eric] - Check that there is enough room in base, and resize
                           with bdrv_truncate() if not.
           [Jeff]        - Switch from 'child' to 'overlay' terminology
                         - Fixed up more error checking
                         - Minor cleanup
           [Kevin]       - Perform the bdrv_reopen() inside commit_start(),
                           instead of in blockdev.c

Patch 3/8: [Paolo]  - Rearranged patch, so the forward declaration is gone
            
Patch 4/8: None
Patch 5/8: None

Patch 6/8: [Eric]   - Updated QMP/qmp-events.txt with the new 'type' field
           [Kevin]  - If base cannot be found, return attempted base name
                    - Moved reopen() logic to commit_start
                    - Fixed comments in JSON command header
           [Jeff]   - If 'top' is not specified, default to active->backing_hd

Patch 7/8: New - simple qemu-iotest for commit.

Patch 8/8: New - after live snapshot, use bdrv_reopen() to reopen
                 active->backing_hd read-only.



Changes from the RFC v1 -> RFC v2:

* This patch series is not on top of Paolo's blk mirror series yet, to make it
  easier to apply independently if desired.  This means some of what was in the
  previous RFC series is not in this one (BlockdevOnError, for instance), but
  that can be easily added in once Paolo's series are in.

* This patches series is dependent on the reopen() series with transactional
  reopen.

* The target release for this series is 1.3

* Found some mistakes in the reopen calls

* Dropped the BlockdevOnError argument (for now), will add in if rebasing on
  top of Paolo's series.

* Used the new qerror system


Jeff Cody (8):
  block: add support functions for live commit, to find and delete
    images.
  block: add live block commit functionality
  blockdev: rename block_stream_cb to a generic block_job_cb
  qerror: new error for live block commit, QERR_TOP_NOT_FOUND
  block: helper function, to find the base image of a chain
  QAPI: add command for live block commit, 'block-commit'
  qemu-iotests: add initial tests for live block commit
  block: after creating a live snapshot, make old image read-only

 QMP/qmp-events.txt         |   6 +-
 block.c                    | 166 ++++++++++++++++++++++++++++++
 block.h                    |   6 +-
 block/Makefile.objs        |   1 +
 block/commit.c             | 247 +++++++++++++++++++++++++++++++++++++++++++++
 block_int.h                |  16 +++
 blockdev.c                 |  76 +++++++++++++-
 qapi-schema.json           |  35 +++++++
 qerror.h                   |   3 +
 qmp-commands.hx            |   6 ++
 tests/qemu-iotests/040     | 142 ++++++++++++++++++++++++++
 tests/qemu-iotests/040.out |   5 +
 tests/qemu-iotests/group   |   1 +
 trace-events               |   4 +-
 14 files changed, 706 insertions(+), 8 deletions(-)
 create mode 100644 block/commit.c
 create mode 100755 tests/qemu-iotests/040
 create mode 100644 tests/qemu-iotests/040.out

-- 
1.7.11.4

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH 1/8] block: add support functions for live commit, to find and delete images.
  2012-09-14 13:41 [Qemu-devel] [PATCH 0/8] Live block commit Jeff Cody
@ 2012-09-14 13:41 ` Jeff Cody
  2012-09-14 15:23   ` Eric Blake
  2012-09-14 13:41 ` [Qemu-devel] [PATCH 2/8] block: add live block commit functionality Jeff Cody
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Jeff Cody @ 2012-09-14 13:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, eblake, supriyak

Add bdrv_find_overlay(), and bdrv_drop_intermediate().

bdrv_find_overlay():  given 'bs' and the active (topmost) BDS of an image chain,
                    find the image that is the immediate top of 'bs'

bdrv_drop_intermediate():
                    Given 3 BDS (active, top, base), delete images above
                    base up to and including top, and set base to be the
                    parent of top's child node.

                    E.g., this converts:

                    bottom <- base <- intermediate <- top <- active

                    to

                    bottom <- base <- active

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 block.c | 150 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 block.h |   5 ++-
 2 files changed, 154 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index bc994df..9549eca 100644
--- a/block.c
+++ b/block.c
@@ -1713,6 +1713,156 @@ int bdrv_change_backing_file(BlockDriverState *bs,
     return ret;
 }
 
+/*
+ * Finds the image layer immediately to the 'top' of bs.
+ *
+ * active is the current topmost image.
+ *
+ * Returns NULL if bs is not found in active's image chain,
+ * or if active == bs.
+ */
+BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
+                                    BlockDriverState *bs)
+{
+    BlockDriverState *overlay = NULL;
+    BlockDriverState *intermediate;
+
+    assert(active != NULL);
+    assert(bs != NULL);
+
+    /* if bs is the same as active, then by definition it has no overlay
+     */
+    if (active == bs) {
+        return NULL;
+    }
+
+    intermediate = active;
+    while (intermediate->backing_hd) {
+        if (intermediate->backing_hd == bs) {
+            overlay = intermediate;
+            break;
+        }
+        intermediate = intermediate->backing_hd;
+    }
+
+    return overlay;
+}
+
+typedef struct BlkIntermediateStates {
+    BlockDriverState *bs;
+    QSIMPLEQ_ENTRY(BlkIntermediateStates) entry;
+} BlkIntermediateStates;
+
+
+/*
+ * Deletes images above 'base' up to and including 'top', and sets the image
+ * above 'top' to have base as its backing file.
+ *
+ * Requires that the overlay to 'top' is opened r/w, so that the backing file
+ * information in 'bs' can be properly updated.
+ *
+ * E.g., this will convert the following chain:
+ * bottom <- base <- intermediate <- top <- active
+ *
+ * to
+ *
+ * bottom <- base <- active
+ *
+ * It is allowed for bottom==base, in which case it converts:
+ *
+ * base <- intermediate <- top <- active
+ *
+ * to
+ *
+ * base <- active
+ *
+ * base <- intermediate <- top
+ *
+ * becomes
+ *
+ * base <- top
+ *
+ * Error conditions:
+ *  if active == top, that is considered an error
+ *
+ */
+int bdrv_drop_intermediate(BlockDriverState *active, BlockDriverState *top,
+                           BlockDriverState *base)
+{
+    BlockDriverState *intermediate;
+    BlockDriverState *base_bs = NULL;
+    BlockDriverState *new_top_bs = NULL;
+    BlkIntermediateStates *intermediate_state, *next;
+    int ret = -EIO;
+
+    QSIMPLEQ_HEAD(states_to_delete, BlkIntermediateStates) states_to_delete;
+    QSIMPLEQ_INIT(&states_to_delete);
+
+    if (!top->drv || !base->drv) {
+        goto exit;
+    }
+
+    new_top_bs = bdrv_find_overlay(active, top);
+
+    if (new_top_bs == NULL) {
+        /* we could not find the image above 'top', this is an error */
+        goto exit;
+    }
+
+    /* special case of new_top_bs->backing_hd already pointing to base - nothing
+     * to do, no intermediate images */
+    if (new_top_bs->backing_hd == base) {
+        ret = 0;
+        goto exit;
+    }
+
+    intermediate = new_top_bs->backing_hd;  /* this should be the same
+                                               as 'top' */
+
+    /* now we will go down through the list, and add each BDS we find
+     * into our deletion queue, until we hit the 'base'
+     */
+    while (intermediate) {
+        intermediate_state = g_malloc0(sizeof(BlkIntermediateStates));
+        intermediate_state->bs = intermediate;
+        QSIMPLEQ_INSERT_TAIL(&states_to_delete, intermediate_state, entry);
+
+        if (intermediate->backing_hd == base) {
+            base_bs = intermediate->backing_hd;
+            break;
+        }
+        intermediate = intermediate->backing_hd;
+    }
+    if (base_bs == NULL) {
+        /* something went wrong, we did not end at the base. safely
+         * unravel everything, and exit with error */
+        goto exit;
+    }
+
+    /* success - we can delete the intermediate states, and link top->base */
+    ret = bdrv_change_backing_file(new_top_bs, base_bs->filename,
+                                   base_bs->drv ? base_bs->drv->format_name : "");
+    if (ret) {
+        goto exit;
+    }
+    new_top_bs->backing_hd = base_bs;
+
+
+    QSIMPLEQ_FOREACH_SAFE(intermediate_state, &states_to_delete, entry, next) {
+        /* so that bdrv_close() does not recursively close the chain */
+        intermediate_state->bs->backing_hd = NULL;
+        bdrv_delete(intermediate_state->bs);
+    }
+    ret = 0;
+
+exit:
+    QSIMPLEQ_FOREACH_SAFE(intermediate_state, &states_to_delete, entry, next) {
+        g_free(intermediate_state);
+    }
+    return ret;
+}
+
+
 static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset,
                                    size_t size)
 {
diff --git a/block.h b/block.h
index e51704a..54a074d 100644
--- a/block.h
+++ b/block.h
@@ -209,7 +209,10 @@ int bdrv_commit_all(void);
 int bdrv_change_backing_file(BlockDriverState *bs,
     const char *backing_file, const char *backing_fmt);
 void bdrv_register(BlockDriver *bdrv);
-
+int bdrv_drop_intermediate(BlockDriverState *active, BlockDriverState *top,
+                           BlockDriverState *base);
+BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
+                                    BlockDriverState *bs);
 
 typedef struct BdrvCheckResult {
     int corruptions;
-- 
1.7.11.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH 2/8] block: add live block commit functionality
  2012-09-14 13:41 [Qemu-devel] [PATCH 0/8] Live block commit Jeff Cody
  2012-09-14 13:41 ` [Qemu-devel] [PATCH 1/8] block: add support functions for live commit, to find and delete images Jeff Cody
@ 2012-09-14 13:41 ` Jeff Cody
  2012-09-14 15:45   ` Eric Blake
  2012-09-14 13:41 ` [Qemu-devel] [PATCH 3/8] blockdev: rename block_stream_cb to a generic block_job_cb Jeff Cody
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Jeff Cody @ 2012-09-14 13:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, eblake, supriyak

This adds the live commit coroutine.  This iteration focuses on the
commit only below the active layer, and not the active layer itself.

The behaviour is similar to block streaming; the sectors are walked
through, and anything that exists above 'base' is committed back down
into base.  At the end, intermediate images are deleted, and the
chain stitched together.  Images are restored to their original open
flags upon completion.

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 block/Makefile.objs |   1 +
 block/commit.c      | 247 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 block_int.h         |  16 ++++
 trace-events        |   2 +
 4 files changed, 266 insertions(+)
 create mode 100644 block/commit.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index b5754d3..4a136b8 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -4,6 +4,7 @@ block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
 block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
 block-obj-y += stream.o
+block-obj-y += commit.o
 block-obj-$(CONFIG_WIN32) += raw-win32.o
 block-obj-$(CONFIG_POSIX) += raw-posix.o
 block-obj-$(CONFIG_LIBISCSI) += iscsi.o
diff --git a/block/commit.c b/block/commit.c
new file mode 100644
index 0000000..72959a3
--- /dev/null
+++ b/block/commit.c
@@ -0,0 +1,247 @@
+/*
+ * Live block commit
+ *
+ * Copyright Red Hat, Inc. 2012
+ *
+ * Authors:
+ *  Jeff Cody   <jcody@redhat.com>
+ *  Based on stream.c by Stefan Hajnoczi
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "trace.h"
+#include "block_int.h"
+#include "qemu/ratelimit.h"
+
+enum {
+    /*
+     * Size of data buffer for populating the image file.  This should be large
+     * enough to process multiple clusters in a single call, so that populating
+     * contiguous regions of the image is efficient.
+     */
+    COMMIT_BUFFER_SIZE = 512 * 1024, /* in bytes */
+};
+
+#define SLICE_TIME 100000000ULL /* ns */
+
+typedef struct CommitBlockJob {
+    BlockJob common;
+    RateLimit limit;
+    BlockDriverState *active;
+    BlockDriverState *top;
+    BlockDriverState *base;
+    BlockErrorAction on_error;
+    int base_flags;
+    int orig_overlay_flags;
+} CommitBlockJob;
+
+static int coroutine_fn commit_populate(BlockDriverState *bs,
+                                        BlockDriverState *base,
+                                        int64_t sector_num, int nb_sectors,
+                                        void *buf)
+{
+    int ret = 0;
+
+    ret = bdrv_read(bs, sector_num, buf, nb_sectors);
+    if (ret) {
+        return ret;
+    }
+
+    ret = bdrv_write(base, sector_num, buf, nb_sectors);
+    if (ret) {
+        return ret;
+    }
+
+    return 0;
+}
+
+static void coroutine_fn commit_run(void *opaque)
+{
+    CommitBlockJob *s = opaque;
+    BlockDriverState *active = s->active;
+    BlockDriverState *top = s->top;
+    BlockDriverState *base = s->base;
+    BlockDriverState *overlay_bs = NULL;
+    int64_t sector_num, end;
+    int ret = 0;
+    int n = 0;
+    void *buf;
+    int bytes_written = 0;
+    int64_t base_len;
+
+    ret = s->common.len = bdrv_getlength(top);
+
+
+    if (s->common.len < 0) {
+        goto exit_restore_reopen;
+    }
+
+    ret = base_len = bdrv_getlength(base);
+    if (base_len < 0) {
+        goto exit_restore_reopen;
+    }
+
+    if (base_len < s->common.len) {
+        ret = bdrv_truncate(base, s->common.len);
+        if (ret) {
+            goto exit_restore_reopen;
+        }
+    }
+
+    overlay_bs = bdrv_find_overlay(active, top);
+
+    end = s->common.len >> BDRV_SECTOR_BITS;
+    buf = qemu_blockalign(top, COMMIT_BUFFER_SIZE);
+
+    for (sector_num = 0; sector_num < end; sector_num += n) {
+        uint64_t delay_ns = 0;
+        bool copy;
+
+wait:
+        /* Note that even when no rate limit is applied we need to yield
+         * with no pending I/O here so that qemu_aio_flush() returns.
+         */
+        block_job_sleep_ns(&s->common, rt_clock, delay_ns);
+        if (block_job_is_cancelled(&s->common)) {
+            break;
+        }
+        /* Copy if allocated above the base */
+        ret = bdrv_co_is_allocated_above(top, base, sector_num,
+                                         COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
+                                         &n);
+        copy = (ret == 1);
+        trace_commit_one_iteration(s, sector_num, n, ret);
+        if (copy) {
+            if (s->common.speed) {
+                delay_ns = ratelimit_calculate_delay(&s->limit, n);
+                if (delay_ns > 0) {
+                    goto wait;
+                }
+            }
+            ret = commit_populate(top, base, sector_num, n, buf);
+            bytes_written += n * BDRV_SECTOR_SIZE;
+        }
+        if (ret < 0) {
+            if (s->on_error == BLOCK_ERR_STOP_ANY    ||
+                s->on_error == BLOCK_ERR_REPORT      ||
+                (s->on_error == BLOCK_ERR_STOP_ENOSPC && ret == -ENOSPC)) {
+                goto exit_free_buf;
+            } else {
+                n = 0;
+                continue;
+            }
+        }
+        /* Publish progress */
+        s->common.offset += n * BDRV_SECTOR_SIZE;
+    }
+
+    ret = 0;
+
+    if (!block_job_is_cancelled(&s->common) && sector_num == end) {
+        /* success */
+        ret = bdrv_drop_intermediate(active, top, base);
+    }
+
+exit_free_buf:
+    qemu_vfree(buf);
+
+exit_restore_reopen:
+    /* restore base open flags here if appropriate (e.g., change the base back
+     * to r/o). These reopens do not need to be atomic, since we won't abort
+     * even on failure here */
+    if (s->base_flags != bdrv_get_flags(base)) {
+        bdrv_reopen(base, s->base_flags, NULL);
+    }
+    if (s->orig_overlay_flags != bdrv_get_flags(overlay_bs)) {
+        bdrv_reopen(overlay_bs, s->orig_overlay_flags, NULL);
+    }
+
+    block_job_complete(&s->common, ret);
+}
+
+static void commit_set_speed(BlockJob *job, int64_t speed, Error **errp)
+{
+    CommitBlockJob *s = container_of(job, CommitBlockJob, common);
+
+    if (speed < 0) {
+        error_set(errp, QERR_INVALID_PARAMETER, "speed");
+        return;
+    }
+    ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
+}
+
+static BlockJobType commit_job_type = {
+    .instance_size = sizeof(CommitBlockJob),
+    .job_type      = "commit",
+    .set_speed     = commit_set_speed,
+};
+
+void commit_start(BlockDriverState *bs, BlockDriverState *base,
+                 BlockDriverState *top, int64_t speed,
+                 BlockErrorAction on_error, BlockDriverCompletionFunc *cb,
+                 void *opaque, Error **errp)
+{
+    CommitBlockJob *s;
+    BlockReopenQueue *reopen_queue = NULL;
+    int orig_overlay_flags;
+    int orig_base_flags;
+    BlockDriverState *overlay_bs;
+    Error *local_err = NULL;
+
+    if ((on_error == BLOCK_ERR_STOP_ANY ||
+         on_error == BLOCK_ERR_STOP_ENOSPC) &&
+        !bdrv_iostatus_is_enabled(bs)) {
+        error_set(errp, QERR_INVALID_PARAMETER_COMBINATION);
+        return;
+    }
+
+    overlay_bs = bdrv_find_overlay(bs, top);
+
+    if (overlay_bs == NULL) {
+        error_set(errp, ERROR_CLASS_GENERIC_ERROR,
+                 "Could not find overlay image for %s:", top->filename);
+        return;
+    }
+
+    orig_base_flags    = bdrv_get_flags(base);
+    orig_overlay_flags = bdrv_get_flags(overlay_bs);
+
+    /* convert base_bs & overlay_bs to r/w, if necessary */
+    if (!(orig_base_flags & BDRV_O_RDWR)) {
+        reopen_queue = bdrv_reopen_queue(reopen_queue, base,
+                                         orig_base_flags | BDRV_O_RDWR);
+    }
+    if (!(orig_overlay_flags & BDRV_O_RDWR)) {
+        reopen_queue = bdrv_reopen_queue(reopen_queue, overlay_bs,
+                                         orig_overlay_flags | BDRV_O_RDWR);
+    }
+    if (reopen_queue) {
+        bdrv_reopen_multiple(reopen_queue, &local_err);
+        if (local_err != NULL) {
+            error_propagate(errp, local_err);
+            return;
+        }
+    }
+
+
+    s = block_job_create(&commit_job_type, bs, speed, cb, opaque, errp);
+    if (!s) {
+        return;
+    }
+
+    s->base   = base;
+    s->top    = top;
+    s->active = bs;
+
+    s->base_flags          = orig_base_flags;
+    s->orig_overlay_flags  = orig_overlay_flags;
+
+    s->on_error = on_error;
+    s->common.co = qemu_coroutine_create(commit_run);
+
+    trace_commit_start(bs, base, top, s, s->common.co, opaque);
+    qemu_coroutine_enter(s->common.co, s);
+}
diff --git a/block_int.h b/block_int.h
index e533ca4..1d44e5d 100644
--- a/block_int.h
+++ b/block_int.h
@@ -462,4 +462,20 @@ void stream_start(BlockDriverState *bs, BlockDriverState *base,
                   BlockDriverCompletionFunc *cb,
                   void *opaque, Error **errp);
 
+/**
+ * commit_start:
+ * @bs: Top Block device
+ * @base: Block device that will be written into, and become the new top
+ * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
+ * @on_error: The action to take upon error.
+ * @cb: Completion function for the job.
+ * @opaque: Opaque pointer value passed to @cb.
+ * @errp: Error object.
+ *
+ */
+void commit_start(BlockDriverState *bs, BlockDriverState *base,
+                 BlockDriverState *top, int64_t speed,
+                 BlockErrorAction on_error, BlockDriverCompletionFunc *cb,
+                 void *opaque, Error **errp);
+
 #endif /* BLOCK_INT_H */
diff --git a/trace-events b/trace-events
index b25ae1c..98a1f97 100644
--- a/trace-events
+++ b/trace-events
@@ -74,6 +74,8 @@ bdrv_co_do_copy_on_readv(void *bs, int64_t sector_num, int nb_sectors, int64_t c
 # block/stream.c
 stream_one_iteration(void *s, int64_t sector_num, int nb_sectors, int is_allocated) "s %p sector_num %"PRId64" nb_sectors %d is_allocated %d"
 stream_start(void *bs, void *base, void *s, void *co, void *opaque) "bs %p base %p s %p co %p opaque %p"
+commit_one_iteration(void *s, int64_t sector_num, int nb_sectors, int is_allocated) "s %p sector_num %"PRId64" nb_sectors %d is_allocated %d"
+commit_start(void *bs, void *base, void *top, void *s, void *co, void *opaque) "bs %p base %p top %p s %p co %p opaque %p"
 
 # blockdev.c
 qmp_block_job_cancel(void *job) "job %p"
-- 
1.7.11.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH 3/8] blockdev: rename block_stream_cb to a generic block_job_cb
  2012-09-14 13:41 [Qemu-devel] [PATCH 0/8] Live block commit Jeff Cody
  2012-09-14 13:41 ` [Qemu-devel] [PATCH 1/8] block: add support functions for live commit, to find and delete images Jeff Cody
  2012-09-14 13:41 ` [Qemu-devel] [PATCH 2/8] block: add live block commit functionality Jeff Cody
@ 2012-09-14 13:41 ` Jeff Cody
  2012-09-14 13:41 ` [Qemu-devel] [PATCH 4/8] qerror: new error for live block commit, QERR_TOP_NOT_FOUND Jeff Cody
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Jeff Cody @ 2012-09-14 13:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, eblake, supriyak


Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 blockdev.c   | 6 +++---
 trace-events | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index 7c83baa..9caa16f 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -1063,12 +1063,12 @@ static QObject *qobject_from_block_job(BlockJob *job)
                               job->speed);
 }
 
-static void block_stream_cb(void *opaque, int ret)
+static void block_job_cb(void *opaque, int ret)
 {
     BlockDriverState *bs = opaque;
     QObject *obj;
 
-    trace_block_stream_cb(bs, bs->job, ret);
+    trace_block_job_cb(bs, bs->job, ret);
 
     assert(bs->job);
     obj = qobject_from_block_job(bs->job);
@@ -1110,7 +1110,7 @@ void qmp_block_stream(const char *device, bool has_base,
     }
 
     stream_start(bs, base_bs, base, has_speed ? speed : 0,
-                 block_stream_cb, bs, &local_err);
+                 block_job_cb, bs, &local_err);
     if (error_is_set(&local_err)) {
         error_propagate(errp, local_err);
         return;
diff --git a/trace-events b/trace-events
index 98a1f97..f84b1af 100644
--- a/trace-events
+++ b/trace-events
@@ -79,7 +79,7 @@ commit_start(void *bs, void *base, void *top, void *s, void *co, void *opaque) "
 
 # blockdev.c
 qmp_block_job_cancel(void *job) "job %p"
-block_stream_cb(void *bs, void *job, int ret) "bs %p job %p ret %d"
+block_job_cb(void *bs, void *job, int ret) "bs %p job %p ret %d"
 qmp_block_stream(void *bs, void *job) "bs %p job %p"
 
 # hw/virtio-blk.c
-- 
1.7.11.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH 4/8] qerror: new error for live block commit, QERR_TOP_NOT_FOUND
  2012-09-14 13:41 [Qemu-devel] [PATCH 0/8] Live block commit Jeff Cody
                   ` (2 preceding siblings ...)
  2012-09-14 13:41 ` [Qemu-devel] [PATCH 3/8] blockdev: rename block_stream_cb to a generic block_job_cb Jeff Cody
@ 2012-09-14 13:41 ` Jeff Cody
  2012-09-14 16:01   ` Eric Blake
  2012-09-14 13:41 ` [Qemu-devel] [PATCH 5/8] block: helper function, to find the base image of a chain Jeff Cody
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Jeff Cody @ 2012-09-14 13:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, eblake, supriyak


Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 qerror.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/qerror.h b/qerror.h
index d0a76a4..7396184 100644
--- a/qerror.h
+++ b/qerror.h
@@ -219,6 +219,9 @@ void assert_no_error(Error *err);
 #define QERR_TOO_MANY_FILES \
     ERROR_CLASS_GENERIC_ERROR, "Too many open files"
 
+#define QERR_TOP_NOT_FOUND \
+    ERROR_CLASS_GENERIC_ERROR, "Top image file %s not found"
+
 #define QERR_UNDEFINED_ERROR \
     ERROR_CLASS_GENERIC_ERROR, "An undefined error has occurred"
 
-- 
1.7.11.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH 5/8] block: helper function, to find the base image of a chain
  2012-09-14 13:41 [Qemu-devel] [PATCH 0/8] Live block commit Jeff Cody
                   ` (3 preceding siblings ...)
  2012-09-14 13:41 ` [Qemu-devel] [PATCH 4/8] qerror: new error for live block commit, QERR_TOP_NOT_FOUND Jeff Cody
@ 2012-09-14 13:41 ` Jeff Cody
  2012-09-14 13:41 ` [Qemu-devel] [PATCH 6/8] QAPI: add command for live block commit, 'block-commit' Jeff Cody
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Jeff Cody @ 2012-09-14 13:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, eblake, supriyak

This is a simple helper function, that will return the base image
of a given image chain.

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 block.c | 16 ++++++++++++++++
 block.h |  1 +
 2 files changed, 17 insertions(+)

diff --git a/block.c b/block.c
index 9549eca..f19c1dd 100644
--- a/block.c
+++ b/block.c
@@ -3113,6 +3113,22 @@ int bdrv_get_backing_file_depth(BlockDriverState *bs)
     return 1 + bdrv_get_backing_file_depth(bs->backing_hd);
 }
 
+BlockDriverState *bdrv_find_base(BlockDriverState *bs)
+{
+    BlockDriverState *curr_bs = NULL;
+
+    if (!bs) {
+        return NULL;
+    }
+
+    curr_bs = bs;
+
+    while (curr_bs->backing_hd) {
+        curr_bs = curr_bs->backing_hd;
+    }
+    return curr_bs;
+}
+
 #define NB_SUFFIXES 4
 
 char *get_human_readable_size(char *buf, int buf_size, int64_t size)
diff --git a/block.h b/block.h
index 54a074d..c256211 100644
--- a/block.h
+++ b/block.h
@@ -213,6 +213,7 @@ int bdrv_drop_intermediate(BlockDriverState *active, BlockDriverState *top,
                            BlockDriverState *base);
 BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
                                     BlockDriverState *bs);
+BlockDriverState *bdrv_find_base(BlockDriverState *bs);
 
 typedef struct BdrvCheckResult {
     int corruptions;
-- 
1.7.11.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH 6/8] QAPI: add command for live block commit, 'block-commit'
  2012-09-14 13:41 [Qemu-devel] [PATCH 0/8] Live block commit Jeff Cody
                   ` (4 preceding siblings ...)
  2012-09-14 13:41 ` [Qemu-devel] [PATCH 5/8] block: helper function, to find the base image of a chain Jeff Cody
@ 2012-09-14 13:41 ` Jeff Cody
  2012-09-15  1:05   ` Eric Blake
  2012-09-14 13:41 ` [Qemu-devel] [PATCH 7/8] qemu-iotests: add initial tests for live block commit Jeff Cody
  2012-09-14 13:41 ` [Qemu-devel] [PATCH 8/8] block: after creating a live snapshot, make old image read-only Jeff Cody
  7 siblings, 1 reply; 18+ messages in thread
From: Jeff Cody @ 2012-09-14 13:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, eblake, supriyak

The command for live block commit is added, which has the following
arguments:

device: the block device to perform the commit on (mandatory)
base:   the base image to commit into; optional (if not specified,
        it is the underlying original image)
top:    the top image of the commit - all data from inside top down
        to base will be committed into base. optional (if not specified,
        it is the active image) - see note below
speed:  maximum speed, in bytes/sec
on_error: action to take on error (optional - default is report)

note: eventually this will support merging down the active layer,
      but that code is not yet complete.  If the active layer is passed
      in currently as top, or top is left to the default, then the error
      QERR_TOP_NOT_FOUND will be returned.

The is done as a block job, so upon completion a BLOCK_JOB_COMPLETED will
be emitted.

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 QMP/qmp-events.txt |  6 +++--
 blockdev.c         | 65 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 qapi-schema.json   | 35 +++++++++++++++++++++++++++++
 qmp-commands.hx    |  6 +++++
 4 files changed, 109 insertions(+), 3 deletions(-)

diff --git a/QMP/qmp-events.txt b/QMP/qmp-events.txt
index 2878058..4491020 100644
--- a/QMP/qmp-events.txt
+++ b/QMP/qmp-events.txt
@@ -50,7 +50,8 @@ Emitted when a block job has been cancelled.
 
 Data:
 
-- "type":     Job type ("stream" for image streaming, json-string)
+- "type":     Job type (json-string; "stream" for image streaming
+                                     "commit" for block commit)
 - "device":   Device name (json-string)
 - "len":      Maximum progress value (json-int)
 - "offset":   Current progress value (json-int)
@@ -73,7 +74,8 @@ Emitted when a block job has completed.
 
 Data:
 
-- "type":     Job type ("stream" for image streaming, json-string)
+- "type":     Job type (json-string; "stream" for image streaming
+                                     "commit" for block commit)
 - "device":   Device name (json-string)
 - "len":      Maximum progress value (json-int)
 - "offset":   Current progress value (json-int)
diff --git a/blockdev.c b/blockdev.c
index 9caa16f..785e999 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -825,7 +825,6 @@ exit:
     return;
 }
 
-
 static void eject_device(BlockDriverState *bs, int force, Error **errp)
 {
     if (bdrv_in_use(bs)) {
@@ -1124,6 +1123,70 @@ void qmp_block_stream(const char *device, bool has_base,
     trace_qmp_block_stream(bs, bs->job);
 }
 
+void qmp_block_commit(const char *device,
+                      bool has_base, const char *base,
+                      bool has_top, const char *top,
+                      bool has_speed, int64_t speed,
+                      Error **errp)
+{
+    BlockDriverState *bs;
+    BlockDriverState *base_bs, *top_bs;
+    Error *local_err = NULL;
+    /* This will be part of the QMP command, if/when the
+     * BlockdevOnError change for blkmirror makes it in
+     */
+    BlockErrorAction on_error = BLOCK_ERR_REPORT;
+
+    /* drain all i/o before commits */
+    bdrv_drain_all();
+
+    bs = bdrv_find(device);
+    if (!bs) {
+        error_set(errp, QERR_DEVICE_NOT_FOUND, device);
+        return;
+    }
+    if (base && has_base) {
+        base_bs = bdrv_find_backing_image(bs, base);
+    } else {
+        base_bs = bdrv_find_base(bs);
+    }
+
+    if (base_bs == NULL) {
+        error_set(errp, QERR_BASE_NOT_FOUND, base ? base : "NULL");
+        return;
+    }
+
+    if (top && has_top) {
+        /* if we want to allow the active layer,
+         * use 'bdrv_find_image()' here */
+        top_bs = bdrv_find_backing_image(bs, top);
+    } else {
+        /* default is one below the active layer, unless that is
+         * the base */
+        top_bs = bs->backing_hd;
+        if (top_bs == base_bs) {
+            error_set(errp, ERROR_CLASS_GENERIC_ERROR,
+                      "Invalid files for merge: top and base are the same");
+            return;
+        }
+    }
+    if (top_bs == NULL) {
+        error_set(errp, QERR_TOP_NOT_FOUND, top ? top : "NULL");
+        return;
+    }
+
+    commit_start(bs, base_bs, top_bs, speed, on_error, block_job_cb, bs,
+                &local_err);
+    if (local_err != NULL) {
+        error_propagate(errp, local_err);
+        return;
+    }
+    /* Grab a reference so hotplug does not delete the BlockDriverState from
+     * underneath us.
+     */
+    drive_get_ref(drive_get_by_blockdev(bs));
+}
+
 static BlockJob *find_block_job(const char *device)
 {
     BlockDriverState *bs;
diff --git a/qapi-schema.json b/qapi-schema.json
index a9f465a..c99ccfa 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1404,6 +1404,41 @@
   'returns': 'str' }
 
 ##
+# @block-commit
+#
+# Live commit of data from child image nodes into parent nodes - i.e.,
+# writes data between 'top' and 'base' into 'base'.
+#
+# @device:  the name of the device
+#
+# @base:   #optional The file name of the parent image of the device to write
+#                    data into.  If not specified, this is the original parent
+#                    image.
+#
+# @top:    #optional The file name of the child image, above which data will
+#                    not be committed down.  If not specified, this is one
+#                    layer below the active layer (i.e. active->backing_hd).
+#
+#                    If top == base, that is an error.
+#
+#
+# @speed:  #optional the maximum speed, in bytes per second
+#
+# Returns: Nothing on success
+#          If commit or stream is already active on this device, DeviceInUse
+#          If @device does not exist, DeviceNotFound
+#          If image commit is not supported by this device, NotSupported
+#          If @base does not exist, BaseNotFound
+#          If @top does not exist, TopNotFound
+#          If @speed is invalid, InvalidParameter
+#
+# Since: 1.3
+#
+##
+{ 'command': 'block-commit',
+  'data': { 'device': 'str', '*base': 'str', '*top': 'str',
+            '*speed': 'int' } }
+
 # @migrate_cancel
 #
 # Cancel the current executing migration process.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 6e21ddb..e244763 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -792,6 +792,12 @@ EQMP
     },
 
     {
+        .name       = "block-commit",
+        .args_type  = "device:B,base:s?,top:s?,speed:o?",
+        .mhandler.cmd_new = qmp_marshal_input_block_commit,
+    },
+
+    {
         .name       = "block-job-set-speed",
         .args_type  = "device:B,speed:o",
         .mhandler.cmd_new = qmp_marshal_input_block_job_set_speed,
-- 
1.7.11.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH 7/8] qemu-iotests: add initial tests for live block commit
  2012-09-14 13:41 [Qemu-devel] [PATCH 0/8] Live block commit Jeff Cody
                   ` (5 preceding siblings ...)
  2012-09-14 13:41 ` [Qemu-devel] [PATCH 6/8] QAPI: add command for live block commit, 'block-commit' Jeff Cody
@ 2012-09-14 13:41 ` Jeff Cody
  2012-09-14 13:41 ` [Qemu-devel] [PATCH 8/8] block: after creating a live snapshot, make old image read-only Jeff Cody
  7 siblings, 0 replies; 18+ messages in thread
From: Jeff Cody @ 2012-09-14 13:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, eblake, supriyak

Derived from the streaming test cases (030), this adds the
following tests:

1. For the following image chain, commit [mid] into [backing],
   and use qemu-io to verify [backing] has its original data, as
   well as the data from [mid]

           [backing] <-- [mid] <-- [test]

2. Verifies that 'block-commit' with the 'speed' parameter sets the
   speed parameter, as reported by 'query-block-jobs'

3. Verifies that a bogus 'device' parameter to 'block-commit'
   results in error

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 tests/qemu-iotests/040     | 142 +++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/040.out |   5 ++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 148 insertions(+)
 create mode 100755 tests/qemu-iotests/040
 create mode 100644 tests/qemu-iotests/040.out

diff --git a/tests/qemu-iotests/040 b/tests/qemu-iotests/040
new file mode 100755
index 0000000..6b89a41
--- /dev/null
+++ b/tests/qemu-iotests/040
@@ -0,0 +1,142 @@
+#!/usr/bin/env python
+#
+# Tests for image block commit.
+#
+# Copyright (C) 2012 IBM, Corp.
+# Copyright (C) 2012 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+# Test for live block commit
+# Derived from Image Streaming Test 030
+
+import time
+import os
+import iotests
+from iotests import qemu_img, qemu_io
+import struct
+
+backing_img = os.path.join(iotests.test_dir, 'backing.img')
+mid_img = os.path.join(iotests.test_dir, 'mid.img')
+test_img = os.path.join(iotests.test_dir, 'test.img')
+
+class ImageCommitTestCase(iotests.QMPTestCase):
+    '''Abstract base class for image commit test cases'''
+
+    def assert_no_active_commit(self):
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return', [])
+
+    def cancel_and_wait(self, drive='drive0'):
+        '''Cancel a block job and wait for it to finish'''
+        result = self.vm.qmp('block-job-cancel', device=drive)
+        self.assert_qmp(result, 'return', {})
+
+        cancelled = False
+        while not cancelled:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_CANCELLED':
+                    self.assert_qmp(event, 'data/type', 'commit')
+                    self.assert_qmp(event, 'data/device', drive)
+                    cancelled = True
+
+        self.assert_no_active_commit()
+
+    def create_image(self, name, size):
+        file = open(name, 'w')
+        i = 0
+        while i < size:
+            sector = struct.pack('>l504xl', i / 512, i / 512)
+            file.write(sector)
+            i = i + 512
+        file.close()
+
+
+class TestSingleDrive(ImageCommitTestCase):
+    image_len = 1 * 1024 * 1024
+    test_len = 1 * 1024 * 256
+
+    def setUp(self):
+        self.create_image(backing_img, TestSingleDrive.image_len)
+        qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % backing_img, mid_img)
+        qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % mid_img, test_img)
+        qemu_io('-c', 'write -P 0xab 0 524288', backing_img)
+        qemu_io('-c', 'write -P 0xef 524288 524288', mid_img)
+        self.vm = iotests.VM().add_drive(test_img)
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(test_img)
+        os.remove(mid_img)
+        os.remove(backing_img)
+
+    def test_commit(self):
+        self.assert_no_active_commit()
+        result = self.vm.qmp('block-commit', device='drive0')
+        self.assert_qmp(result, 'return', {})
+
+        completed = False
+        while not completed:
+            for event in self.vm.get_qmp_events(wait=True):
+                if event['event'] == 'BLOCK_JOB_COMPLETED':
+                    self.assert_qmp(event, 'data/type', 'commit')
+                    self.assert_qmp(event, 'data/device', 'drive0')
+                    self.assert_qmp(event, 'data/offset', self.image_len)
+                    self.assert_qmp(event, 'data/len', self.image_len)
+                    completed = True
+
+        self.assert_no_active_commit()
+        self.vm.shutdown()
+
+        self.assertEqual(-1, qemu_io('-c', 'read -P 0xab 0 524288', backing_img).find("verification failed"))
+        self.assertEqual(-1, qemu_io('-c', 'read -P 0xef 524288 524288', backing_img).find("verification failed"))
+
+    def test_device_not_found(self):
+        result = self.vm.qmp('block-commit', device='nonexistent')
+        self.assert_qmp(result, 'error/class', 'DeviceNotFound')
+
+
+class TestSetSpeed(ImageCommitTestCase):
+    image_len = 80 * 1024 * 1024 # MB
+
+    def setUp(self):
+        qemu_img('create', backing_img, str(TestSetSpeed.image_len))
+        qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % backing_img, mid_img)
+        qemu_img('create', '-f', iotests.imgfmt, '-o', 'backing_file=%s' % mid_img, test_img)
+        self.vm = iotests.VM().add_drive(test_img)
+        self.vm.launch()
+
+    def tearDown(self):
+        self.vm.shutdown()
+        os.remove(test_img)
+        os.remove(mid_img)
+        os.remove(backing_img)
+
+    def test_set_speed(self):
+        self.assert_no_active_commit()
+
+        result = self.vm.qmp('block-commit', device='drive0', top=mid_img, speed=1024 * 1024)
+        self.assert_qmp(result, 'return', {})
+
+        # Ensure the speed we set was accepted
+        result = self.vm.qmp('query-block-jobs')
+        self.assert_qmp(result, 'return[0]/device', 'drive0')
+        self.assert_qmp(result, 'return[0]/speed', 1024 * 1024)
+
+        self.cancel_and_wait()
+
+
+if __name__ == '__main__':
+    iotests.main(supported_fmts=['qcow2', 'qed'])
diff --git a/tests/qemu-iotests/040.out b/tests/qemu-iotests/040.out
new file mode 100644
index 0000000..8d7e996
--- /dev/null
+++ b/tests/qemu-iotests/040.out
@@ -0,0 +1,5 @@
+...
+----------------------------------------------------------------------
+Ran 3 tests
+
+OK
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index ebb5ca4..4b54fa6 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -46,3 +46,4 @@
 037 rw auto backing
 038 rw auto backing
 039 rw auto
+040 rw auto
-- 
1.7.11.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [Qemu-devel] [PATCH 8/8] block: after creating a live snapshot, make old image read-only
  2012-09-14 13:41 [Qemu-devel] [PATCH 0/8] Live block commit Jeff Cody
                   ` (6 preceding siblings ...)
  2012-09-14 13:41 ` [Qemu-devel] [PATCH 7/8] qemu-iotests: add initial tests for live block commit Jeff Cody
@ 2012-09-14 13:41 ` Jeff Cody
  7 siblings, 0 replies; 18+ messages in thread
From: Jeff Cody @ 2012-09-14 13:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, eblake, supriyak

Currently, after a live snapshot of a drive, the image that has
been 'demoted' to be below the new active layer remains r/w.
This patch reopens it read-only.

Note that we do not check for error on the reopen(), because we
will not abort the snapshots if the reopen fails.

This patch depends on the bdrv_reopen() series.

Signed-off-by: Jeff Cody <jcody@redhat.com>
---
 blockdev.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/blockdev.c b/blockdev.c
index 785e999..6330327 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -803,6 +803,11 @@ void qmp_transaction(BlockdevActionList *dev_list, Error **errp)
     QSIMPLEQ_FOREACH(states, &snap_bdrv_states, entry) {
         /* This removes our old bs from the bdrv_states, and adds the new bs */
         bdrv_append(states->new_bs, states->old_bs);
+        /* We don't need (or want) to use the transactional
+         * bdrv_reopen_multiple() across all the entries at once, because we
+         * don't want to abort all of them if one of them fails the reopen */
+        bdrv_reopen(states->new_bs, states->new_bs->open_flags & ~BDRV_O_RDWR,
+                    NULL);
     }
 
     /* success */
-- 
1.7.11.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] block: add support functions for live commit, to find and delete images.
  2012-09-14 13:41 ` [Qemu-devel] [PATCH 1/8] block: add support functions for live commit, to find and delete images Jeff Cody
@ 2012-09-14 15:23   ` Eric Blake
  2012-09-14 15:39     ` Jeff Cody
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Blake @ 2012-09-14 15:23 UTC (permalink / raw)
  To: Jeff Cody; +Cc: kwolf, pbonzini, stefanha, qemu-devel, supriyak

[-- Attachment #1: Type: text/plain, Size: 1907 bytes --]

On 09/14/2012 07:41 AM, Jeff Cody wrote:
> Add bdrv_find_overlay(), and bdrv_drop_intermediate().
> 
> bdrv_find_overlay():  given 'bs' and the active (topmost) BDS of an image chain,
>                     find the image that is the immediate top of 'bs'
> 
> bdrv_drop_intermediate():
>                     Given 3 BDS (active, top, base), delete images above

s/delete/drop/

>                     base up to and including top, and set base to be the
>                     parent of top's child node.

set base to be the backing file of top's overlay node.

> 
>                     E.g., this converts:
> 
>                     bottom <- base <- intermediate <- top <- active
> 
>                     to
> 
>                     bottom <- base <- active
> 
> +++ b/block.c
> @@ -1713,6 +1713,156 @@ int bdrv_change_backing_file(BlockDriverState *bs,
>      return ret;
>  }
>  
> +/*
> + * Finds the image layer immediately to the 'top' of bs.

Or even:

Finds the image layer in the chain that has 'bs' as its backing file.

> +/*
> + * Deletes images above 'base' up to and including 'top', and sets the image

s/Deletes/Drops/

> +++ b/block.h
> @@ -209,7 +209,10 @@ int bdrv_commit_all(void);
>  int bdrv_change_backing_file(BlockDriverState *bs,
>      const char *backing_file, const char *backing_fmt);
>  void bdrv_register(BlockDriver *bdrv);
> -
> +int bdrv_drop_intermediate(BlockDriverState *active, BlockDriverState *top,
> +                           BlockDriverState *base);
> +BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
> +                                    BlockDriverState *bs);
>  
>  typedef struct BdrvCheckResult {

Changed from two blank lines to one before the typedef; was that
intentional?

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 617 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 1/8] block: add support functions for live commit, to find and delete images.
  2012-09-14 15:23   ` Eric Blake
@ 2012-09-14 15:39     ` Jeff Cody
  0 siblings, 0 replies; 18+ messages in thread
From: Jeff Cody @ 2012-09-14 15:39 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, pbonzini, stefanha, qemu-devel, supriyak

On 09/14/2012 11:23 AM, Eric Blake wrote:
> On 09/14/2012 07:41 AM, Jeff Cody wrote:
>> Add bdrv_find_overlay(), and bdrv_drop_intermediate().
>>
>> bdrv_find_overlay():  given 'bs' and the active (topmost) BDS of an image chain,
>>                     find the image that is the immediate top of 'bs'
>>
>> bdrv_drop_intermediate():
>>                     Given 3 BDS (active, top, base), delete images above
> 
> s/delete/drop/
> 

Thanks

>>                     base up to and including top, and set base to be the
>>                     parent of top's child node.
> 
> set base to be the backing file of top's overlay node.
> 

Thanks

>>
>>                     E.g., this converts:
>>
>>                     bottom <- base <- intermediate <- top <- active
>>
>>                     to
>>
>>                     bottom <- base <- active
>>
>> +++ b/block.c
>> @@ -1713,6 +1713,156 @@ int bdrv_change_backing_file(BlockDriverState *bs,
>>      return ret;
>>  }
>>  
>> +/*
>> + * Finds the image layer immediately to the 'top' of bs.
> 
> Or even:
> 
> Finds the image layer in the chain that has 'bs' as its backing file.
> 

Thanks - I prefer your wording, that is clearer.


>> +/*
>> + * Deletes images above 'base' up to and including 'top', and sets the image
> 
> s/Deletes/Drops/
> 
>> +++ b/block.h
>> @@ -209,7 +209,10 @@ int bdrv_commit_all(void);
>>  int bdrv_change_backing_file(BlockDriverState *bs,
>>      const char *backing_file, const char *backing_fmt);
>>  void bdrv_register(BlockDriver *bdrv);
>> -
>> +int bdrv_drop_intermediate(BlockDriverState *active, BlockDriverState *top,
>> +                           BlockDriverState *base);
>> +BlockDriverState *bdrv_find_overlay(BlockDriverState *active,
>> +                                    BlockDriverState *bs);
>>  
>>  typedef struct BdrvCheckResult {
> 
> Changed from two blank lines to one before the typedef; was that
> intentional?
>

Nope, it was unintentional - thanks

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 2/8] block: add live block commit functionality
  2012-09-14 13:41 ` [Qemu-devel] [PATCH 2/8] block: add live block commit functionality Jeff Cody
@ 2012-09-14 15:45   ` Eric Blake
  2012-09-14 16:07     ` Jeff Cody
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Blake @ 2012-09-14 15:45 UTC (permalink / raw)
  To: Jeff Cody; +Cc: kwolf, pbonzini, stefanha, qemu-devel, supriyak

[-- Attachment #1: Type: text/plain, Size: 2270 bytes --]

On 09/14/2012 07:41 AM, Jeff Cody wrote:
> This adds the live commit coroutine.  This iteration focuses on the
> commit only below the active layer, and not the active layer itself.
> 
> The behaviour is similar to block streaming; the sectors are walked
> through, and anything that exists above 'base' is committed back down
> into base.  At the end, intermediate images are deleted, and the
> chain stitched together.  Images are restored to their original open
> flags upon completion.
> 

> +
> +enum {
> +    /*
> +     * Size of data buffer for populating the image file.  This should be large
> +     * enough to process multiple clusters in a single call, so that populating
> +     * contiguous regions of the image is efficient.
> +     */
> +    COMMIT_BUFFER_SIZE = 512 * 1024, /* in bytes */

I'm guessing you will add a followup patch that depends on Paolo's
series for controlling the granularity of this buffer?  Or is it less
important for the commit case?

> +
> +static void coroutine_fn commit_run(void *opaque)
> +{

> +    ret = base_len = bdrv_getlength(base);
> +    if (base_len < 0) {
> +        goto exit_restore_reopen;
> +    }
> +
> +    if (base_len < s->common.len) {
> +        ret = bdrv_truncate(base, s->common.len);
> +        if (ret) {
> +            goto exit_restore_reopen;
> +        }
> +    }

Question: is it valid to have a qcow2 file whose size is smaller than
it's backing image?  Suppose I have base[1M] <- mid[2M] <- top[3M] <-
active[3M], and request to commit top into base.  This bdrv_truncate()
means I will now have:

base[3M] <- mid[2M] <- top[3M] <- active[3M].

If I then abort the commit operation at this point, then we have the
situation of 'mid' reporting a smaller size than 'base' - which may make
'mid' invalid.  And even if it is valid, what happens if I now request
to commit 'mid' into 'base', but 'base' already had data written past
the 2M mark before I aborted the first operation?

I'm worried that you may have to bdrv_truncate() the entire chain to
keep it consistent, which is more complex because it requires more r/w
files.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 617 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 4/8] qerror: new error for live block commit, QERR_TOP_NOT_FOUND
  2012-09-14 13:41 ` [Qemu-devel] [PATCH 4/8] qerror: new error for live block commit, QERR_TOP_NOT_FOUND Jeff Cody
@ 2012-09-14 16:01   ` Eric Blake
  0 siblings, 0 replies; 18+ messages in thread
From: Eric Blake @ 2012-09-14 16:01 UTC (permalink / raw)
  To: Jeff Cody; +Cc: kwolf, pbonzini, stefanha, qemu-devel, supriyak

[-- Attachment #1: Type: text/plain, Size: 820 bytes --]

On 09/14/2012 07:41 AM, Jeff Cody wrote:
> Signed-off-by: Jeff Cody <jcody@redhat.com>
> ---
>  qerror.h | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/qerror.h b/qerror.h
> index d0a76a4..7396184 100644
> --- a/qerror.h
> +++ b/qerror.h
> @@ -219,6 +219,9 @@ void assert_no_error(Error *err);
>  #define QERR_TOO_MANY_FILES \
>      ERROR_CLASS_GENERIC_ERROR, "Too many open files"
>  
> +#define QERR_TOP_NOT_FOUND \
> +    ERROR_CLASS_GENERIC_ERROR, "Top image file %s not found"
> +

Is this patch really still needed, now that the generic error handling
is in?  Shouldn't you instead be using error_setg() in patch 6/8 for the
one place that prints this error message?

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 617 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 2/8] block: add live block commit functionality
  2012-09-14 15:45   ` Eric Blake
@ 2012-09-14 16:07     ` Jeff Cody
  2012-09-14 18:23       ` Eric Blake
  0 siblings, 1 reply; 18+ messages in thread
From: Jeff Cody @ 2012-09-14 16:07 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, pbonzini, stefanha, qemu-devel, supriyak

On 09/14/2012 11:45 AM, Eric Blake wrote:
> On 09/14/2012 07:41 AM, Jeff Cody wrote:
>> This adds the live commit coroutine.  This iteration focuses on the
>> commit only below the active layer, and not the active layer itself.
>>
>> The behaviour is similar to block streaming; the sectors are walked
>> through, and anything that exists above 'base' is committed back down
>> into base.  At the end, intermediate images are deleted, and the
>> chain stitched together.  Images are restored to their original open
>> flags upon completion.
>>
> 
>> +
>> +enum {
>> +    /*
>> +     * Size of data buffer for populating the image file.  This should be large
>> +     * enough to process multiple clusters in a single call, so that populating
>> +     * contiguous regions of the image is efficient.
>> +     */
>> +    COMMIT_BUFFER_SIZE = 512 * 1024, /* in bytes */
> 
> I'm guessing you will add a followup patch that depends on Paolo's
> series for controlling the granularity of this buffer?  Or is it less
> important for the commit case?

For the version of commit implemented by these patches, I don't think
controlling the granularity of the commit buffer is important.  However,
for the next stage, when we are commiting the active layer, then it may
become more important.  That stage will likely have a lot of code reuse
from Paolo's series, as well.

> 
>> +
>> +static void coroutine_fn commit_run(void *opaque)
>> +{
> 
>> +    ret = base_len = bdrv_getlength(base);
>> +    if (base_len < 0) {
>> +        goto exit_restore_reopen;
>> +    }
>> +
>> +    if (base_len < s->common.len) {
>> +        ret = bdrv_truncate(base, s->common.len);
>> +        if (ret) {
>> +            goto exit_restore_reopen;
>> +        }
>> +    }
> 
> Question: is it valid to have a qcow2 file whose size is smaller than
> it's backing image?

I don't think so... however:

>  Suppose I have base[1M] <- mid[2M] <- top[3M] <-
> active[3M], and request to commit top into base.  This bdrv_truncate()
> means I will now have:
> 
> base[3M] <- mid[2M] <- top[3M] <- active[3M].
> 
> If I then abort the commit operation at this point, then we have the
> situation of 'mid' reporting a smaller size than 'base' - which may make
> 'mid' invalid.  And even if it is valid, what happens if I now request
> to commit 'mid' into 'base', but 'base' already had data written past
> the 2M mark before I aborted the first operation?

Once the commit starts, I don't know if you can safely abort it, and
still count on 'mid' being valid.  Ignoring potential size differences,
how would you ever know that what was written from 'top' into 'base' is
compatible with what is present in 'mid'?

Once you begin a commit, your chain as an entirety can stay safe after
an abort, as long as it is accessed from 'top' or above... but I think
you have to consider the intermediate images between 'base' and 'top' to
be invalid as standalone images.

> 
> I'm worried that you may have to bdrv_truncate() the entire chain to
> keep it consistent, which is more complex because it requires more r/w
> files.
> 

Transactional bdrv_truncate()!  :)

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 2/8] block: add live block commit functionality
  2012-09-14 16:07     ` Jeff Cody
@ 2012-09-14 18:23       ` Eric Blake
  2012-09-14 20:29         ` Jeff Cody
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Blake @ 2012-09-14 18:23 UTC (permalink / raw)
  To: jcody; +Cc: kwolf, pbonzini, stefanha, qemu-devel, supriyak

[-- Attachment #1: Type: text/plain, Size: 6112 bytes --]

On 09/14/2012 10:07 AM, Jeff Cody wrote:
>> Question: is it valid to have a qcow2 file whose size is smaller than
>> it's backing image?
> 
> I don't think so... however:
> 
>>  Suppose I have base[1M] <- mid[2M] <- top[3M] <-
>> active[3M], and request to commit top into base.  This bdrv_truncate()
>> means I will now have:
>>
>> base[3M] <- mid[2M] <- top[3M] <- active[3M].
>>
>> If I then abort the commit operation at this point, then we have the
>> situation of 'mid' reporting a smaller size than 'base' - which may make
>> 'mid' invalid.  And even if it is valid, what happens if I now request
>> to commit 'mid' into 'base', but 'base' already had data written past
>> the 2M mark before I aborted the first operation?
> 
> Once the commit starts, I don't know if you can safely abort it, and
> still count on 'mid' being valid.  Ignoring potential size differences,
> how would you ever know that what was written from 'top' into 'base' is
> compatible with what is present in 'mid'?

We chatted about this some more on IRC, and I'll attempt to summarize
the results of that conversation (correct me if I'm wrong)...

When committing across multiple images, there are four allocation cases
to consider:

1. unallocated in mid or top => nothing to do; base is already correct

2. allocated in mid but not top => copy from mid to base; as long as mid
is in the chain, both mid and top see the version in mid; as soon as mid
is removed from the chain, top sees the version in base

3. allocated in mid and in top => ultimately, we want to copy from top
to base.  We can also do an intermediate copy from mid to base, although
that is less efficient; as long as the copy from top to base happens
last.  As long as the sector remains allocated, then mid always sees its
own version, and top always sees its own version.

4. allocated in top but not mid => we want to copy from top to base, but
the moment we do that, if mid is still in the chain, then we have
invalidated the contents of mid.  However, as long as top remains
allocated, it sees its own version, and even if top is marked
unallocated, it would then see through to base and see correct contents
even though the intermediate file mid is inconsistent.

Use of block-commit has the potential to invalidate all images that are
dropped from the chain (namely, any time allocation scenario 4 is
present anywhere in the image); it is up to users to avoid using commit
if they have any other image chain sharing the part of the chain
discarded by this operation (someday, libvirt might track all storage
chains, and be able to prevent an attempt at a commit if it would strand
someone else's chain; but for now, we just document the issue).

Next, there is a question of whether invalidating the image up front is
acceptable, or whether we must go through gyrations to avoid
invalidation until after the image has been dropped from the chain.
That is, does the invalidation happen the moment the commit starts (and
can't be undone by an early abort), or can it be delayed until the point
that the image is actually dropped from the chain.  As long as the
current running qemu is the only entity using the portion of the chain
being dropped, then the timing does not matter, other than affecting
what optimizations we might be able to perform.

There is also a question of what happens if a commit is started, then
aborted, then restarted.  It is always safe to restart the same commit
from scratch, just not optimal, as the later run will spend time copying
identical content that was already in base on the first run.  The only
way to avoid copying sectors on a second run is to mark them unallocated
on the first run, but then we have the issue of consistency: if a sector
is allocated in both mid and top (scenario 3), and the first run copies
top into base and then marks top unallocated, then a future read of top
would pick up the contents from mid, which is wrong.  Therefore, we
cannot mark sectors unallocated unless we traverse them in a safe order.

I was able to come up with an algorithm that allows for faster restarts
of a commit operation, in order to avoid copying any sector into base
more than once (at least, insofar as top is not also an active image,
but we already deferred committing an active image for a later date).
It requires that every image being trimmed from the chain be r/w
(although only one image has to be r/w at a time), and that the copies
be done in a depth-first manner.  That is, the algorithm first visits
all allocted sectors in 'mid'; if they are not also allocated in top,
then the sector is copied into base and marked unallocated in mid.  When
mid is completed, it is removed from the chain, before proceeding to
top.  Eventually, all sectors will be copied into base, exactly once,
and the algorithm is restartable because it marks sectors unallocated
once base has the correct contents.  But it is more complex to implement.

In conclusion, since this stage of the implementation never marks
sectors unallocated, the use of the top of the chain is never
invalidated even if intermediate files remain in the chain but have
already been invalidated.  I'm okay with this patch going in as a first
approximation, and saving the complications of a depth-first approach
coupled with marking sectors unallocated as an optimization we can add
later (perhaps even by adding a flag to the JSON command to choose
whether to use the optimization, since it requires r/w on all images in
the chain but allows faster restarts; or to skip the optimization, since
it allows for fewer r/w images but slower restarts).  That is, this
patch series invalidates intermediate images at the start of the commit
operation, whereas the proposed optimization would defer invalidating
images until they have been removed from the chain, but it doesn't
affect the correctness of this phase of the patch series.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 617 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 2/8] block: add live block commit functionality
  2012-09-14 18:23       ` Eric Blake
@ 2012-09-14 20:29         ` Jeff Cody
  0 siblings, 0 replies; 18+ messages in thread
From: Jeff Cody @ 2012-09-14 20:29 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, pbonzini, stefanha, qemu-devel, supriyak

On 09/14/2012 02:23 PM, Eric Blake wrote:
> On 09/14/2012 10:07 AM, Jeff Cody wrote:
>>> Question: is it valid to have a qcow2 file whose size is smaller than
>>> it's backing image?
>>
>> I don't think so... however:
>>
>>>  Suppose I have base[1M] <- mid[2M] <- top[3M] <-
>>> active[3M], and request to commit top into base.  This bdrv_truncate()
>>> means I will now have:
>>>
>>> base[3M] <- mid[2M] <- top[3M] <- active[3M].
>>>
>>> If I then abort the commit operation at this point, then we have the
>>> situation of 'mid' reporting a smaller size than 'base' - which may make
>>> 'mid' invalid.  And even if it is valid, what happens if I now request
>>> to commit 'mid' into 'base', but 'base' already had data written past
>>> the 2M mark before I aborted the first operation?
>>
>> Once the commit starts, I don't know if you can safely abort it, and
>> still count on 'mid' being valid.  Ignoring potential size differences,
>> how would you ever know that what was written from 'top' into 'base' is
>> compatible with what is present in 'mid'?
> 
> We chatted about this some more on IRC, and I'll attempt to summarize
> the results of that conversation (correct me if I'm wrong)...
> 
> When committing across multiple images, there are four allocation cases
> to consider:
> 
> 1. unallocated in mid or top => nothing to do; base is already correct
> 
> 2. allocated in mid but not top => copy from mid to base; as long as mid
> is in the chain, both mid and top see the version in mid; as soon as mid
> is removed from the chain, top sees the version in base
> 
> 3. allocated in mid and in top => ultimately, we want to copy from top
> to base.  We can also do an intermediate copy from mid to base, although
> that is less efficient; as long as the copy from top to base happens
> last.  As long as the sector remains allocated, then mid always sees its
> own version, and top always sees its own version.
> 
> 4. allocated in top but not mid => we want to copy from top to base, but
> the moment we do that, if mid is still in the chain, then we have
> invalidated the contents of mid.  However, as long as top remains
> allocated, it sees its own version, and even if top is marked
> unallocated, it would then see through to base and see correct contents
> even though the intermediate file mid is inconsistent.

The above is true for an image chain where mid == top->backing_hd;
however, if there are additional images between mid and top, this is not
strictly true - you could have an allocation in some of the
intermediates, but not others, which leads to additional minor
complications.  See below for an illustration.

> 
> Use of block-commit has the potential to invalidate all images that are
> dropped from the chain (namely, any time allocation scenario 4 is
> present anywhere in the image); it is up to users to avoid using commit
> if they have any other image chain sharing the part of the chain
> discarded by this operation (someday, libvirt might track all storage
> chains, and be able to prevent an attempt at a commit if it would strand
> someone else's chain; but for now, we just document the issue).
> 
> Next, there is a question of whether invalidating the image up front is
> acceptable, or whether we must go through gyrations to avoid
> invalidation until after the image has been dropped from the chain.
> That is, does the invalidation happen the moment the commit starts (and
> can't be undone by an early abort), or can it be delayed until the point
> that the image is actually dropped from the chain.  As long as the
> current running qemu is the only entity using the portion of the chain
> being dropped, then the timing does not matter, other than affecting
> what optimizations we might be able to perform.
> 
> There is also a question of what happens if a commit is started, then
> aborted, then restarted.  It is always safe to restart the same commit
> from scratch, just not optimal, as the later run will spend time copying
> identical content that was already in base on the first run.  The only
> way to avoid copying sectors on a second run is to mark them unallocated
> on the first run, but then we have the issue of consistency: if a sector
> is allocated in both mid and top (scenario 3), and the first run copies
> top into base and then marks top unallocated, then a future read of top
> would pick up the contents from mid, which is wrong.  Therefore, we
> cannot mark sectors unallocated unless we traverse them in a safe order.
> 
> I was able to come up with an algorithm that allows for faster restarts
> of a commit operation, in order to avoid copying any sector into base
> more than once (at least, insofar as top is not also an active image,
> but we already deferred committing an active image for a later date).
> It requires that every image being trimmed from the chain be r/w
> (although only one image has to be r/w at a time), and that the copies
> be done in a depth-first manner.  That is, the algorithm first visits
> all allocted sectors in 'mid'; if they are not also allocated in top,
> then the sector is copied into base and marked unallocated in mid.  When
> mid is completed, it is removed from the chain, before proceeding to
> top.  Eventually, all sectors will be copied into base, exactly once,
> and the algorithm is restartable because it marks sectors unallocated
> once base has the correct contents.  But it is more complex to implement.
> 

Let's take the following example, but this time with 5 images instead of
4.  We are doing a commit with top == 'top', and base == 'base'.

0 marks no allocation, letters mark allocation by a specific layer
(the actual data is somewhat irrelevant; we assume if it is allocated to
that later it is unique to the layer)

Sample scenario:

        1 2 3 4  |  perceived data
-----------------------------------
 act |  e 0 0 0  |   (e b d d)
 top |  d 0 d d  |   (d b d d)
mid1 |  0 0 0 c  |   (0 b b c)
mid2 |  0 b b 0  |   (0 b b 0)
base |  0 0 a 0  |   (0 0 a 0)


If we look at allocations from mid1's perspective, all the way through
its chain, this is what we see (mid1->mid2->base):

mid1: (0 b b c)

Using the algorithm you mentioned above, sector 3 gives us grief. This
is what happens once we are done with the commit of mid2 into base, and
drop mid2 from the chain:

        1 2 3 4  |  perceived data
-----------------------------------
 act |  e 0 0 0  |   (e b d d)
 top |  d 0 d d  |   (d b d d)
mid1 |  0 0 0 c  |   (0 b a c)  <-- invalid as stand-alone image
base |  0 b a 0  |   (0 b a 0)

However, if mid1 still references mid2, then mid1->mid2->base remains ok
afterwards:

        1 2 3 4  |  perceived data
-----------------------------------
 act |  e 0 0 0  |   (e b d d)
 top |  d 0 d d  |   (d b d d)
mid1 |  0 0 0 c  |   (0 b b c)
mid2 |  0 0 b 0  |   (0 b b 0)
base |  0 b a 0  |   (0 b a 0)


So, in order to be safe, we can't modify an intermediate image file to
drop its backing file until it is invalidated anyway by the commit of
its own overlay(s) (although QEMU could drop it from its live chain in
RAM, and not modify the image files themselves to reflect the drop). And
since it will be rendered invalid, there is no point in modifying an
intermediate's backing file, ever - the two image files that it makes
sense to record the actual chain modification and intermediate drop is
top and top's overlay.

(This does bring to mind a change for this series - currently I only
modify top's overlay's backing file to be base; I should also modify
top's backing file to be base, because a different image chain that uses
top as a backing file would then remain valid).

Also, this optimization brings your original question back into play,
that is sidestepped by assuming all intermediates are invalid; namely,
if we have to grow the base via bdrv_truncate(), how we handle
intermediates that are smaller than base is relevant again.

> In conclusion, since this stage of the implementation never marks
> sectors unallocated, the use of the top of the chain is never
> invalidated even if intermediate files remain in the chain but have
> already been invalidated.  I'm okay with this patch going in as a first
> approximation, and saving the complications of a depth-first approach
> coupled with marking sectors unallocated as an optimization we can add
> later (perhaps even by adding a flag to the JSON command to choose
> whether to use the optimization, since it requires r/w on all images in
> the chain but allows faster restarts; or to skip the optimization, since
> it allows for fewer r/w images but slower restarts).  That is, this
> patch series invalidates intermediate images at the start of the commit
> operation, whereas the proposed optimization would defer invalidating
> images until they have been removed from the chain, but it doesn't
> affect the correctness of this phase of the patch series.
> 

I agree with this as well - we will be adding additional support with
active layer commit later, and can also add additional optimizations
like you suggested.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 6/8] QAPI: add command for live block commit, 'block-commit'
  2012-09-14 13:41 ` [Qemu-devel] [PATCH 6/8] QAPI: add command for live block commit, 'block-commit' Jeff Cody
@ 2012-09-15  1:05   ` Eric Blake
  2012-09-15  2:42     ` Jeff Cody
  0 siblings, 1 reply; 18+ messages in thread
From: Eric Blake @ 2012-09-15  1:05 UTC (permalink / raw)
  To: Jeff Cody; +Cc: kwolf, pbonzini, stefanha, qemu-devel, supriyak

[-- Attachment #1: Type: text/plain, Size: 4306 bytes --]

On 09/14/2012 07:41 AM, Jeff Cody wrote:
> The command for live block commit is added, which has the following
> arguments:
> 
> device: the block device to perform the commit on (mandatory)
> base:   the base image to commit into; optional (if not specified,
>         it is the underlying original image)
> top:    the top image of the commit - all data from inside top down
>         to base will be committed into base. optional (if not specified,
>         it is the active image) - see note below
> speed:  maximum speed, in bytes/sec
> on_error: action to take on error (optional - default is report)

Shouldn't this be on-error, with a dash?  Also, this doesn't match the
actual commit, since you pulled it while waiting to rebase on Paolo's
patches.

> 
> note: eventually this will support merging down the active layer,
>       but that code is not yet complete.  If the active layer is passed
>       in currently as top, or top is left to the default, then the error
>       QERR_TOP_NOT_FOUND will be returned.

I don't think we need a new error category for this particular failure
(and in 4/8, you labeled the error as generic); so it is sufficient to
state that 'an error will be returned'.

> 
> The is done as a block job, so upon completion a BLOCK_JOB_COMPLETED will
> be emitted.
> 

> +++ b/blockdev.c
> @@ -825,7 +825,6 @@ exit:
>      return;
>  }
>  
> -
>  static void eject_device(BlockDriverState *bs, int force, Error **errp)

Spurious whitespace change.


> +void qmp_block_commit(const char *device,
> +                      bool has_base, const char *base,
> +                      bool has_top, const char *top,
> +                      bool has_speed, int64_t speed,
> +                      Error **errp)
> +{
> +    BlockDriverState *bs;
> +    BlockDriverState *base_bs, *top_bs;
> +    Error *local_err = NULL;
> +    /* This will be part of the QMP command, if/when the
> +     * BlockdevOnError change for blkmirror makes it in
> +     */
> +    BlockErrorAction on_error = BLOCK_ERR_REPORT;
> +
> +    /* drain all i/o before commits */
> +    bdrv_drain_all();

Is this technically necessary for now?  Since you are forbidding actions
on the active image for now, and changing the active image (via snapshot
or pull) already drains all I/O, there should be nothing remaining to
drain for any of the backing files.  Obviously, it will be necessary in
the future when you do add support for committing the active layer.

> +++ b/qapi-schema.json
> @@ -1404,6 +1404,41 @@
>    'returns': 'str' }
>  
>  ##
> +# @block-commit
> +#
> +# Live commit of data from child image nodes into parent nodes - i.e.,
> +# writes data between 'top' and 'base' into 'base'.
> +#
> +# @device:  the name of the device
> +#
> +# @base:   #optional The file name of the parent image of the device to write
> +#                    data into.  If not specified, this is the original parent
> +#                    image.

I though we wanted to fix the terminology to avoid 'parent'; how about:

The file name of the backing image to write data into.  If not
specified, this is the deepest backing image.

> +#
> +# @top:    #optional The file name of the child image, above which data will
> +#                    not be committed down.  If not specified, this is one
> +#                    layer below the active layer (i.e. active->backing_hd).

Again, how about:

The file name of the backing image within the chain which contains the
data to be committed down.  If not specified...

> +#
> +#                    If top == base, that is an error.
> +#
> +#
> +# @speed:  #optional the maximum speed, in bytes per second
> +#
> +# Returns: Nothing on success
> +#          If commit or stream is already active on this device, DeviceInUse
> +#          If @device does not exist, DeviceNotFound
> +#          If image commit is not supported by this device, NotSupported
> +#          If @base does not exist, BaseNotFound
> +#          If @top does not exist, TopNotFound

BaseNotFound is a generic error, and I'm arguing that TopNotFound should
be likewise.

-- 
Eric Blake   eblake@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 617 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Qemu-devel] [PATCH 6/8] QAPI: add command for live block commit, 'block-commit'
  2012-09-15  1:05   ` Eric Blake
@ 2012-09-15  2:42     ` Jeff Cody
  0 siblings, 0 replies; 18+ messages in thread
From: Jeff Cody @ 2012-09-15  2:42 UTC (permalink / raw)
  To: Eric Blake; +Cc: kwolf, pbonzini, stefanha, qemu-devel, supriyak

On 09/14/2012 09:05 PM, Eric Blake wrote:
> On 09/14/2012 07:41 AM, Jeff Cody wrote:
>> The command for live block commit is added, which has the following
>> arguments:
>>
>> device: the block device to perform the commit on (mandatory)
>> base:   the base image to commit into; optional (if not specified,
>>         it is the underlying original image)
>> top:    the top image of the commit - all data from inside top down
>>         to base will be committed into base. optional (if not specified,
>>         it is the active image) - see note below
>> speed:  maximum speed, in bytes/sec
>> on_error: action to take on error (optional - default is report)
> 
> Shouldn't this be on-error, with a dash?  Also, this doesn't match the
> actual commit, since you pulled it while waiting to rebase on Paolo's
> patches.
> 

Yes, thanks, I need to update the commit message.

>>
>> note: eventually this will support merging down the active layer,
>>       but that code is not yet complete.  If the active layer is passed
>>       in currently as top, or top is left to the default, then the error
>>       QERR_TOP_NOT_FOUND will be returned.
> 
> I don't think we need a new error category for this particular failure
> (and in 4/8, you labeled the error as generic); so it is sufficient to
> state that 'an error will be returned'.
> 

Yes, QERR_TOP_NOT_FOUND will be a generic error, so I will just note that
an error will be returned, as you suggest.

>>
>> The is done as a block job, so upon completion a BLOCK_JOB_COMPLETED will
>> be emitted.
>>
> 
>> +++ b/blockdev.c
>> @@ -825,7 +825,6 @@ exit:
>>      return;
>>  }
>>  
>> -
>>  static void eject_device(BlockDriverState *bs, int force, Error **errp)
> 
> Spurious whitespace change.
> 

Thanks

> 
>> +void qmp_block_commit(const char *device,
>> +                      bool has_base, const char *base,
>> +                      bool has_top, const char *top,
>> +                      bool has_speed, int64_t speed,
>> +                      Error **errp)
>> +{
>> +    BlockDriverState *bs;
>> +    BlockDriverState *base_bs, *top_bs;
>> +    Error *local_err = NULL;
>> +    /* This will be part of the QMP command, if/when the
>> +     * BlockdevOnError change for blkmirror makes it in
>> +     */
>> +    BlockErrorAction on_error = BLOCK_ERR_REPORT;
>> +
>> +    /* drain all i/o before commits */
>> +    bdrv_drain_all();
> 
> Is this technically necessary for now?  Since you are forbidding actions
> on the active image for now, and changing the active image (via snapshot
> or pull) already drains all I/O, there should be nothing remaining to
> drain for any of the backing files.  Obviously, it will be necessary in
> the future when you do add support for committing the active layer.
> 

Technically, it is not needed for the current iteration, but since this
is the command handler that will be the same as when we add the active
layer, and I know we will eventually support it, I went ahead and added
it now. It shouldn't hurt anything.


>> +++ b/qapi-schema.json
>> @@ -1404,6 +1404,41 @@
>>    'returns': 'str' }
>>  
>>  ##
>> +# @block-commit
>> +#
>> +# Live commit of data from child image nodes into parent nodes - i.e.,
>> +# writes data between 'top' and 'base' into 'base'.
>> +#
>> +# @device:  the name of the device
>> +#
>> +# @base:   #optional The file name of the parent image of the device to write
>> +#                    data into.  If not specified, this is the original parent
>> +#                    image.
> 
> I though we wanted to fix the terminology to avoid 'parent'; how about:
> 
> The file name of the backing image to write data into.  If not
> specified, this is the deepest backing image.
> 

OK, sounds good.


>> +#
>> +# @top:    #optional The file name of the child image, above which data will
>> +#                    not be committed down.  If not specified, this is one
>> +#                    layer below the active layer (i.e. active->backing_hd).
> 
> Again, how about:
> 
> The file name of the backing image within the chain which contains the
> data to be committed down.  If not specified...
> 

Hmm, how about a slight tweak:

The file name of the backing image within the image chain, which
contains the topmost data to be committed down.  If not specified...

Since it doesn't necessarily have _the_ data to commit down, but
indicates the upper bounds of the data to commit down.

>> +#
>> +#                    If top == base, that is an error.
>> +#
>> +#
>> +# @speed:  #optional the maximum speed, in bytes per second
>> +#
>> +# Returns: Nothing on success
>> +#          If commit or stream is already active on this device, DeviceInUse
>> +#          If @device does not exist, DeviceNotFound
>> +#          If image commit is not supported by this device, NotSupported
>> +#          If @base does not exist, BaseNotFound
>> +#          If @top does not exist, TopNotFound
> 
> BaseNotFound is a generic error, and I'm arguing that TopNotFound should
> be likewise.
> 

Agree - both TopNotFound and BaseNotFound are of type ERROR_CLASS_GENERIC_ERROR

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2012-09-15  2:42 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-09-14 13:41 [Qemu-devel] [PATCH 0/8] Live block commit Jeff Cody
2012-09-14 13:41 ` [Qemu-devel] [PATCH 1/8] block: add support functions for live commit, to find and delete images Jeff Cody
2012-09-14 15:23   ` Eric Blake
2012-09-14 15:39     ` Jeff Cody
2012-09-14 13:41 ` [Qemu-devel] [PATCH 2/8] block: add live block commit functionality Jeff Cody
2012-09-14 15:45   ` Eric Blake
2012-09-14 16:07     ` Jeff Cody
2012-09-14 18:23       ` Eric Blake
2012-09-14 20:29         ` Jeff Cody
2012-09-14 13:41 ` [Qemu-devel] [PATCH 3/8] blockdev: rename block_stream_cb to a generic block_job_cb Jeff Cody
2012-09-14 13:41 ` [Qemu-devel] [PATCH 4/8] qerror: new error for live block commit, QERR_TOP_NOT_FOUND Jeff Cody
2012-09-14 16:01   ` Eric Blake
2012-09-14 13:41 ` [Qemu-devel] [PATCH 5/8] block: helper function, to find the base image of a chain Jeff Cody
2012-09-14 13:41 ` [Qemu-devel] [PATCH 6/8] QAPI: add command for live block commit, 'block-commit' Jeff Cody
2012-09-15  1:05   ` Eric Blake
2012-09-15  2:42     ` Jeff Cody
2012-09-14 13:41 ` [Qemu-devel] [PATCH 7/8] qemu-iotests: add initial tests for live block commit Jeff Cody
2012-09-14 13:41 ` [Qemu-devel] [PATCH 8/8] block: after creating a live snapshot, make old image read-only Jeff Cody

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).