* [Qemu-devel] [RFC PATCH 0/4] Live block commit
@ 2012-07-31 5:16 Jeff Cody
2012-07-31 5:16 ` [Qemu-devel] [RFC PATCH 1/4] block: add support functions for live commit, to find and delete images Jeff Cody
` (4 more replies)
0 siblings, 5 replies; 16+ messages in thread
From: Jeff Cody @ 2012-07-31 5:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, supriyak
These are proposed changes, to add live block commit functionality.
I originally had intended for this RFC series to include the more
complicated case of a live commit of the active layer, but removed
it for this commit in the hopes of making it into the soft feature
freeze for 1.2, so this series is the simpler case.
This series adds the basic case, of a live commit between two
images below the active layer, e.g.:
[base] <--- [snp-1] <--- [snp-2] <--- [snp-3] <--- [active]
can be collapsed down via commit, into:
[base] <--- [active]
or,
[base] <--- [snp-1] <--- [active],
[base] <--- [snp-3] <--- [active],
etc..
qemu-io tests for the block commit will be adding onto the v1 patch
set.
These patches are on top of Supriya's reopen() series, and Paolo's
block mirror series (the RFC series). I have not rebased yet to the
newer patches put out by Supriya and Paolo - this was tested w/o the
reopen, by manually making sure the images stayed in a r/w state.
Jeff Cody (4):
block: add support functions for live commit, to find and delete
images.
block: add live block commit functionality
qerror: new errors for live block commit, QERR_TOP_NOT_FOUND
QAPI: add command for live block commit, 'block-commit'
block.c | 136 ++++++++++++++++++++++++++++++++++-
block.h | 4 ++
block/Makefile.objs | 2 +-
block/commit.c | 200 +++++++++++++++++++++++++++++++++++++++++++++++++++
block_int.h | 19 +++++
blockdev.c | 78 ++++++++++++++++++++
qapi-schema.json | 33 +++++++++
qerror.c | 4 ++
qerror.h | 3 +
qmp-commands.hx | 6 ++
trace-events | 2 +
11 files changed, 485 insertions(+), 2 deletions(-)
create mode 100644 block/commit.c
--
1.7.10.4
^ permalink raw reply [flat|nested] 16+ messages in thread
* [Qemu-devel] [RFC PATCH 1/4] block: add support functions for live commit, to find and delete images.
2012-07-31 5:16 [Qemu-devel] [RFC PATCH 0/4] Live block commit Jeff Cody
@ 2012-07-31 5:16 ` Jeff Cody
2012-07-31 17:34 ` Eric Blake
2012-07-31 5:16 ` [Qemu-devel] [RFC PATCH 2/4] block: add live block commit functionality Jeff Cody
` (3 subsequent siblings)
4 siblings, 1 reply; 16+ messages in thread
From: Jeff Cody @ 2012-07-31 5:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, supriyak
Add bdrv_find_image(), bdrv_find_base(), and bdrv_delete_intermediate().
bdrv_find_image(): given a filename and a BDS, find the image in the chain
that matches the passed filename.
bdrv_find_base(): given a BDS, find the base image (parent-most image)
bdrv_delete_intermediate():
Given 3 BDS (active, top, base), delete images above
base up to and including top, and set base to be the
parent of top's child node.
Signed-off-by: Jeff Cody <jcody@redhat.com>
---
block.c | 136 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
block.h | 4 ++
2 files changed, 139 insertions(+), 1 deletion(-)
diff --git a/block.c b/block.c
index 522acd1..a3c8d33 100644
--- a/block.c
+++ b/block.c
@@ -1408,7 +1408,7 @@ int bdrv_commit(BlockDriverState *bs)
if (!drv)
return -ENOMEDIUM;
-
+
if (!bs->backing_hd) {
return -ENOTSUP;
}
@@ -1661,6 +1661,110 @@ int bdrv_change_backing_file(BlockDriverState *bs,
return ret;
}
+typedef struct BlkIntermediateStates {
+ BlockDriverState *bs;
+ QSIMPLEQ_ENTRY(BlkIntermediateStates) entry;
+} BlkIntermediateStates;
+
+
+/* deletes images above 'base' up to and including 'top', and sets the image
+ * above 'top' to have base as its backing file.
+ */
+int bdrv_delete_intermediate(BlockDriverState *active, BlockDriverState *top,
+ BlockDriverState *base)
+{
+ BlockDriverState *intermediate;
+ BlockDriverState *base_bs = NULL;
+ BlockDriverState *new_top_bs = NULL;
+ BlkIntermediateStates *intermediate_state, *next;
+ int ret = -1;
+
+ QSIMPLEQ_HEAD(states_to_delete, BlkIntermediateStates) states_to_delete;
+ QSIMPLEQ_INIT(&states_to_delete);
+
+ if (!top->drv || !base->drv) {
+ goto exit;
+ }
+
+ /* special case of top->backing_hd already pointing to base - nothing
+ * to do, no intermediate images
+ */
+ if (top->backing_hd == base) {
+ ret = 0;
+ goto exit;
+ }
+
+ /* if the active bs layer is the same as the new top, then there
+ * is no image above the top, so it is also the new_top (and we must
+ * not delete it below, either)
+ */
+ if (active == top) {
+ new_top_bs = active;
+ } else {
+ intermediate = active;
+ while (intermediate->backing_hd) {
+ if (intermediate->backing_hd == top) {
+ new_top_bs = intermediate;
+ break;
+ }
+ intermediate = intermediate->backing_hd;
+ }
+ }
+
+ if (new_top_bs == NULL) {
+ /* we could not find the image above 'top', this is an error */
+ goto exit;
+ }
+
+ /* if the active and top image passed in are the same, then we
+ * can't delete the active, so we start one below
+ */
+ intermediate = (active == top) ? active->backing_hd : top;
+
+ /* now we will go down through the list, and add each BDS we find
+ * into our deletion queue, until we hit the 'base'
+ */
+ while (intermediate) {
+ intermediate_state = g_malloc0(sizeof(BlkIntermediateStates));
+ intermediate_state->bs = intermediate;
+ QSIMPLEQ_INSERT_TAIL(&states_to_delete, intermediate_state, entry);
+
+ if (intermediate->backing_hd == base) {
+ base_bs = intermediate->backing_hd;
+ break;
+ }
+ intermediate = intermediate->backing_hd;
+ }
+ if (base_bs == NULL) {
+ /* something went wrong, we did not end at the base. safely
+ * unravel everything, and exit with error */
+ goto exit;
+ }
+
+ /* success - we can delete the intermediate states, and link top->base */
+ ret = bdrv_change_backing_file(new_top_bs, base_bs->filename,
+ base_bs->drv ? base_bs->drv->format_name : "");
+ if (ret) {
+ goto exit;
+ }
+ new_top_bs->backing_hd = base_bs;
+
+
+ QSIMPLEQ_FOREACH_SAFE(intermediate_state, &states_to_delete, entry, next) {
+ /* so that bdrv_close() does not recursively close the chain */
+ intermediate_state->bs->backing_hd = NULL;
+ bdrv_delete(intermediate_state->bs);
+ }
+ ret = 0;
+
+exit:
+ QSIMPLEQ_FOREACH_SAFE(intermediate_state, &states_to_delete, entry, next) {
+ g_free(intermediate_state);
+ }
+ return ret;
+}
+
+
static int bdrv_check_byte_request(BlockDriverState *bs, int64_t offset,
size_t size)
{
@@ -3128,6 +3232,36 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
return NULL;
}
+BlockDriverState *bdrv_find_image(BlockDriverState *bs,
+ const char *filename)
+{
+ if (!bs || !bs->drv) {
+ return NULL;
+ }
+
+ if (strcmp(bs->filename, filename) == 0) {
+ return bs;
+ } else {
+ return bdrv_find_image(bs->backing_hd, filename);
+ }
+}
+
+BlockDriverState *bdrv_find_base(BlockDriverState *bs)
+{
+ BlockDriverState *curr_bs = NULL;
+
+ if (!bs) {
+ return NULL;
+ }
+
+ curr_bs = bs;
+
+ while (curr_bs->backing_hd) {
+ curr_bs = curr_bs->backing_hd;
+ }
+ return curr_bs;
+}
+
#define NB_SUFFIXES 4
char *get_human_readable_size(char *buf, int buf_size, int64_t size)
diff --git a/block.h b/block.h
index 25e0230..70fa53c 100644
--- a/block.h
+++ b/block.h
@@ -183,6 +183,10 @@ int bdrv_change_backing_file(BlockDriverState *bs,
const char *backing_file, const char *backing_fmt);
void bdrv_register(BlockDriver *bdrv);
int bdrv_change_hostcache(BlockDriverState *bs, bool enable_host_cache);
+BlockDriverState *bdrv_find_image(BlockDriverState *bs, const char *filename);
+BlockDriverState *bdrv_find_base(BlockDriverState *bs);
+int bdrv_delete_intermediate(BlockDriverState *active, BlockDriverState *top,
+ BlockDriverState *base);
typedef struct BdrvCheckResult {
--
1.7.10.4
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [Qemu-devel] [RFC PATCH 2/4] block: add live block commit functionality
2012-07-31 5:16 [Qemu-devel] [RFC PATCH 0/4] Live block commit Jeff Cody
2012-07-31 5:16 ` [Qemu-devel] [RFC PATCH 1/4] block: add support functions for live commit, to find and delete images Jeff Cody
@ 2012-07-31 5:16 ` Jeff Cody
2012-07-31 17:51 ` Eric Blake
2012-07-31 5:16 ` [Qemu-devel] [RFC PATCH 3/4] qerror: new errors for live block commit, QERR_TOP_NOT_FOUND Jeff Cody
` (2 subsequent siblings)
4 siblings, 1 reply; 16+ messages in thread
From: Jeff Cody @ 2012-07-31 5:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, supriyak
This adds the live commit coroutine. This iteration focuses on the
commit only below the active layer, and not the active layer itself.
The behaviour is similar to block streaming; the sectors are walked
through, and anything that exists above 'base' is committed back down
into base. At the end, intermediate images are deleted, and the
chain stiched together. Images are restored to their original open
flags.
Signed-off-by: Jeff Cody <jcody@redhat.com>
---
block/Makefile.objs | 2 +-
block/commit.c | 200 +++++++++++++++++++++++++++++++++++++++++++++++++++
block_int.h | 19 +++++
trace-events | 2 +
4 files changed, 222 insertions(+), 1 deletion(-)
create mode 100644 block/commit.c
diff --git a/block/Makefile.objs b/block/Makefile.objs
index f1a394a..9d07d3c 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -9,4 +9,4 @@ block-obj-$(CONFIG_LIBISCSI) += iscsi.o
block-obj-$(CONFIG_CURL) += curl.o
block-obj-$(CONFIG_RBD) += rbd.o
-common-obj-y += stream.o mirror.o
+common-obj-y += stream.o mirror.o commit.o
diff --git a/block/commit.c b/block/commit.c
new file mode 100644
index 0000000..6b29f79
--- /dev/null
+++ b/block/commit.c
@@ -0,0 +1,200 @@
+/*
+ * Live block commit
+ *
+ * Copyright Red Hat, Inc. 2012
+ *
+ * Authors:
+ * Jeff Cody <jcody@redhat.com>
+ * Based on stream.c by Stefan Hajnoczi
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "trace.h"
+#include "block_int.h"
+#include "blockjob.h"
+#include "qemu/ratelimit.h"
+
+enum {
+ /*
+ * Size of data buffer for populating the image file. This should be large
+ * enough to process multiple clusters in a single call, so that populating
+ * contiguous regions of the image is efficient.
+ */
+ COMMIT_BUFFER_SIZE = 512 * 1024, /* in bytes */
+};
+
+#define SLICE_TIME 100000000ULL /* ns */
+
+typedef struct CommitBlockJob {
+ BlockJob common;
+ RateLimit limit;
+ BlockDriverState *active;
+ BlockDriverState *top;
+ BlockDriverState *base;
+ BlockdevOnError on_error;
+ int base_flags;
+ int top_flags;
+} CommitBlockJob;
+
+static int coroutine_fn commit_populate(BlockDriverState *bs,
+ BlockDriverState *base,
+ int64_t sector_num, int nb_sectors,
+ void *buf)
+{
+ if (bdrv_read(bs, sector_num, buf, nb_sectors)) {
+ return -EIO;
+ }
+ if (bdrv_write(base, sector_num, buf, nb_sectors)) {
+ return -EIO;
+ }
+ return 0;
+}
+
+static void coroutine_fn commit_run(void *opaque)
+{
+ CommitBlockJob *s = opaque;
+ BlockDriverState *active = s->active;
+ BlockDriverState *top = s->top;
+ BlockDriverState *base = s->base;
+ int64_t sector_num, end;
+ int error = 0;
+ int ret = 0;
+ int n = 0;
+ void *buf;
+ int bytes_written = 0;
+
+ s->common.len = bdrv_getlength(top);
+ if (s->common.len < 0) {
+ block_job_completed(&s->common, s->common.len);
+ return;
+ }
+
+ end = s->common.len >> BDRV_SECTOR_BITS;
+ buf = qemu_blockalign(top, COMMIT_BUFFER_SIZE);
+
+ for (sector_num = 0; sector_num < end; sector_num += n) {
+ uint64_t delay_ns = 0;
+ bool copy;
+
+wait:
+ /* Note that even when no rate limit is applied we need to yield
+ * with no pending I/O here so that qemu_aio_flush() returns.
+ */
+ block_job_sleep_ns(&s->common, rt_clock, delay_ns);
+ if (block_job_is_cancelled(&s->common)) {
+ break;
+ }
+ /* Copy if allocated above the base */
+ ret = bdrv_co_is_allocated_above(top, base, sector_num,
+ COMMIT_BUFFER_SIZE / BDRV_SECTOR_SIZE,
+ &n);
+ copy = (ret == 1);
+ trace_commit_one_iteration(s, sector_num, n, ret);
+ if (ret >= 0 && copy) {
+ if (s->common.speed) {
+ delay_ns = ratelimit_calculate_delay(&s->limit, n);
+ if (delay_ns > 0) {
+ goto wait;
+ }
+ }
+ ret = commit_populate(top, base, sector_num, n, buf);
+ bytes_written += n * BDRV_SECTOR_SIZE;
+ }
+ if (ret < 0) {
+ BlockErrorAction action =
+ block_job_error_action(&s->common, s->common.bs, s->on_error,
+ true, -ret);
+ if (action == BDRV_ACTION_STOP) {
+ n = 0;
+ continue;
+ }
+ if (error == 0) {
+ error = ret;
+ }
+ if (action == BDRV_ACTION_REPORT) {
+ break;
+ }
+ }
+ ret = 0;
+
+ /* Publish progress */
+ s->common.offset += n * BDRV_SECTOR_SIZE;
+ }
+
+ if (!block_job_is_cancelled(&s->common) && sector_num == end && ret == 0) {
+ /* success */
+ if (bdrv_delete_intermediate(active, top, base)) {
+ /* something went wrong! */
+ /* TODO:add error reporting here */
+ }
+ }
+
+ /* restore base open flags here if appropriate (e.g., change the base back
+ * to r/o) */
+ if (s->base_flags != bdrv_get_flags(base)) {
+ bdrv_reopen(base, s->base_flags);
+ }
+ if (s->top_flags != bdrv_get_flags(top)) {
+ bdrv_reopen(top, s->top_flags);
+ }
+
+ qemu_vfree(buf);
+ block_job_completed(&s->common, ret);
+}
+
+static void commit_set_speed(BlockJob *job, int64_t speed, Error **errp)
+{
+ CommitBlockJob *s = container_of(job, CommitBlockJob, common);
+
+ if (speed < 0) {
+ error_set(errp, QERR_INVALID_PARAMETER, "speed");
+ return;
+ }
+ ratelimit_set_speed(&s->limit, speed / BDRV_SECTOR_SIZE, SLICE_TIME);
+}
+
+static BlockJobType commit_job_type = {
+ .instance_size = sizeof(CommitBlockJob),
+ .job_type = "commit",
+ .set_speed = commit_set_speed,
+};
+
+void commit_start(BlockDriverState *bs, BlockDriverState *base,
+ BlockDriverState *top, int64_t speed,
+ BlockdevOnError on_error, BlockDriverCompletionFunc *cb,
+ void *opaque, int orig_base_flags, int orig_top_flags,
+ Error **errp)
+{
+ CommitBlockJob *s;
+
+ if ((on_error == BLOCKDEV_ON_ERROR_STOP ||
+ on_error == BLOCKDEV_ON_ERROR_ENOSPC) &&
+ !bdrv_iostatus_is_enabled(bs)) {
+ error_set(errp, QERR_INVALID_PARAMETER_COMBINATION);
+ return;
+ }
+
+ s = block_job_create(&commit_job_type, bs, speed, cb, opaque, errp);
+ if (!s) {
+ return;
+ }
+
+ s->base = base;
+ s->top = top;
+ s->active = bs;
+
+ s->base_flags = orig_base_flags;
+ s->top_flags = orig_top_flags;
+
+ s->on_error = on_error;
+ s->common.co = qemu_coroutine_create(commit_run);
+
+ trace_commit_start(bs, base, top, s, s->common.co, opaque,
+ orig_base_flags, orig_top_flags);
+ qemu_coroutine_enter(s->common.co, s);
+
+ return;
+}
diff --git a/block_int.h b/block_int.h
index de52263..e3c2121 100644
--- a/block_int.h
+++ b/block_int.h
@@ -342,4 +342,23 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
BlockDriverCompletionFunc *cb,
void *opaque, Error **errp);
+/**
+ * commit_start:
+ * @bs: Top Block device
+ * @base: Block device that will be written into, and become the new top
+ * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
+ * @on_error: The action to take upon error.
+ * @cb: Completion function for the job.
+ * @opaque: Opaque pointer value passed to @cb.
+ * @orig_base_flags: The original open flags for the base image
+ * @orig_top_flags: The original open flags for the top image
+ * @errp: Error object.
+ *
+ */
+void commit_start(BlockDriverState *bs, BlockDriverState *base,
+ BlockDriverState *top, int64_t speed,
+ BlockdevOnError on_error, BlockDriverCompletionFunc *cb,
+ void *opaque, int orig_base_flags, int orig_top_flags,
+ Error **errp);
+
#endif /* BLOCK_INT_H */
diff --git a/trace-events b/trace-events
index d61bcba..3897de7 100644
--- a/trace-events
+++ b/trace-events
@@ -74,6 +74,8 @@ bdrv_co_do_copy_on_readv(void *bs, int64_t sector_num, int nb_sectors, int64_t c
# block/stream.c
stream_one_iteration(void *s, int64_t sector_num, int nb_sectors, int is_allocated) "s %p sector_num %"PRId64" nb_sectors %d is_allocated %d"
stream_start(void *bs, void *base, void *s, void *co, void *opaque) "bs %p base %p s %p co %p opaque %p"
+commit_one_iteration(void *s, int64_t sector_num, int nb_sectors, int is_allocated) "s %p sector_num %"PRId64" nb_sectors %d is_allocated %d"
+commit_start(void *bs, void *base, void *top, void *s, void *co, void *opaque, int base_flags, int top_flags) "bs %p base %p top %p s %p co %p opaque %p base_flags %d top_flags %d"
# block/mirror.c
mirror_start(void *bs, void *s, void *co, void *opaque) "bs %p s %p co %p opaque %p"
--
1.7.10.4
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [Qemu-devel] [RFC PATCH 3/4] qerror: new errors for live block commit, QERR_TOP_NOT_FOUND
2012-07-31 5:16 [Qemu-devel] [RFC PATCH 0/4] Live block commit Jeff Cody
2012-07-31 5:16 ` [Qemu-devel] [RFC PATCH 1/4] block: add support functions for live commit, to find and delete images Jeff Cody
2012-07-31 5:16 ` [Qemu-devel] [RFC PATCH 2/4] block: add live block commit functionality Jeff Cody
@ 2012-07-31 5:16 ` Jeff Cody
2012-07-31 18:35 ` Eric Blake
2012-07-31 5:16 ` [Qemu-devel] [RFC PATCH 4/4] QAPI: add command for live block commit, 'block-commit' Jeff Cody
2012-08-14 7:41 ` [Qemu-devel] [RFC PATCH 0/4] Live block commit Tiziano Müller
4 siblings, 1 reply; 16+ messages in thread
From: Jeff Cody @ 2012-07-31 5:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, supriyak
Signed-off-by: Jeff Cody <jcody@redhat.com>
---
qerror.c | 4 ++++
qerror.h | 3 +++
2 files changed, 7 insertions(+)
diff --git a/qerror.c b/qerror.c
index 25c2733..69a59ab 100644
--- a/qerror.c
+++ b/qerror.c
@@ -307,6 +307,10 @@ static const QErrorStringTable qerror_table[] = {
.desc = "Too many open files",
},
{
+ .error_fmt = QERR_TOP_NOT_FOUND,
+ .desc = "Top '%(top)' not found",
+ },
+ {
.error_fmt = QERR_UNDEFINED_ERROR,
.desc = "An undefined error has occurred",
},
diff --git a/qerror.h b/qerror.h
index 0f9f303..931c703 100644
--- a/qerror.h
+++ b/qerror.h
@@ -251,6 +251,9 @@ QError *qobject_to_qerror(const QObject *obj);
#define QERR_TOO_MANY_FILES \
"{ 'class': 'TooManyFiles', 'data': {} }"
+#define QERR_TOP_NOT_FOUND \
+ "{ 'class': 'TopNotFound', 'data': { 'top': %s } }"
+
#define QERR_UNDEFINED_ERROR \
"{ 'class': 'UndefinedError', 'data': {} }"
--
1.7.10.4
^ permalink raw reply related [flat|nested] 16+ messages in thread
* [Qemu-devel] [RFC PATCH 4/4] QAPI: add command for live block commit, 'block-commit'
2012-07-31 5:16 [Qemu-devel] [RFC PATCH 0/4] Live block commit Jeff Cody
` (2 preceding siblings ...)
2012-07-31 5:16 ` [Qemu-devel] [RFC PATCH 3/4] qerror: new errors for live block commit, QERR_TOP_NOT_FOUND Jeff Cody
@ 2012-07-31 5:16 ` Jeff Cody
2012-07-31 18:38 ` Eric Blake
2012-08-14 7:41 ` [Qemu-devel] [RFC PATCH 0/4] Live block commit Tiziano Müller
4 siblings, 1 reply; 16+ messages in thread
From: Jeff Cody @ 2012-07-31 5:16 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, pbonzini, stefanha, supriyak
The command for live block commit is added, which has the following
arguments:
device: the block device to perform the commit on (mandatory)
base: the base image to commit into; optional (if not specified,
it is the underlying original image)
top: the top image of the commit - all data from inside top down
to base will be committed into base. optional (if not specified,
it is the active image) - see note below
speed: maximum speed, in bytes/sec
on_error: action to take on error (optional - default is report)
note: eventually this will support merging down the active layer,
but that code is not yet complete. If the active layer is passed
in currently as top, or top is left to the default, then the error
QERR_TOP_NOT_FOUND will be returned.
The is down as a block job, so upon completion a BLOCK_JOB_COMPLETED will
be emitted.
Signed-off-by: Jeff Cody <jcody@redhat.com>
---
blockdev.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
qapi-schema.json | 33 +++++++++++++++++++++++
qmp-commands.hx | 6 +++++
3 files changed, 117 insertions(+)
diff --git a/blockdev.c b/blockdev.c
index f39d301..a61ee38 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -831,6 +831,84 @@ exit:
return;
}
+void qmp_block_commit(const char *device,
+ bool has_base, const char *base,
+ bool has_top, const char *top,
+ bool has_speed, int64_t speed,
+ bool has_on_error, BlockdevOnError on_error,
+ Error **errp)
+{
+ BlockDriverState *bs;
+ BlockDriverState *base_bs, *top_bs;
+ Error *local_err = NULL;
+ int orig_base_flags, orig_top_flags;
+
+ if (!has_on_error) {
+ on_error = BLOCKDEV_ON_ERROR_REPORT;
+ }
+
+ /* drain all i/o before commits */
+ bdrv_drain_all();
+
+
+ bs = bdrv_find(device);
+ if (!bs) {
+ error_set(errp, QERR_DEVICE_NOT_FOUND, device);
+ return;
+ }
+ if (base && has_base) {
+ base_bs = bdrv_find_backing_image(bs, base);
+ if (base_bs == NULL) {
+ error_set(errp, QERR_BASE_NOT_FOUND, base);
+ return;
+ }
+ } else {
+ base_bs = bdrv_find_base(bs);
+ if (base_bs == NULL) {
+ error_set(errp, QERR_BASE_NOT_FOUND, "NULL");
+ return;
+ }
+ }
+
+ if (top && has_top) {
+ /* if we want to allow the active layer,
+ * use 'bdrv_find_image()' here */
+ top_bs = bdrv_find_backing_image(bs, top);
+ if (top_bs == NULL) {
+ error_set(errp, QERR_TOP_NOT_FOUND, top);
+ return;
+ }
+ } else {
+ top_bs = bs;
+ }
+
+ orig_base_flags = bdrv_get_flags(base_bs);
+ /* convert base_bs to r/w, if necessary */
+ if (!(orig_base_flags & BDRV_O_RDWR)) {
+ bdrv_reopen(base_bs, orig_base_flags | BDRV_O_RDWR);
+ }
+
+ /* need to also make top r/w, so that when the commit is
+ * finished we can change the backing file */
+ orig_top_flags = bdrv_get_flags(top_bs);
+ /* convert top_bs to r/w, if necessary */
+ if (!(orig_top_flags & BDRV_O_RDWR)) {
+ bdrv_reopen(top_bs, orig_top_flags | BDRV_O_RDWR);
+ }
+
+ commit_start(bs, base_bs, top_bs, speed, on_error,
+ block_job_cb, bs, orig_base_flags, orig_top_flags,
+ &local_err);
+ if (local_err != NULL) {
+ error_propagate(errp, local_err);
+ return;
+ }
+ /* Grab a reference so hotplug does not delete the BlockDriverState from
+ * underneath us.
+ */
+ drive_get_ref(drive_get_by_blockdev(bs));
+}
+
void qmp_drive_mirror(const char *device, const char *target,
bool has_format, const char *format,
enum MirrorSyncMode sync,
diff --git a/qapi-schema.json b/qapi-schema.json
index db6903f..9ab6f90 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1365,6 +1365,39 @@
'returns': 'str' }
##
+# @block-commit
+#
+# Live commit of data from child image nodes into parent nodes - i.e.,
+# writes data between 'top' and 'base' into 'base'.
+#
+# @device: the name of the device
+#
+# @base: #optional The parent image of the device to write data into.
+# If not specified, this is the original parent image.
+#
+# @top: #optional The child image, above which data will not be commited
+# down. If not specified, this is the active layer.
+#
+# @speed: #optional the maximum speed, in bytes per second
+#
+# @on_error: #optional the action to take on an error (default report)
+#
+# Returns: Nothing on success
+# If commit or stream is already active on this device, DeviceInUse
+# If @device does not exist, DeviceNotFound
+# If image commit is not supported by this device, NotSupported
+# If @base does not exist, BaseNotFound
+# If @top does not exist, TopNotFound
+# If @speed is invalid, InvalidParameter
+#
+# Since: 1.2
+#
+##
+{ 'command': 'block-commit',
+ 'data': { 'device': 'str', '*base': 'str', '*top': 'str',
+ '*speed': 'int', '*on_error': 'BlockdevOnError' } }
+
+##
# @drive-mirror
#
# Start mirroring a block device's writes to a new destination.
diff --git a/qmp-commands.hx b/qmp-commands.hx
index bde96cf..2f21142 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -723,6 +723,12 @@ EQMP
},
{
+ .name = "block-commit",
+ .args_type = "device:B,base:s?,top:s?,speed:o?,on_error:s?",
+ .mhandler.cmd_new = qmp_marshal_input_block_commit,
+ },
+
+ {
.name = "block-job-set-speed",
.args_type = "device:B,speed:o",
.mhandler.cmd_new = qmp_marshal_input_block_job_set_speed,
--
1.7.10.4
^ permalink raw reply related [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 1/4] block: add support functions for live commit, to find and delete images.
2012-07-31 5:16 ` [Qemu-devel] [RFC PATCH 1/4] block: add support functions for live commit, to find and delete images Jeff Cody
@ 2012-07-31 17:34 ` Eric Blake
2012-07-31 17:52 ` Jeff Cody
0 siblings, 1 reply; 16+ messages in thread
From: Eric Blake @ 2012-07-31 17:34 UTC (permalink / raw)
To: Jeff Cody; +Cc: kwolf, pbonzini, qemu-devel, supriyak, stefanha
[-- Attachment #1: Type: text/plain, Size: 3352 bytes --]
On 07/30/2012 11:16 PM, Jeff Cody wrote:
> Add bdrv_find_image(), bdrv_find_base(), and bdrv_delete_intermediate().
>
> bdrv_find_image(): given a filename and a BDS, find the image in the chain
> that matches the passed filename.
>
> bdrv_find_base(): given a BDS, find the base image (parent-most image)
>
> bdrv_delete_intermediate():
> Given 3 BDS (active, top, base), delete images above
> base up to and including top, and set base to be the
> parent of top's child node.
A diagram, as was in your cover letter, would help (either here, or
better yet in the comments describing this function in place)...
>
> Signed-off-by: Jeff Cody <jcody@redhat.com>
> ---
> block.c | 136 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
> block.h | 4 ++
> 2 files changed, 139 insertions(+), 1 deletion(-)
>
> diff --git a/block.c b/block.c
> index 522acd1..a3c8d33 100644
> --- a/block.c
> +++ b/block.c
> @@ -1408,7 +1408,7 @@ int bdrv_commit(BlockDriverState *bs)
>
> if (!drv)
> return -ENOMEDIUM;
> -
> +
> if (!bs->backing_hd) {
> return -ENOTSUP;
> }
Spurious whitespace cleanup, since nothing else in this hunk belongs to
you. Is that trailing whitespace present upstream, or was it introduced
in one of the patches you based off of?
> @@ -1661,6 +1661,110 @@ int bdrv_change_backing_file(BlockDriverState *bs,
> return ret;
> }
>
> +typedef struct BlkIntermediateStates {
> + BlockDriverState *bs;
> + QSIMPLEQ_ENTRY(BlkIntermediateStates) entry;
> +} BlkIntermediateStates;
> +
> +
> +/* deletes images above 'base' up to and including 'top', and sets the image
> + * above 'top' to have base as its backing file.
> + */
> +int bdrv_delete_intermediate(BlockDriverState *active, BlockDriverState *top,
> + BlockDriverState *base)
...that is, I think this would aid the reader:
Converts:
bottom <- base <- intermediate <- top <- active
to
bottom <- base <- active
where top==active is permitted
> @@ -3128,6 +3232,36 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
> return NULL;
> }
>
> +BlockDriverState *bdrv_find_image(BlockDriverState *bs,
> + const char *filename)
> +{
> + if (!bs || !bs->drv) {
> + return NULL;
> + }
> +
> + if (strcmp(bs->filename, filename) == 0) {
> + return bs;
> + } else {
> + return bdrv_find_image(bs->backing_hd, filename);
Tail-recursive; are we worried enough about ever hitting stack overflow
due to a really deep change to convert this into a while loop recursion
instead? [Probably not]
> + }
> +}
> +
> +BlockDriverState *bdrv_find_base(BlockDriverState *bs)
> +{
> + BlockDriverState *curr_bs = NULL;
> +
> + if (!bs) {
> + return NULL;
> + }
> +
> + curr_bs = bs;
> +
> + while (curr_bs->backing_hd) {
> + curr_bs = curr_bs->backing_hd;
> + }
Then again, here you did while-loop recursion, so using the same style
in both functions might be worthwhile.
--
Eric Blake eblake@redhat.com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 2/4] block: add live block commit functionality
2012-07-31 5:16 ` [Qemu-devel] [RFC PATCH 2/4] block: add live block commit functionality Jeff Cody
@ 2012-07-31 17:51 ` Eric Blake
2012-07-31 17:55 ` Jeff Cody
0 siblings, 1 reply; 16+ messages in thread
From: Eric Blake @ 2012-07-31 17:51 UTC (permalink / raw)
To: Jeff Cody; +Cc: kwolf, pbonzini, qemu-devel, supriyak, stefanha
[-- Attachment #1: Type: text/plain, Size: 2244 bytes --]
On 07/30/2012 11:16 PM, Jeff Cody wrote:
> This adds the live commit coroutine. This iteration focuses on the
> commit only below the active layer, and not the active layer itself.
>
> The behaviour is similar to block streaming; the sectors are walked
> through, and anything that exists above 'base' is committed back down
> into base. At the end, intermediate images are deleted, and the
> chain stiched together. Images are restored to their original open
s/stiched/stitched/
> +
> +enum {
> + /*
> + * Size of data buffer for populating the image file. This should be large
> + * enough to process multiple clusters in a single call, so that populating
> + * contiguous regions of the image is efficient.
> + */
> + COMMIT_BUFFER_SIZE = 512 * 1024, /* in bytes */
> +};
Paolo's latest round of patches got to the point of making this
configurable for drive-mirror; is that something you should be copying here?
> +++ b/block_int.h
> @@ -342,4 +342,23 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
> BlockDriverCompletionFunc *cb,
> void *opaque, Error **errp);
>
> +/**
> + * commit_start:
> + * @bs: Top Block device
> + * @base: Block device that will be written into, and become the new top
> + * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
> + * @on_error: The action to take upon error.
Do we need to distinguish between rerror and werror?
> + * @cb: Completion function for the job.
> + * @opaque: Opaque pointer value passed to @cb.
> + * @orig_base_flags: The original open flags for the base image
> + * @orig_top_flags: The original open flags for the top image
> + * @errp: Error object.
> + *
> + */
> +void commit_start(BlockDriverState *bs, BlockDriverState *base,
> + BlockDriverState *top, int64_t speed,
> + BlockdevOnError on_error, BlockDriverCompletionFunc *cb,
> + void *opaque, int orig_base_flags, int orig_top_flags,
> + Error **errp);
> +
> #endif /* BLOCK_INT_H */
--
Eric Blake eblake@redhat.com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 1/4] block: add support functions for live commit, to find and delete images.
2012-07-31 17:34 ` Eric Blake
@ 2012-07-31 17:52 ` Jeff Cody
0 siblings, 0 replies; 16+ messages in thread
From: Jeff Cody @ 2012-07-31 17:52 UTC (permalink / raw)
To: Eric Blake; +Cc: kwolf, pbonzini, qemu-devel, supriyak, stefanha
On 07/31/2012 01:34 PM, Eric Blake wrote:
> On 07/30/2012 11:16 PM, Jeff Cody wrote:
>> Add bdrv_find_image(), bdrv_find_base(), and bdrv_delete_intermediate().
>>
>> bdrv_find_image(): given a filename and a BDS, find the image in the chain
>> that matches the passed filename.
>>
>> bdrv_find_base(): given a BDS, find the base image (parent-most image)
>>
>> bdrv_delete_intermediate():
>> Given 3 BDS (active, top, base), delete images above
>> base up to and including top, and set base to be the
>> parent of top's child node.
>
> A diagram, as was in your cover letter, would help (either here, or
> better yet in the comments describing this function in place)...
>
Or even better, both :)
I'll add that in for v1.
>>
>> Signed-off-by: Jeff Cody <jcody@redhat.com>
>> ---
>> block.c | 136 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>> block.h | 4 ++
>> 2 files changed, 139 insertions(+), 1 deletion(-)
>>
>> diff --git a/block.c b/block.c
>> index 522acd1..a3c8d33 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -1408,7 +1408,7 @@ int bdrv_commit(BlockDriverState *bs)
>>
>> if (!drv)
>> return -ENOMEDIUM;
>> -
>> +
>> if (!bs->backing_hd) {
>> return -ENOTSUP;
>> }
>
> Spurious whitespace cleanup, since nothing else in this hunk belongs to
> you. Is that trailing whitespace present upstream, or was it introduced
> in one of the patches you based off of?
Likely a spurious cleanup - I had several trailing whitespaces in my
block.c changes, and scripts/checkpatch.pl warned me of those. I
cleaned them up, and I must have cleaned up an extra one with my regex.
>
>> @@ -1661,6 +1661,110 @@ int bdrv_change_backing_file(BlockDriverState *bs,
>> return ret;
>> }
>>
>> +typedef struct BlkIntermediateStates {
>> + BlockDriverState *bs;
>> + QSIMPLEQ_ENTRY(BlkIntermediateStates) entry;
>> +} BlkIntermediateStates;
>> +
>> +
>> +/* deletes images above 'base' up to and including 'top', and sets the image
>> + * above 'top' to have base as its backing file.
>> + */
>> +int bdrv_delete_intermediate(BlockDriverState *active, BlockDriverState *top,
>> + BlockDriverState *base)
>
> ...that is, I think this would aid the reader:
>
> Converts:
>
> bottom <- base <- intermediate <- top <- active
>
> to
>
> bottom <- base <- active
>
> where top==active is permitted
I agree that is better. And, for clarity, bottom==base is permitted as
well.
>
>> @@ -3128,6 +3232,36 @@ BlockDriverState *bdrv_find_backing_image(BlockDriverState *bs,
>> return NULL;
>> }
>>
>> +BlockDriverState *bdrv_find_image(BlockDriverState *bs,
>> + const char *filename)
>> +{
>> + if (!bs || !bs->drv) {
>> + return NULL;
>> + }
>> +
>> + if (strcmp(bs->filename, filename) == 0) {
>> + return bs;
>> + } else {
>> + return bdrv_find_image(bs->backing_hd, filename);
>
> Tail-recursive; are we worried enough about ever hitting stack overflow
> due to a really deep change to convert this into a while loop recursion
> instead? [Probably not]
Not too worried about it, because the chain should not be *that* long,
and also the block layer handles the chain in a similar fashion other
places, so we'll likely blow up in those places first :)
That said, when doing some automated live snapshot testing with an
obscene number of snapshots, I did manage to blow the stack (IIRC) in
the recursive open. That was with something like a chain of 1500
snapshots, which seems a bit excessive.
But, I agree with your point below:
>
>> + }
>> +}
>> +
>> +BlockDriverState *bdrv_find_base(BlockDriverState *bs)
>> +{
>> + BlockDriverState *curr_bs = NULL;
>> +
>> + if (!bs) {
>> + return NULL;
>> + }
>> +
>> + curr_bs = bs;
>> +
>> + while (curr_bs->backing_hd) {
>> + curr_bs = curr_bs->backing_hd;
>> + }
>
> Then again, here you did while-loop recursion, so using the same style
> in both functions might be worthwhile.
>
Yes - maybe that is a good reason to have bdrv_find_image() be a
while-loop (I based it off of bdrv_find_backing_image(), which
is why it was recursive). In general, I find recursive functions make
my brain hurt, so I tend to like while-loops better :)
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 2/4] block: add live block commit functionality
2012-07-31 17:51 ` Eric Blake
@ 2012-07-31 17:55 ` Jeff Cody
2012-08-01 6:32 ` Kevin Wolf
0 siblings, 1 reply; 16+ messages in thread
From: Jeff Cody @ 2012-07-31 17:55 UTC (permalink / raw)
To: Eric Blake; +Cc: kwolf, pbonzini, qemu-devel, supriyak, stefanha
On 07/31/2012 01:51 PM, Eric Blake wrote:
> On 07/30/2012 11:16 PM, Jeff Cody wrote:
>> This adds the live commit coroutine. This iteration focuses on the
>> commit only below the active layer, and not the active layer itself.
>>
>> The behaviour is similar to block streaming; the sectors are walked
>> through, and anything that exists above 'base' is committed back down
>> into base. At the end, intermediate images are deleted, and the
>> chain stiched together. Images are restored to their original open
>
> s/stiched/stitched/
>
>> +
>> +enum {
>> + /*
>> + * Size of data buffer for populating the image file. This should be large
>> + * enough to process multiple clusters in a single call, so that populating
>> + * contiguous regions of the image is efficient.
>> + */
>> + COMMIT_BUFFER_SIZE = 512 * 1024, /* in bytes */
>> +};
>
> Paolo's latest round of patches got to the point of making this
> configurable for drive-mirror; is that something you should be copying here?
Yes
>
>> +++ b/block_int.h
>> @@ -342,4 +342,23 @@ void mirror_start(BlockDriverState *bs, BlockDriverState *target,
>> BlockDriverCompletionFunc *cb,
>> void *opaque, Error **errp);
>>
>> +/**
>> + * commit_start:
>> + * @bs: Top Block device
>> + * @base: Block device that will be written into, and become the new top
>> + * @speed: The maximum speed, in bytes per second, or 0 for unlimited.
>> + * @on_error: The action to take upon error.
>
> Do we need to distinguish between rerror and werror?
Good question - I don't think that would change the error handling, but
it may be useful information to get the user. If you prefer rerror and
werror distinction, I'll add that in.
>
>> + * @cb: Completion function for the job.
>> + * @opaque: Opaque pointer value passed to @cb.
>> + * @orig_base_flags: The original open flags for the base image
>> + * @orig_top_flags: The original open flags for the top image
>> + * @errp: Error object.
>> + *
>> + */
>> +void commit_start(BlockDriverState *bs, BlockDriverState *base,
>> + BlockDriverState *top, int64_t speed,
>> + BlockdevOnError on_error, BlockDriverCompletionFunc *cb,
>> + void *opaque, int orig_base_flags, int orig_top_flags,
>> + Error **errp);
>> +
>> #endif /* BLOCK_INT_H */
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 3/4] qerror: new errors for live block commit, QERR_TOP_NOT_FOUND
2012-07-31 5:16 ` [Qemu-devel] [RFC PATCH 3/4] qerror: new errors for live block commit, QERR_TOP_NOT_FOUND Jeff Cody
@ 2012-07-31 18:35 ` Eric Blake
0 siblings, 0 replies; 16+ messages in thread
From: Eric Blake @ 2012-07-31 18:35 UTC (permalink / raw)
To: Jeff Cody; +Cc: kwolf, pbonzini, qemu-devel, supriyak, stefanha
[-- Attachment #1: Type: text/plain, Size: 808 bytes --]
On 07/30/2012 11:16 PM, Jeff Cody wrote:
> Signed-off-by: Jeff Cody <jcody@redhat.com>
> ---
> qerror.c | 4 ++++
> qerror.h | 3 +++
> 2 files changed, 7 insertions(+)
>
> diff --git a/qerror.c b/qerror.c
> index 25c2733..69a59ab 100644
> --- a/qerror.c
> +++ b/qerror.c
> @@ -307,6 +307,10 @@ static const QErrorStringTable qerror_table[] = {
> .desc = "Too many open files",
> },
> {
> + .error_fmt = QERR_TOP_NOT_FOUND,
> + .desc = "Top '%(top)' not found",
> + },
> + {
This conflicts with the series to simplify error handling; if we want
both series, I'd suggest rebasing this one to use the simpler handling.
--
Eric Blake eblake@redhat.com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 4/4] QAPI: add command for live block commit, 'block-commit'
2012-07-31 5:16 ` [Qemu-devel] [RFC PATCH 4/4] QAPI: add command for live block commit, 'block-commit' Jeff Cody
@ 2012-07-31 18:38 ` Eric Blake
0 siblings, 0 replies; 16+ messages in thread
From: Eric Blake @ 2012-07-31 18:38 UTC (permalink / raw)
To: Jeff Cody; +Cc: kwolf, pbonzini, qemu-devel, supriyak, stefanha
[-- Attachment #1: Type: text/plain, Size: 2664 bytes --]
On 07/30/2012 11:16 PM, Jeff Cody wrote:
> The command for live block commit is added, which has the following
> arguments:
>
> device: the block device to perform the commit on (mandatory)
> base: the base image to commit into; optional (if not specified,
> it is the underlying original image)
> top: the top image of the commit - all data from inside top down
> to base will be committed into base. optional (if not specified,
> it is the active image) - see note below
> speed: maximum speed, in bytes/sec
> on_error: action to take on error (optional - default is report)
>
> note: eventually this will support merging down the active layer,
> but that code is not yet complete. If the active layer is passed
> in currently as top, or top is left to the default, then the error
> QERR_TOP_NOT_FOUND will be returned.
>
> The is down as a block job, so upon completion a BLOCK_JOB_COMPLETED will
> be emitted.
s/The is down/This is done/ ?
Also, how does block-job-cancel interact with the job started by this
command? Is it something we can cancel and restart at will, like
block-stream?
> +++ b/qapi-schema.json
> @@ -1365,6 +1365,39 @@
> 'returns': 'str' }
>
> ##
> +# @block-commit
> +#
> +# Live commit of data from child image nodes into parent nodes - i.e.,
> +# writes data between 'top' and 'base' into 'base'.
> +#
> +# @device: the name of the device
> +#
> +# @base: #optional The parent image of the device to write data into.
> +# If not specified, this is the original parent image.
> +#
> +# @top: #optional The child image, above which data will not be commited
s/commited/committed/
> +# @on_error: #optional the action to take on an error (default report)
> +#
> +# Returns: Nothing on success
> +# If commit or stream is already active on this device, DeviceInUse
> +# If @device does not exist, DeviceNotFound
> +# If image commit is not supported by this device, NotSupported
> +# If @base does not exist, BaseNotFound
> +# If @top does not exist, TopNotFound
> +# If @speed is invalid, InvalidParameter
> +#
> +# Since: 1.2
> +#
> +##
> +{ 'command': 'block-commit',
> + 'data': { 'device': 'str', '*base': 'str', '*top': 'str',
> + '*speed': 'int', '*on_error': 'BlockdevOnError' } }
s/on_error/on-error/ (hmm, I guess you were copying from Paolo's series,
so he probably has the same change to make)
--
Eric Blake eblake@redhat.com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 620 bytes --]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 2/4] block: add live block commit functionality
2012-07-31 17:55 ` Jeff Cody
@ 2012-08-01 6:32 ` Kevin Wolf
2012-08-01 7:07 ` Paolo Bonzini
0 siblings, 1 reply; 16+ messages in thread
From: Kevin Wolf @ 2012-08-01 6:32 UTC (permalink / raw)
To: jcody; +Cc: supriyak, pbonzini, Eric Blake, qemu-devel, stefanha
Am 31.07.2012 19:55, schrieb Jeff Cody:
> On 07/31/2012 01:51 PM, Eric Blake wrote:
>> On 07/30/2012 11:16 PM, Jeff Cody wrote:
>>> This adds the live commit coroutine. This iteration focuses on the
>>> commit only below the active layer, and not the active layer itself.
>>>
>>> The behaviour is similar to block streaming; the sectors are walked
>>> through, and anything that exists above 'base' is committed back down
>>> into base. At the end, intermediate images are deleted, and the
>>> chain stiched together. Images are restored to their original open
>>
>> s/stiched/stitched/
>>
>>> +
>>> +enum {
>>> + /*
>>> + * Size of data buffer for populating the image file. This should be large
>>> + * enough to process multiple clusters in a single call, so that populating
>>> + * contiguous regions of the image is efficient.
>>> + */
>>> + COMMIT_BUFFER_SIZE = 512 * 1024, /* in bytes */
>>> +};
>>
>> Paolo's latest round of patches got to the point of making this
>> configurable for drive-mirror; is that something you should be copying here?
>
> Yes
Though its use is very limited for live commit. For the mirror it's
important because a larger number can mean that more data is
unnecessarily written, and the target can become larger than the source.
For live commit, I think using a larger buffer is always better.
Hm, part of the difference is that I assume that commit uses
bdrv_is_allocated() to check whether some data must really be copied.
But then, there's no reason why mirroring couldn't do that as well. Paolo?
Kevin
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 2/4] block: add live block commit functionality
2012-08-01 6:32 ` Kevin Wolf
@ 2012-08-01 7:07 ` Paolo Bonzini
2012-08-01 11:23 ` Jeff Cody
0 siblings, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2012-08-01 7:07 UTC (permalink / raw)
To: Kevin Wolf; +Cc: stefanha, jcody, Eric Blake, qemu-devel, supriyak
Il 01/08/2012 08:32, Kevin Wolf ha scritto:
>>>> >>> +enum {
>>>> >>> + /*
>>>> >>> + * Size of data buffer for populating the image file. This should be large
>>>> >>> + * enough to process multiple clusters in a single call, so that populating
>>>> >>> + * contiguous regions of the image is efficient.
>>>> >>> + */
>>>> >>> + COMMIT_BUFFER_SIZE = 512 * 1024, /* in bytes */
>>>> >>> +};
>>> >>
>>> >> Paolo's latest round of patches got to the point of making this
>>> >> configurable for drive-mirror; is that something you should be copying here?
>> >
>> > Yes
> Though its use is very limited for live commit. For the mirror it's
> important because a larger number can mean that more data is
> unnecessarily written, and the target can become larger than the source.
Note that the latest version of mirroring has _two_ knobs:
1) granularity is what decides how much data could be written
unnecessarily, because of the dirty bitmap.
2) buffer size is what decides how much I/O is in flight at one time.
The default values are resp. the cluster size (or 64K for raw) and 10M.
The two together give mirroring some self-tuning capability. For
example in the first part of the mirroring you will likely proceed in
10M chunks with no concurrency; once you're synchronized, you'll
probably send several chunks, perhaps all 64K if the guest does random
writes.
Live commit as it is done now doesn't need any of this complication; it
is just a background operation that does not need to compete with the
guest. So using a larger buffer is indeed always better, and 512K is a
nice intermediate value between mirroring's 64K and 10M extremes.
> For live commit, I think using a larger buffer is always better.
>
> Hm, part of the difference is that I assume that commit uses
> bdrv_is_allocated() to check whether some data must really be copied.
> But then, there's no reason why mirroring couldn't do that as well. Paolo?
We copy a cluster at a time, and that's also the resolution of
bdrv_is_allocated so we wouldn't gain anything. Nice idea though, I had
to mull about it to find the flaw. :)
Paolo
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 2/4] block: add live block commit functionality
2012-08-01 7:07 ` Paolo Bonzini
@ 2012-08-01 11:23 ` Jeff Cody
0 siblings, 0 replies; 16+ messages in thread
From: Jeff Cody @ 2012-08-01 11:23 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Kevin Wolf, stefanha, Eric Blake, qemu-devel, supriyak
On 08/01/2012 03:07 AM, Paolo Bonzini wrote:
> Il 01/08/2012 08:32, Kevin Wolf ha scritto:
>>>>>>>> +enum {
>>>>>>>> + /*
>>>>>>>> + * Size of data buffer for populating the image file. This should be large
>>>>>>>> + * enough to process multiple clusters in a single call, so that populating
>>>>>>>> + * contiguous regions of the image is efficient.
>>>>>>>> + */
>>>>>>>> + COMMIT_BUFFER_SIZE = 512 * 1024, /* in bytes */
>>>>>>>> +};
>>>>>>
>>>>>> Paolo's latest round of patches got to the point of making this
>>>>>> configurable for drive-mirror; is that something you should be copying here?
>>>>
>>>> Yes
>> Though its use is very limited for live commit. For the mirror it's
>> important because a larger number can mean that more data is
>> unnecessarily written, and the target can become larger than the source.
>
> Note that the latest version of mirroring has _two_ knobs:
>
> 1) granularity is what decides how much data could be written
> unnecessarily, because of the dirty bitmap.
>
> 2) buffer size is what decides how much I/O is in flight at one time.
>
> The default values are resp. the cluster size (or 64K for raw) and 10M.
> The two together give mirroring some self-tuning capability. For
> example in the first part of the mirroring you will likely proceed in
> 10M chunks with no concurrency; once you're synchronized, you'll
> probably send several chunks, perhaps all 64K if the guest does random
> writes.
>
> Live commit as it is done now doesn't need any of this complication; it
> is just a background operation that does not need to compete with the
> guest. So using a larger buffer is indeed always better, and 512K is a
> nice intermediate value between mirroring's 64K and 10M extremes.
>
We may want these same adjust knobs for the 'second phase' of live
commit, however. It will commit down the top (active) layer, which will
have similar properties to mirroring, since the guest can still be
writing to the active layer. If we do need it for the second phase,
perhaps those controls will be useful now.
>> For live commit, I think using a larger buffer is always better.
>>
>> Hm, part of the difference is that I assume that commit uses
>> bdrv_is_allocated() to check whether some data must really be copied.
>> But then, there's no reason why mirroring couldn't do that as well. Paolo?
>
> We copy a cluster at a time, and that's also the resolution of
> bdrv_is_allocated so we wouldn't gain anything. Nice idea though, I had
> to mull about it to find the flaw. :)
>
> Paolo
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 0/4] Live block commit
2012-07-31 5:16 [Qemu-devel] [RFC PATCH 0/4] Live block commit Jeff Cody
` (3 preceding siblings ...)
2012-07-31 5:16 ` [Qemu-devel] [RFC PATCH 4/4] QAPI: add command for live block commit, 'block-commit' Jeff Cody
@ 2012-08-14 7:41 ` Tiziano Müller
2012-08-29 13:40 ` Jeff Cody
4 siblings, 1 reply; 16+ messages in thread
From: Tiziano Müller @ 2012-08-14 7:41 UTC (permalink / raw)
To: qemu-devel
Hi Jeff
This is an awesome feature and absolutely necessary to get a working
live-snapshot/backup solution.
What is the status on this? Will it make it into 1.2.0?
Thanks in advance,
best regards,
Tiziano
Am Dienstag, den 31.07.2012, 01:16 -0400 schrieb Jeff Cody:
> These are proposed changes, to add live block commit functionality.
>
> I originally had intended for this RFC series to include the more
> complicated case of a live commit of the active layer, but removed
> it for this commit in the hopes of making it into the soft feature
> freeze for 1.2, so this series is the simpler case.
>
> This series adds the basic case, of a live commit between two
> images below the active layer, e.g.:
>
> [base] <--- [snp-1] <--- [snp-2] <--- [snp-3] <--- [active]
>
> can be collapsed down via commit, into:
>
> [base] <--- [active]
>
> or,
>
> [base] <--- [snp-1] <--- [active],
>
> [base] <--- [snp-3] <--- [active],
>
> etc..
>
>
> qemu-io tests for the block commit will be adding onto the v1 patch
> set.
>
> These patches are on top of Supriya's reopen() series, and Paolo's
> block mirror series (the RFC series). I have not rebased yet to the
> newer patches put out by Supriya and Paolo - this was tested w/o the
> reopen, by manually making sure the images stayed in a r/w state.
>
>
> Jeff Cody (4):
> block: add support functions for live commit, to find and delete
> images.
> block: add live block commit functionality
> qerror: new errors for live block commit, QERR_TOP_NOT_FOUND
> QAPI: add command for live block commit, 'block-commit'
>
> block.c | 136 ++++++++++++++++++++++++++++++++++-
> block.h | 4 ++
> block/Makefile.objs | 2 +-
> block/commit.c | 200 +++++++++++++++++++++++++++++++++++++++++++++++++++
> block_int.h | 19 +++++
> blockdev.c | 78 ++++++++++++++++++++
> qapi-schema.json | 33 +++++++++
> qerror.c | 4 ++
> qerror.h | 3 +
> qmp-commands.hx | 6 ++
> trace-events | 2 +
> 11 files changed, 485 insertions(+), 2 deletions(-)
> create mode 100644 block/commit.c
>
--
stepping stone GmbH
Neufeldstrasse 9
CH-3012 Bern
Telefon: +41 31 332 53 63
www.stepping-stone.ch
tiziano.mueller@stepping-stone.ch
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [Qemu-devel] [RFC PATCH 0/4] Live block commit
2012-08-14 7:41 ` [Qemu-devel] [RFC PATCH 0/4] Live block commit Tiziano Müller
@ 2012-08-29 13:40 ` Jeff Cody
0 siblings, 0 replies; 16+ messages in thread
From: Jeff Cody @ 2012-08-29 13:40 UTC (permalink / raw)
To: tiziano.mueller; +Cc: qemu-devel
On 08/14/2012 03:41 AM, Tiziano Müller wrote:
> Hi Jeff
>
> This is an awesome feature and absolutely necessary to get a working
> live-snapshot/backup solution.
>
> What is the status on this? Will it make it into 1.2.0?
>
> Thanks in advance,
> best regards,
> Tiziano
Hi Tiziano,
Sorry, I didn't catch your email before (feel free to cc me on any
direct questions). This did not make it into 1.2, but it should make
it into 1.3. I am prepping a new patch series now to send out (which
is how I ran across your email, as I was double checking that I did
not miss any review comments - I must have missed it when you first
sent it).
Thanks,
Jeff
>
> Am Dienstag, den 31.07.2012, 01:16 -0400 schrieb Jeff Cody:
>> These are proposed changes, to add live block commit functionality.
>>
>> I originally had intended for this RFC series to include the more
>> complicated case of a live commit of the active layer, but removed
>> it for this commit in the hopes of making it into the soft feature
>> freeze for 1.2, so this series is the simpler case.
>>
>> This series adds the basic case, of a live commit between two
>> images below the active layer, e.g.:
>>
>> [base] <--- [snp-1] <--- [snp-2] <--- [snp-3] <--- [active]
>>
>> can be collapsed down via commit, into:
>>
>> [base] <--- [active]
>>
>> or,
>>
>> [base] <--- [snp-1] <--- [active],
>>
>> [base] <--- [snp-3] <--- [active],
>>
>> etc..
>>
>>
>> qemu-io tests for the block commit will be adding onto the v1 patch
>> set.
>>
>> These patches are on top of Supriya's reopen() series, and Paolo's
>> block mirror series (the RFC series). I have not rebased yet to the
>> newer patches put out by Supriya and Paolo - this was tested w/o the
>> reopen, by manually making sure the images stayed in a r/w state.
>>
>>
>> Jeff Cody (4):
>> block: add support functions for live commit, to find and delete
>> images.
>> block: add live block commit functionality
>> qerror: new errors for live block commit, QERR_TOP_NOT_FOUND
>> QAPI: add command for live block commit, 'block-commit'
>>
>> block.c | 136 ++++++++++++++++++++++++++++++++++-
>> block.h | 4 ++
>> block/Makefile.objs | 2 +-
>> block/commit.c | 200 +++++++++++++++++++++++++++++++++++++++++++++++++++
>> block_int.h | 19 +++++
>> blockdev.c | 78 ++++++++++++++++++++
>> qapi-schema.json | 33 +++++++++
>> qerror.c | 4 ++
>> qerror.h | 3 +
>> qmp-commands.hx | 6 ++
>> trace-events | 2 +
>> 11 files changed, 485 insertions(+), 2 deletions(-)
>> create mode 100644 block/commit.c
>>
>
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2012-08-29 13:40 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-07-31 5:16 [Qemu-devel] [RFC PATCH 0/4] Live block commit Jeff Cody
2012-07-31 5:16 ` [Qemu-devel] [RFC PATCH 1/4] block: add support functions for live commit, to find and delete images Jeff Cody
2012-07-31 17:34 ` Eric Blake
2012-07-31 17:52 ` Jeff Cody
2012-07-31 5:16 ` [Qemu-devel] [RFC PATCH 2/4] block: add live block commit functionality Jeff Cody
2012-07-31 17:51 ` Eric Blake
2012-07-31 17:55 ` Jeff Cody
2012-08-01 6:32 ` Kevin Wolf
2012-08-01 7:07 ` Paolo Bonzini
2012-08-01 11:23 ` Jeff Cody
2012-07-31 5:16 ` [Qemu-devel] [RFC PATCH 3/4] qerror: new errors for live block commit, QERR_TOP_NOT_FOUND Jeff Cody
2012-07-31 18:35 ` Eric Blake
2012-07-31 5:16 ` [Qemu-devel] [RFC PATCH 4/4] QAPI: add command for live block commit, 'block-commit' Jeff Cody
2012-07-31 18:38 ` Eric Blake
2012-08-14 7:41 ` [Qemu-devel] [RFC PATCH 0/4] Live block commit Tiziano Müller
2012-08-29 13:40 ` Jeff Cody
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).