From: Stefan Hajnoczi <stefanha@gmail.com>
To: Marcelo Tosatti <mtosatti@redhat.com>
Cc: kwolf@redhat.com, Jes.Sorensen@redhat.com, dlaor@redhat.com,
qemu-devel@nongnu.org, avi@redhat.com, jdenemar@redhat.com
Subject: Re: [Qemu-devel] [patch 4/4] QEMU live block copy
Date: Thu, 16 Jun 2011 19:52:48 +0100
Message-ID: <BANLkTikMucvesWgSW28ybwx1k3KfKO2Uzw@mail.gmail.com>
In-Reply-To: <20110615174009.094728634@amt.cnet>
On Wed, Jun 15, 2011 at 6:14 PM, Marcelo Tosatti <mtosatti@redhat.com> wrote:
A couple of comments, nothing critical. Rest of the series looks good.
The bulk phase is really the dirty phase with all blocks starting out
dirty (except that !bdrv_is_allocated() blocks are cleared when copying
from a shared base). There may not be much benefit in unifying the bulk
and dirty phases.
> +#include "sysemu.h"
> +#include "qjson.h"
> +#include <assert.h>
Please include system headers before QEMU headers.
> +#define NAME_LEN 1024
> +#define DEV_LEN 256
Using bdrv_get_device_name() would be nicer than introducing a new max
constant for BlockDriverState::device_name.
> +static void blk_copy_issue_read(BdrvCopyState *s, int64_t sector,
> + int nr_sectors)
> +{
> + BdrvCopyBlock *blk = qemu_mallocz(sizeof(BdrvCopyBlock));
> + blk->buf = qemu_blockalign(s->src, DIRTY_CHUNK_SIZE);
No need to hardcode BDRV_SECTORS_PER_DIRTY_CHUNK here since nr_sectors
is passed in:

    blk->buf = qemu_blockalign(s->src, nr_sectors * BDRV_SECTOR_SIZE);
> +static int blk_issue_reads_dirty(BdrvCopyState *s)
Should be blkcopy_issue_reads_dirty()?
> +static int blk_issue_reads_bulk(BdrvCopyState *s)
Should be blkcopy_issue_reads_bulk()?
> +{
> + int nr_sectors;
> + int64_t curr_sector = s->curr_sector;
> +
> + if (s->shared_base) {
> + while (curr_sector < s->nr_sectors &&
> + !bdrv_is_allocated(s->src, curr_sector,
> + MAX_IS_ALLOCATED_SEARCH, &nr_sectors)) {
> + curr_sector += nr_sectors;
> + }
Hopefully this does not block for too long. bdrv_is_allocated() is
synchronous and may block if the image file needs to be accessed. The
pathological case here is when the source file is completely
unallocated - usually that means the block driver does not have to
traverse much image metadata, so we should be okay.
> + }
> +
> + if (curr_sector >= s->nr_sectors) {
> + s->curr_sector = 0;
> + return 1;
> + }
> +
> + curr_sector &= ~((int64_t)BDRV_SECTORS_PER_DIRTY_CHUNK - 1);
> + nr_sectors = BDRV_SECTORS_PER_DIRTY_CHUNK;
> +
> + blk_copy_issue_read(s, s->curr_sector, nr_sectors);
Error return ignored?
> +int do_bdrv_copy_switch(Monitor *mon, const QDict *qdict, QObject **ret_data)
> +{
> + const char *device = qdict_get_str(qdict, "device");
> + BdrvCopyState *s = NULL;
> + int open_flags;
> +
> + QLIST_FOREACH(s, &block_copy_list, list) {
> + if (!strcmp(s->device_name, device)) {
> + if (s->stage != STAGE_MIRROR_WRITES) {
> + qerror_report(QERR_IN_PROGRESS, "block copy");
> + return -1;
> + }
> + break;
> + }
> + }
> +
> + if (!s) {
> + qerror_report(QERR_DEVICE_NOT_FOUND, device);
> + return -1;
> + }
> +
> + open_flags = s->src->open_flags;
> +
> + /* switch from mirrored writes to destination only */
> + bdrv_flush_all();
> + bdrv_close(s->src);
> + if (bdrv_open(s->src, s->dst->filename, s->src->open_flags, NULL) < 0) {
> + s->failed = 1;
> + goto err;
> + }
> +
> + blkcopy_set_stage(s, STAGE_SWITCH_FINISHED);
> + blkcopy_cleanup(s);
> + return 0;
> +
> +err:
> + if (bdrv_open(s->src, s->src_filename, open_flags, NULL) < 0) {
> + error_report("%s: %s: cannot fallback to source image\n", __func__,
> + s->src_filename);
> + abort();
Perhaps a better way to handle this error (in the future, not for this
patch) is to pause the VM and allow the administrator to investigate
and reopen the drive before resuming the VM.
> +static void block_copy_vmchange(void *opaque, int running, int reason)
> +{
> + BdrvCopyState *s = opaque;
> +
> + if (!running) {
> + do {
> + qemu_aio_flush();
> + } while (!QLIST_EMPTY(&s->io_list));
> + qemu_del_timer(s->aio_timer);
> + s->stopped = 1;
> + } else if (s->stopped) {
> + s->stopped = 0;
> + qemu_mod_timer(s->aio_timer, qemu_get_clock_ms(rt_clock));
> + }
> +}
If you use vm_clock, which only runs when the VM is running, can you
avoid this vmchange handler?
> +int do_bdrv_copy_cancel(Monitor *mon, const QDict *qdict, QObject **ret_data)
> +{
> + BdrvCopyState *blkcopy, *s = NULL;
> + const char *device = qdict_get_str(qdict, "device");
> +
> + QLIST_FOREACH(blkcopy, &block_copy_list, list) {
> + if (!strcmp(blkcopy->device_name, device)) {
> + s = blkcopy;
> + break;
> + }
> + }
> +
> + if (!s || s->stage == STAGE_SWITCH_FINISHED || s->failed) {
> + qerror_report(QERR_DEVICE_NOT_FOUND, device);
> + return -1;
> + }
> +
> + s->cancelled = 1;
> + do {
> + qemu_aio_flush();
> + } while (!QLIST_EMPTY(&s->io_list));
> + blkcopy_cleanup(s);
> + blkcopy_free(s);
How about making the cancel monitor command async and using a cancel
completion function? Looping on qemu_aio_flush() is nasty - there is
no reason to block VM execution.
See stream_stop() in http://patchwork.ozlabs.org/patch/100415/.
> +
> + return 0;
> +}
> +
> +void do_info_blockcopy(Monitor *mon, QObject **ret_data)
> +{
> + QList *c_list;
> + BdrvCopyState *s;
> +
> + c_list = qlist_new();
> +
> + QLIST_FOREACH(s, &block_copy_list, list) {
> + QObject *c_obj;
> + static const char *status[] = { "failed", "stopped", "active",
> + "mirrored", "completed" };
> + int i;
> +
> + if (s->failed) {
> + i = 0;
> + } else if (s->stopped) {
> + i = 1;
> + } else if (s->stage < STAGE_MIRROR_WRITES) {
> + i = 2;
> + } else if (s->stage < STAGE_SWITCH_FINISHED) {
> + i = 3;
> + } else {
> + i = 4;
> + }
The array indirection makes this hard to read. Perhaps just:

    if (s->failed) {
        status = "failed";
    } else if (s->stopped) {
        status = "stopped";
    } else if (s->stage < STAGE_MIRROR_WRITES) {
        status = "active";
    } else if (s->stage < STAGE_SWITCH_FINISHED) {
        status = "mirrored";
    } else {
        status = "completed";
    }
> {
> + .name = "block_copy",
> + .args_type = "device:s,filename:s,incremental:-i,incoming:-m",
[...]
> + {
> + .name = "block_copy_cancel",
> + .args_type = "device:s",
[...]
> + {
> + .name = "block_copy_switch",
> + .args_type = "device:s",
[...]
> {
> + .name = "block_copy",
> + .args_type = "device:s,filename:s,incremental:-i,incoming:-m",
[...]
> + {
> + .name = "block_copy_cancel",
> + .args_type = "device:s",
[...]
> + {
> + .name = "block_copy_switch",
> + .args_type = "device:s",
"device:B" can be used for block devices.
Stefan
Thread overview: 7+ messages
2011-06-15 17:14 [Qemu-devel] [patch 0/4] live block copy (v5) Marcelo Tosatti
2011-06-15 17:14 ` [Qemu-devel] [patch 1/4] Add blkmirror block driver Marcelo Tosatti
2011-06-15 17:14 ` [Qemu-devel] [patch 2/4] Add error messages for live block copy Marcelo Tosatti
2011-06-15 17:14 ` [Qemu-devel] [patch 3/4] Add blkdebug points " Marcelo Tosatti
2011-06-15 17:14 ` [Qemu-devel] [patch 4/4] QEMU " Marcelo Tosatti
2011-06-16 7:27 ` Jiri Denemark
2011-06-16 18:52 ` Stefan Hajnoczi [this message]