From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([140.186.70.92]:55944) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QXHgR-0004jQ-Sx for qemu-devel@nongnu.org; Thu, 16 Jun 2011 14:52:53 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1QXHgQ-0006Lg-0X for qemu-devel@nongnu.org; Thu, 16 Jun 2011 14:52:51 -0400 Received: from mail-gy0-f173.google.com ([209.85.160.173]:53151) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QXHgP-0006Lc-KS for qemu-devel@nongnu.org; Thu, 16 Jun 2011 14:52:49 -0400 Received: by gyg4 with SMTP id 4so660544gyg.4 for ; Thu, 16 Jun 2011 11:52:48 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <20110615174009.094728634@amt.cnet> References: <20110615171403.158790293@amt.cnet> <20110615174009.094728634@amt.cnet> Date: Thu, 16 Jun 2011 19:52:48 +0100 Message-ID: From: Stefan Hajnoczi Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [patch 4/4] QEMU live block copy List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Marcelo Tosatti Cc: kwolf@redhat.com, Jes.Sorensen@redhat.com, dlaor@redhat.com, qemu-devel@nongnu.org, avi@redhat.com, jdenemar@redhat.com On Wed, Jun 15, 2011 at 6:14 PM, Marcelo Tosatti wrot= e: A couple of comments, nothing critical. Rest of the series looks good. Bulk phase is really the dirty phase with all blocks starting dirty (except !bdrv_is_allocate() blocks cleared if shared source). There may not be much benefit in unifying bulk and dirty phases. > +#include "sysemu.h" > +#include "qjson.h" > +#include Please include system headers before QEMU headers. > +#define NAME_LEN 1024 > +#define DEV_LEN 256 Using bdrv_get_device_name() would be nicer than introducing a new max constant for BlockDriverState::device_name. > +static void blk_copy_issue_read(BdrvCopyState *s, int64_t sector, > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0int nr_s= ectors) > +{ > + =A0 =A0BdrvCopyBlock *blk =3D qemu_mallocz(sizeof(BdrvCopyBlock)); > + =A0 =A0blk->buf =3D qemu_blockalign(s->src, DIRTY_CHUNK_SIZE); No need to hardcode nr_sectors =3D BDRV_SECTORS_PER_DIRTY_CHUNK: blk->buf =3D qemu_blockalign(s->src, nr_sectors * BDRV_SECTOR_SIZE); > +static int blk_issue_reads_dirty(BdrvCopyState *s) Should be blkcopy_issue_reads_dirty()? > +static int blk_issue_reads_bulk(BdrvCopyState *s) Should be blkcopy_issue_reads_bulk()? > +{ > + =A0 =A0int nr_sectors; > + =A0 =A0int64_t curr_sector =3D s->curr_sector; > + > + =A0 =A0if (s->shared_base) { > + =A0 =A0 =A0 =A0while (curr_sector < s->nr_sectors && > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0!bdrv_is_allocated(s->src, curr_sector, > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 MAX= _IS_ALLOCATED_SEARCH, &nr_sectors)) { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0curr_sector +=3D nr_sectors; > + =A0 =A0 =A0 =A0} Hopefully this does not block too long. bdrv_is_allocated() is synchronous and may block if the image file needs to be accessed. The pathologicial case here is when the source file is completely unallocated - usually this means the block driver does not have to traverse much image metadata so we should be okay. > + =A0 =A0} > + > + =A0 =A0if (curr_sector >=3D s->nr_sectors) { > + =A0 =A0 =A0 =A0s->curr_sector =3D 0; > + =A0 =A0 =A0 =A0return 1; > + =A0 =A0} > + > + =A0 =A0curr_sector &=3D ~((int64_t)BDRV_SECTORS_PER_DIRTY_CHUNK - 1); > + =A0 =A0nr_sectors =3D BDRV_SECTORS_PER_DIRTY_CHUNK; > + > + =A0 =A0blk_copy_issue_read(s, s->curr_sector, nr_sectors); Error return ignored? > +int do_bdrv_copy_switch(Monitor *mon, const QDict *qdict, QObject **ret_= data) > +{ > + =A0 =A0const char *device =3D qdict_get_str(qdict, "device"); > + =A0 =A0BdrvCopyState *s =3D NULL; > + =A0 =A0int open_flags; > + > + =A0 =A0QLIST_FOREACH(s, &block_copy_list, list) { > + =A0 =A0 =A0 =A0if (!strcmp(s->device_name, device)) { > + =A0 =A0 =A0 =A0 =A0 =A0if (s->stage !=3D STAGE_MIRROR_WRITES) { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0qerror_report(QERR_IN_PROGRESS, "block c= opy"); > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0return -1; > + =A0 =A0 =A0 =A0 =A0 =A0} > + =A0 =A0 =A0 =A0 =A0 =A0break; > + =A0 =A0 =A0 =A0} > + =A0 =A0} > + > + =A0 =A0if (!s) { > + =A0 =A0 =A0 =A0qerror_report(QERR_DEVICE_NOT_FOUND, device); > + =A0 =A0 =A0 =A0return -1; > + =A0 =A0} > + > + =A0 =A0open_flags =3D s->src->open_flags; > + > + =A0 =A0/* switch from mirrored writes to destination only */ > + =A0 =A0bdrv_flush_all(); > + =A0 =A0bdrv_close(s->src); > + =A0 =A0if (bdrv_open(s->src, s->dst->filename, s->src->open_flags, NULL= ) < 0) { > + =A0 =A0 =A0 =A0s->failed =3D 1; > + =A0 =A0 =A0 =A0goto err; > + =A0 =A0} > + > + =A0 =A0blkcopy_set_stage(s, STAGE_SWITCH_FINISHED); > + =A0 =A0blkcopy_cleanup(s); > + =A0 =A0return 0; > + > +err: > + =A0 =A0if (bdrv_open(s->src, s->src_filename, open_flags, NULL) < 0) { > + =A0 =A0 =A0 =A0error_report("%s: %s: cannot fallback to source image\n"= , __func__, > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 s->src_filename); > + =A0 =A0 =A0 =A0abort(); Perhaps a better way to handle this error (in the future, not for this patch) is to pause the VM and allow the administrator to investigate and reopen the drive before resuming the VM. > +static void block_copy_vmchange(void *opaque, int running, int reason) > +{ > + =A0 =A0BdrvCopyState *s =3D opaque; > + > + =A0 =A0if (!running) { > + =A0 =A0 =A0 =A0do { > + =A0 =A0 =A0 =A0 =A0 =A0qemu_aio_flush(); > + =A0 =A0 =A0 =A0} while (!QLIST_EMPTY(&s->io_list)); > + =A0 =A0 =A0 =A0qemu_del_timer(s->aio_timer); > + =A0 =A0 =A0 =A0s->stopped =3D 1; > + =A0 =A0} else if (s->stopped) { > + =A0 =A0 =A0 =A0s->stopped =3D 0; > + =A0 =A0 =A0 =A0qemu_mod_timer(s->aio_timer, qemu_get_clock_ms(rt_clock)= ); > + =A0 =A0} > +} If you use vm_clock, which only runs when the VM is running, can you avoid this vmchange handler? > +int do_bdrv_copy_cancel(Monitor *mon, const QDict *qdict, QObject **ret_= data) > +{ > + =A0 =A0BdrvCopyState *blkcopy, *s =3D NULL; > + =A0 =A0const char *device =3D qdict_get_str(qdict, "device"); > + > + =A0 =A0QLIST_FOREACH(blkcopy, &block_copy_list, list) { > + =A0 =A0 =A0 =A0if (!strcmp(blkcopy->device_name, device)) { > + =A0 =A0 =A0 =A0 =A0 =A0s =3D blkcopy; > + =A0 =A0 =A0 =A0 =A0 =A0break; > + =A0 =A0 =A0 =A0} > + =A0 =A0} > + > + =A0 =A0if (!s || s->stage =3D=3D STAGE_SWITCH_FINISHED || s->failed) { > + =A0 =A0 =A0 =A0qerror_report(QERR_DEVICE_NOT_FOUND, device); > + =A0 =A0 =A0 =A0return -1; > + =A0 =A0} > + > + =A0 =A0s->cancelled =3D 1; > + =A0 =A0do { > + =A0 =A0 =A0 =A0qemu_aio_flush(); > + =A0 =A0} while (!QLIST_EMPTY(&s->io_list)); > + =A0 =A0blkcopy_cleanup(s); > + =A0 =A0blkcopy_free(s); How about making the cancel monitor command async and using a cancel completion function? Looping on qemu_aio_flush() is nasty - there is no reason to block VM execution. See stream_stop() in http://patchwork.ozlabs.org/patch/100415/. > + > + =A0 =A0return 0; > +} > + > +void do_info_blockcopy(Monitor *mon, QObject **ret_data) > +{ > + =A0 =A0QList *c_list; > + =A0 =A0BdrvCopyState *s; > + > + =A0 =A0c_list =3D qlist_new(); > + > + =A0 =A0QLIST_FOREACH(s, &block_copy_list, list) { > + =A0 =A0 =A0 =A0QObject *c_obj; > + =A0 =A0 =A0 =A0static const char *status[] =3D { "failed", "stopped", "= active", > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0"mirrored", "completed" }; > + =A0 =A0 =A0 =A0int i; > + > + =A0 =A0 =A0 =A0if (s->failed) { > + =A0 =A0 =A0 =A0 =A0 =A0i =3D 0; > + =A0 =A0 =A0 =A0} else if (s->stopped) { > + =A0 =A0 =A0 =A0 =A0 =A0i =3D 1; > + =A0 =A0 =A0 =A0} else if (s->stage < STAGE_MIRROR_WRITES) { > + =A0 =A0 =A0 =A0 =A0 =A0i =3D 2; > + =A0 =A0 =A0 =A0} else if (s->stage < STAGE_SWITCH_FINISHED) { > + =A0 =A0 =A0 =A0 =A0 =A0i =3D 3; > + =A0 =A0 =A0 =A0} else { > + =A0 =A0 =A0 =A0 =A0 =A0i =3D 4; > + =A0 =A0 =A0 =A0} The array indirection makes this hard to read. Perhaps just: if (s->failed) { =A0 =A0status =3D "failed"; } else if (s->stopped) { =A0 =A0status =3D "stopped"; } else if (s->stage < STAGE_MIRROR_WRITES) { =A0 =A0status =3D "active"; } else if (s->stage < STAGE_SWITCH_FINISHED) { =A0 =A0status =3D "mirrored"; } else { =A0 =A0status =3D "completed"; } > =A0 =A0 { > + =A0 =A0 =A0 =A0.name =A0 =A0 =A0 =3D "block_copy", > + =A0 =A0 =A0 =A0.args_type =A0=3D "device:s,filename:s,incremental:-i,in= coming:-m", [...] > + =A0 =A0{ > + =A0 =A0 =A0 =A0.name =A0 =A0 =A0 =3D "block_copy_cancel", > + =A0 =A0 =A0 =A0.args_type =A0=3D "device:s", [...] > + =A0 =A0{ > + =A0 =A0 =A0 =A0.name =A0 =A0 =A0 =3D "block_copy_switch", > + =A0 =A0 =A0 =A0.args_type =A0=3D "device:s", [...] > =A0 =A0 { > + =A0 =A0 =A0 =A0.name =A0 =A0 =A0 =3D "block_copy", > + =A0 =A0 =A0 =A0.args_type =A0=3D "device:s,filename:s,incremental:-i,in= coming:-m", [...] > + =A0 =A0{ > + =A0 =A0 =A0 =A0.name =A0 =A0 =A0 =3D "block_copy_cancel", > + =A0 =A0 =A0 =A0.args_type =A0=3D "device:s", [...] > + =A0 =A0{ > + =A0 =A0 =A0 =A0.name =A0 =A0 =A0 =3D "block_copy_switch", > + =A0 =A0 =A0 =A0.args_type =A0=3D "device:s", "device:B" can be used for block devices. Stefan