From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([140.186.70.92]:54586) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QTvBZ-00025b-UA for qemu-devel@nongnu.org; Tue, 07 Jun 2011 08:15:07 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1QTvBY-0008U5-F3 for qemu-devel@nongnu.org; Tue, 07 Jun 2011 08:15:05 -0400 Received: from mail-gx0-f173.google.com ([209.85.161.173]:46036) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1QTvBY-0008Th-5j for qemu-devel@nongnu.org; Tue, 07 Jun 2011 08:15:04 -0400 Received: by gxk26 with SMTP id 26so2323757gxk.4 for ; Tue, 07 Jun 2011 05:15:03 -0700 (PDT) MIME-Version: 1.0 In-Reply-To: <20110606165823.855959321@amt.cnet> References: <20110606165536.581119615@amt.cnet> <20110606165823.855959321@amt.cnet> Date: Tue, 7 Jun 2011 13:15:02 +0100 Message-ID: From: Stefan Hajnoczi Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [patch 6/7] QEMU live block copy List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Marcelo Tosatti Cc: kwolf@redhat.com, Jes.Sorensen@redhat.com, dlaor@redhat.com, qemu-devel@nongnu.org, avi@redhat.com, jdenemar@redhat.com On Mon, Jun 6, 2011 at 5:55 PM, Marcelo Tosatti wrote= : I haven't reviewed this whole patch yet, but comments below. This patch, like image streaming, may hit deadlocks due to synchronous I/O emulation. I discovered this problem when working on image streaming and it should be solved by getting rid of the asynchronous context concept. The problem is that async I/O emulation will push a new context, preventing existing requests to complete until the current context is popped again. If the image format has dependencies between requests (e.g. QED allocating writes are serialized), then this leads to deadlock because the new request cannot complete until the old one does, but the old one needs to wait for the context to be popped. I think you are not affected by the QED allocating write case since the source image is only read, not written, by live block copy. But you might encounter this problem in other places. > +#define BLOCK_SIZE (BDRV_SECTORS_PER_DIRTY_CHUNK << BDRV_SECTOR_BITS) Should this be called DIRTY_CHUNK_SIZE? > +static void alloc_aio_bitmap(BdrvCopyState *s) > +{ > + =A0 =A0BlockDriverState *bs =3D s->src; > + =A0 =A0int64_t bitmap_size; > + > + =A0 =A0bitmap_size =3D (bdrv_getlength(bs) >> BDRV_SECTOR_BITS) + > + =A0 =A0 =A0 =A0 =A0 =A0BDRV_SECTORS_PER_DIRTY_CHUNK * 8 - 1; > + =A0 =A0bitmap_size /=3D BDRV_SECTORS_PER_DIRTY_CHUNK * 8; > + > + =A0 =A0s->aio_bitmap =3D qemu_mallocz(bitmap_size); > +} There is bitmap.h, which has a bunch of common bitmap code that could be re= used. > +static void blk_copy_issue_read(BdrvCopyState *s, int64_t sector, > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0int nr_s= ectors) > +{ > + =A0 =A0BdrvCopyBlock *blk =3D qemu_mallocz(sizeof(BdrvCopyBlock)); > + =A0 =A0blk->buf =3D qemu_mallocz(BLOCK_SIZE); qemu_blockalign() should be used to meet block device alignment requirement= s. > + =A0 =A0blk->state =3D s; > + =A0 =A0blk->sector =3D sector; > + =A0 =A0blk->nr_sectors =3D nr_sectors; > + =A0 =A0QLIST_INSERT_HEAD(&s->io_list, blk, list); > + > + =A0 =A0blk->iov.iov_base =3D blk->buf; > + =A0 =A0blk->iov.iov_len =3D nr_sectors * BDRV_SECTOR_SIZE; > + =A0 =A0qemu_iovec_init_external(&blk->qiov, &blk->iov, 1); > + > + =A0 =A0s->inflight_reads++; > + =A0 =A0blk->time =3D qemu_get_clock_ns(rt_clock); > + =A0 =A0blk->aiocb =3D bdrv_aio_readv(s->src, sector, &blk->qiov, nr_sec= tors, > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0blk_copy= _read_cb, blk); > + =A0 =A0if (!blk->aiocb) { > + =A0 =A0 =A0 =A0s->error =3D 1; > + =A0 =A0 =A0 =A0goto error; > + =A0 =A0} > + > + =A0 =A0return; > + > +error: > + =A0 =A0s->inflight_reads--; > + =A0 =A0QLIST_REMOVE(blk, list); > + =A0 =A0qemu_free(blk->buf); qemu_vfree() after you switch to qemu_blockalign() above. > + =A0 =A0qemu_free(blk); > +} > +static void blkcopy_cleanup(BdrvCopyState *s) > +{ Need to free dst, at least on error? > + =A0 =A0assert(s->inflight_reads =3D=3D 0); > + =A0 =A0assert(QLIST_EMPTY(&s->io_list)); > + =A0 =A0bdrv_set_dirty_tracking(s->src, 0); > + =A0 =A0drive_put_ref(drive_get_by_blockdev(s->src)); > + =A0 =A0bdrv_set_in_use(s->src, 0); > + =A0 =A0if (s->stage >=3D STAGE_DIRTY) { > + =A0 =A0 =A0 =A0qemu_free(s->aio_bitmap); > + =A0 =A0} > + =A0 =A0qemu_del_timer(s->aio_timer); Missing qemu_free_timer(). > +} > +static int bdrv_copy_setup(const char *device, const char *filename, > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 bool shared_base) > +{ > + =A0 =A0int64_t sectors; > + =A0 =A0BdrvCopyState *blkcopy, *safe; > + =A0 =A0BlockDriverState *src, *dst; > + > + =A0 =A0src =3D bdrv_find(device); > + =A0 =A0if (src) { > + =A0 =A0 =A0 =A0qerror_report(QERR_DEVICE_NOT_FOUND, device); > + =A0 =A0 =A0 =A0return -1; > + =A0 =A0} > + > + =A0 =A0dst =3D bdrv_new(""); > + =A0 =A0if (bdrv_open(dst, filename, src->open_flags, NULL) < 0) { > + =A0 =A0 =A0 =A0bdrv_delete(dst); > + =A0 =A0 =A0 =A0qerror_report(QERR_OPEN_FILE_FAILED, filename); > + =A0 =A0 =A0 =A0return -1; > + =A0 =A0} > + > + =A0 =A0QLIST_FOREACH_SAFE(blkcopy, &block_copy_list, list, safe) { > + =A0 =A0 =A0 =A0if (!strcmp(blkcopy->device_name, src->device_name)) { > + =A0 =A0 =A0 =A0 =A0 =A0if (blkcopy->stage =3D=3D STAGE_SWITCH_FINISHED = || blkcopy->failed) { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0blkcopy_free(blkcopy); > + =A0 =A0 =A0 =A0 =A0 =A0} else { > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0qerror_report(QERR_IN_PROGRESS, "block c= opy"); dst is leaked. > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0return -1; > + =A0 =A0 =A0 =A0 =A0 =A0} > + =A0 =A0 =A0 =A0} > + =A0 =A0} > + > + =A0 =A0sectors =3D bdrv_getlength(src) >> BDRV_SECTOR_BITS; > + =A0 =A0if (sectors !=3D bdrv_getlength(dst) >> BDRV_SECTOR_BITS) { > + =A0 =A0 =A0 =A0qerror_report(QERR_BLOCKCOPY_IMAGE_SIZE_DIFFERS); > + =A0 =A0 =A0 =A0return -1; dst is leaked. Stefan