Date: Wed, 18 Sep 2013 11:32:31 +0800
From: Fam Zheng <famz@redhat.com>
To: Stefan Hajnoczi
Cc: kwolf@redhat.com, pbonzini@redhat.com, jcody@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com
Subject: Re: [Qemu-devel] [PATCH v3 1/2] block: allow live commit of active image
Message-ID: <20130918033231.GA24523@T430s.nay.redhat.com>
In-Reply-To: <20130904123504.GB12733@stefanha-thinkpad.redhat.com>
References: <1376554447-28638-1-git-send-email-famz@redhat.com> <1376554447-28638-2-git-send-email-famz@redhat.com> <20130904123504.GB12733@stefanha-thinkpad.redhat.com>

On Wed, 09/04 14:35, Stefan Hajnoczi wrote:
> On Thu, Aug 15, 2013 at 04:14:06PM +0800, Fam Zheng wrote:
> > diff --git a/block/commit.c b/block/commit.c
> > index 2227fc2..b5e024b 100644
> > --- a/block/commit.c
> > +++ b/block/commit.c
> > @@ -17,14 +17,13 @@
> >  #include "block/blockjob.h"
> >  #include "qemu/ratelimit.h"
> >
> > -enum {
> > -    /*
> > -     * Size of data buffer for populating the image file. This should be large
> > -     * enough to process multiple clusters in a single call, so that populating
> > -     * contiguous regions of the image is efficient.
> > -     */
> > -    COMMIT_BUFFER_SIZE = 512 * 1024, /* in bytes */
> > -};
> > +/*
> > + * Size of data buffer for populating the image file. This should be large
> > + * enough to process multiple clusters in a single call, so that populating
> > + * contiguous regions of the image is efficient.
> > + */
> > +#define COMMIT_BUFFER_SECTORS 128
> > +#define COMMIT_BUFFER_BYTES (COMMIT_BUFFER_SECTORS * BDRV_SECTOR_SIZE)
>
> Changing from 512 KB to 64 KB can affect performance. 8 times as many
> iops may be issued to copy data.
>
> Also, the image's cluster size should really be taken into account.
> Otherwise additional inefficiency will be suffered when we populate a
> 128 KB cluster with a COMMIT_BUFFER_SECTORS (64 KB) write only to
> overwrite the remaining part in the next loop iteration.
>
> This can be solved by setting dirty bitmap granularity to cluster size
> or 64 KB minimum *and* finding continuous runs of dirty bits so larger
> I/Os can be performed by the main loop (up to 512 KB in one request).
>
> >  #define SLICE_TIME 100000000ULL /* ns */
> >
> > @@ -34,11 +33,27 @@ typedef struct CommitBlockJob {
> >      BlockDriverState *active;
> >      BlockDriverState *top;
> >      BlockDriverState *base;
> > +    BlockDriverState *overlay;
> >      BlockdevOnError on_error;
> >      int base_flags;
> >      int orig_overlay_flags;
> > +    bool should_complete;
> > +    bool ready;
>
> Why introduce the ready state when the active layer is being committed?
>
> There is no documentation update that mentions the job will not complete
> by itself if the top image is active.
> > +    for (;;) {
> > +        int64_t cnt = bdrv_get_dirty_count(s->top);
> > +        if (cnt == 0) {
> > +            if (!s->overlay && !s->ready) {
> > +                s->ready = true;
> > +                block_job_ready(&s->common);
> >              }
> > -            ret = commit_populate(top, base, sector_num, n, buf);
> > -            bytes_written += n * BDRV_SECTOR_SIZE;
> > +            /* We can complete if user called complete job or the job is
> > +             * committing non-active image */
> > +            if (s->should_complete || s->overlay) {
> > +                break;
>
> This termination condition is not safe:
>
> A write request only marks the dirty bitmap upon completion. A guest
> write request could still be in flight so we get cnt == 0 but we
> actually have not copied all data into the base.

Can we mark the dirty bitmap immediately upon receiving a guest write
request, instead of only upon completion?

Fam

> Completing safely is a little tricky because bdrv_drain_all() is
> synchronous and we don't want to call that. But waiting for
> bs->tracked_requests to be empty is also bad because it's a busy wait.
>
> Ideally we'd get woken up whenever a write request finishes.
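
On the "get woken up whenever a write request finishes" point, here is a
minimal sketch of the notification pattern. It uses plain pthread
primitives only because they are self-contained; the real block layer
would use its coroutine machinery instead, and every name below
(write_request_complete, wait_for_in_flight_writes, in_flight_writes)
is made up for illustration, not an existing QEMU symbol.

#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t request_done = PTHREAD_COND_INITIALIZER;
static int in_flight_writes;            /* guest writes not yet completed */

/* Hypothetical hook run when a guest write request completes, after the
 * dirty bitmap has been updated for the written sectors. */
static void write_request_complete(void)
{
    pthread_mutex_lock(&lock);
    in_flight_writes--;
    pthread_cond_broadcast(&request_done);  /* wake the commit job */
    pthread_mutex_unlock(&lock);
}

/* Commit-job side: instead of busy-waiting on the tracked request list,
 * sleep until every in-flight write has finished. */
static void wait_for_in_flight_writes(void)
{
    pthread_mutex_lock(&lock);
    while (in_flight_writes > 0) {
        pthread_cond_wait(&request_done, &lock);
    }
    pthread_mutex_unlock(&lock);
}

With something like this, the job would only break out of its loop once
both the in-flight count and bdrv_get_dirty_count() are zero, so a write
that is still in flight could no longer slip past the cnt == 0 check.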
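
And for the earlier suggestion about setting the dirty bitmap
granularity to the cluster size and finding continuous runs of dirty
bits: a rough sketch of such a scan, assuming a per-sector dirty query.
find_dirty_run() and is_sector_dirty() are invented names for
illustration; only commit_populate() and the 512-byte sector size come
from the patch.

#include <stdbool.h>
#include <stdint.h>

/* Sketch: length of the contiguous dirty run starting at sector_num,
 * capped at max_sectors so a single request never exceeds e.g.
 * 1024 sectors (512 KB). is_sector_dirty() stands in for the real
 * dirty bitmap query. */
static int64_t find_dirty_run(int64_t sector_num, int64_t end_sector,
                              int64_t max_sectors,
                              bool (*is_sector_dirty)(int64_t sector))
{
    int64_t n = 0;

    while (sector_num + n < end_sector &&
           n < max_sectors &&
           is_sector_dirty(sector_num + n)) {
        n++;
    }
    return n;
}

/*
 * The main loop could then copy one large extent per iteration:
 *
 *     n = find_dirty_run(sector_num, end, 1024, is_sector_dirty);
 *     if (n > 0) {
 *         commit_populate(top, base, sector_num, n, buf);
 *     }
 *
 * with the bitmap granularity set to max(cluster size, 64 KB) so that
 * whole clusters are always copied together.
 */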