Date: Wed, 18 Sep 2013 11:32:31 +0800
From: Fam Zheng <famz@redhat.com>
To: Stefan Hajnoczi
Cc: kwolf@redhat.com, pbonzini@redhat.com, jcody@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com
Subject: Re: [Qemu-devel] [PATCH v3 1/2] block: allow live commit of active image
Message-ID: <20130918033231.GA24523@T430s.nay.redhat.com>
In-Reply-To: <20130904123504.GB12733@stefanha-thinkpad.redhat.com>
References: <1376554447-28638-1-git-send-email-famz@redhat.com> <1376554447-28638-2-git-send-email-famz@redhat.com> <20130904123504.GB12733@stefanha-thinkpad.redhat.com>

On Wed, 09/04 14:35, Stefan Hajnoczi wrote:
> On Thu, Aug 15, 2013 at 04:14:06PM +0800, Fam Zheng wrote:
> > diff --git a/block/commit.c b/block/commit.c
> > index 2227fc2..b5e024b 100644
> > --- a/block/commit.c
> > +++ b/block/commit.c
> > @@ -17,14 +17,13 @@
> >  #include "block/blockjob.h"
> >  #include "qemu/ratelimit.h"
> >
> > -enum {
> > -    /*
> > -     * Size of data buffer for populating the image file. This should be large
> > -     * enough to process multiple clusters in a single call, so that populating
> > -     * contiguous regions of the image is efficient.
> > -     */
> > -    COMMIT_BUFFER_SIZE = 512 * 1024, /* in bytes */
> > -};
> > +/*
> > + * Size of data buffer for populating the image file. This should be large
> > + * enough to process multiple clusters in a single call, so that populating
> > + * contiguous regions of the image is efficient.
> > + */
> > +#define COMMIT_BUFFER_SECTORS 128
> > +#define COMMIT_BUFFER_BYTES (COMMIT_BUFFER_SECTORS * BDRV_SECTOR_SIZE)
>
> Changing from 512 KB to 64 KB can affect performance. 8 times as many
> iops may be issued to copy data.
>
> Also, the image's cluster size should really be taken into account.
> Otherwise additional inefficiency will be suffered when we populate a
> 128 KB cluster with a COMMIT_BUFFER_SECTORS (64 KB) write only to
> overwrite the remaining part in the next loop iteration.
>
> This can be solved by setting dirty bitmap granularity to cluster size
> or 64 KB minimum *and* finding continuous runs of dirty bits so larger
> I/Os can be performed by the main loop (up to 512 KB in one request).
>
> >  #define SLICE_TIME 100000000ULL /* ns */
> >
> > @@ -34,11 +33,27 @@ typedef struct CommitBlockJob {
> >      BlockDriverState *active;
> >      BlockDriverState *top;
> >      BlockDriverState *base;
> > +    BlockDriverState *overlay;
> >      BlockdevOnError on_error;
> >      int base_flags;
> >      int orig_overlay_flags;
> > +    bool should_complete;
> > +    bool ready;
>
> Why introduce the ready state when the active layer is being committed?
>
> There is no documentation update that mentions the job will not complete
> by itself if the top image is active.
> > +    for (;;) {
> > +        int64_t cnt = bdrv_get_dirty_count(s->top);
> > +        if (cnt == 0) {
> > +            if (!s->overlay && !s->ready) {
> > +                s->ready = true;
> > +                block_job_ready(&s->common);
> >              }
> > -            ret = commit_populate(top, base, sector_num, n, buf);
> > -            bytes_written += n * BDRV_SECTOR_SIZE;
> > +            /* We can complete if user called complete job or the job is
> > +             * committing non-active image */
> > +            if (s->should_complete || s->overlay) {
> > +                break;
>
> This termination condition is not safe:
>
> A write request only marks the dirty bitmap upon completion. A guest
> write request could still be in flight so we get cnt == 0 but we
> actually have not copied all data into the base.

Can we mark the dirty bitmap immediately upon receiving a guest write
request, instead of only upon completion?

Fam

> Completing safely is a little tricky because bdrv_drain_all() is
> synchronous and we don't want to call that. But waiting for
> bs->tracked_requests to be empty is also bad because it's a busy wait.
>
> Ideally we'd get woken up whenever a write request finishes.
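
On the "get woken up whenever a write request finishes" point, here is a
minimal sketch of the notification pattern. It uses plain pthread
primitives only because they are self-contained; the real block layer
would use its coroutine machinery instead, and every name below
(write_request_complete, wait_for_in_flight_writes, in_flight_writes)
is made up for illustration, not an existing QEMU symbol.

#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t request_done = PTHREAD_COND_INITIALIZER;
static int in_flight_writes;            /* guest writes not yet completed */

/* Hypothetical hook run when a guest write request completes, after the
 * dirty bitmap has been updated for the written sectors. */
static void write_request_complete(void)
{
    pthread_mutex_lock(&lock);
    in_flight_writes--;
    pthread_cond_broadcast(&request_done);  /* wake the commit job */
    pthread_mutex_unlock(&lock);
}

/* Commit-job side: instead of busy-waiting on the tracked request list,
 * sleep until every in-flight write has finished. */
static void wait_for_in_flight_writes(void)
{
    pthread_mutex_lock(&lock);
    while (in_flight_writes > 0) {
        pthread_cond_wait(&request_done, &lock);
    }
    pthread_mutex_unlock(&lock);
}

With something like this, the job would only break out of its loop once
both the in-flight count and bdrv_get_dirty_count() are zero, so a write
that is still in flight could no longer slip past the cnt == 0 check.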
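
And for the earlier suggestion about setting the dirty bitmap
granularity to the cluster size and finding continuous runs of dirty
bits: a rough sketch of such a scan, assuming a per-sector dirty query.
find_dirty_run() and is_sector_dirty() are invented names for
illustration; only commit_populate() and the 512-byte sector size come
from the patch.

#include <stdbool.h>
#include <stdint.h>

/* Sketch: length of the contiguous dirty run starting at sector_num,
 * capped at max_sectors so a single request never exceeds e.g.
 * 1024 sectors (512 KB). is_sector_dirty() stands in for the real
 * dirty bitmap query. */
static int64_t find_dirty_run(int64_t sector_num, int64_t end_sector,
                              int64_t max_sectors,
                              bool (*is_sector_dirty)(int64_t sector))
{
    int64_t n = 0;

    while (sector_num + n < end_sector &&
           n < max_sectors &&
           is_sector_dirty(sector_num + n)) {
        n++;
    }
    return n;
}

/*
 * The main loop could then copy one large extent per iteration:
 *
 *     n = find_dirty_run(sector_num, end, 1024, is_sector_dirty);
 *     if (n > 0) {
 *         commit_populate(top, base, sector_num, n, buf);
 *     }
 *
 * with the bitmap granularity set to max(cluster size, 64 KB) so that
 * whole clusters are always copied together.
 */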