Re: [Qemu-devel] [PATCH v3 1/2] block: allow live commit of active image

From: Stefan Hajnoczi <stefanha@gmail.com>
To: Fam Zheng <famz@redhat.com>
Cc: kwolf@redhat.com, pbonzini@redhat.com, jcody@redhat.com,
	qemu-devel@nongnu.org, stefanha@redhat.com
Subject: Re: [Qemu-devel] [PATCH v3 1/2] block: allow live commit of active image
Date: Wed, 18 Sep 2013 11:36:14 +0200
Message-ID: <20130918093614.GC13359@stefanha-thinkpad.redhat.com>
In-Reply-To: <20130918033231.GA24523@T430s.nay.redhat.com>

On Wed, Sep 18, 2013 at 11:32:31AM +0800, Fam Zheng wrote:
> On Wed, 09/04 14:35, Stefan Hajnoczi wrote:
> > On Thu, Aug 15, 2013 at 04:14:06PM +0800, Fam Zheng wrote:
> > > diff --git a/block/commit.c b/block/commit.c
> > > index 2227fc2..b5e024b 100644
> > > --- a/block/commit.c
> > > +++ b/block/commit.c
> > > @@ -17,14 +17,13 @@
> > >  #include "block/blockjob.h"
> > >  #include "qemu/ratelimit.h"
> > >  
> > > -enum {
> > > -    /*
> > > -     * Size of data buffer for populating the image file.  This should be large
> > > -     * enough to process multiple clusters in a single call, so that populating
> > > -     * contiguous regions of the image is efficient.
> > > -     */
> > > -    COMMIT_BUFFER_SIZE = 512 * 1024, /* in bytes */
> > > -};
> > > +/*
> > > + * Size of data buffer for populating the image file.  This should be large
> > > + * enough to process multiple clusters in a single call, so that populating
> > > + * contiguous regions of the image is efficient.
> > > + */
> > > +#define COMMIT_BUFFER_SECTORS 128
> > > +#define COMMIT_BUFFER_BYTES (COMMIT_BUFFER_SECTORS * BDRV_SECTOR_SIZE)
> > 
> > Changing from 512 KB to 64 KB can affect performance.  8 times as many
> > iops may be issued to copy data.
> > 
> > Also, the image's cluster size should really be taken into account.
> > Otherwise additional inefficiency will be suffered when we populate a
> > 128 KB cluster with a COMMIT_BUFFER_SECTORS (64 KB) write only to
> > overwrite the remaining part in the next loop iteration.
> > 
> > This can be solved by setting dirty bitmap granularity to cluster size
> > or 64 KB minimum *and* finding contiguous runs of dirty bits so larger
> > I/Os can be performed by the main loop (up to 512 KB in one request).
> > 
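To make the run-finding suggestion quoted above concrete, here is a
minimal sketch.  It assumes a hypothetical DirtyMap with one flag per
64 KB chunk; DirtyMap and next_dirty_run() are illustrative names only
and are not QEMU's dirty bitmap API.  With 64 KB granularity, a cap of
8 chunks corresponds to the 512 KB request size mentioned above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a dirty bitmap: one flag per chunk of
 * "granularity" bytes.  This is an assumption for illustration, not
 * the bitmap implementation used by the block layer. */
typedef struct {
    const bool *dirty;
    size_t nb_chunks;
} DirtyMap;

/* Find the next run of consecutive dirty chunks at or after "start".
 * The run is capped at "max_chunks" so a single request stays bounded.
 * Stores the index of the first dirty chunk in *first (nb_chunks if
 * there is none) and returns the run length in chunks (0 if none). */
size_t next_dirty_run(const DirtyMap *map, size_t start,
                      size_t max_chunks, size_t *first)
{
    size_t i = start;
    size_t len = 0;

    assert(max_chunks > 0);

    /* Skip clean chunks. */
    while (i < map->nb_chunks && !map->dirty[i]) {
        i++;
    }
    *first = i;

    /* Extend the run while chunks stay dirty and the cap allows it. */
    while (i + len < map->nb_chunks && len < max_chunks &&
           map->dirty[i + len]) {
        len++;
    }
    return len;
}
```

A copy loop built on this would issue one request per run, starting at
first * granularity with a length of run * granularity bytes, instead
of one request per 64 KB chunk.
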
> > >  #define SLICE_TIME 100000000ULL /* ns */
> > >  
> > > @@ -34,11 +33,27 @@ typedef struct CommitBlockJob {
> > >      BlockDriverState *active;
> > >      BlockDriverState *top;
> > >      BlockDriverState *base;
> > > +    BlockDriverState *overlay;
> > >      BlockdevOnError on_error;
> > >      int base_flags;
> > >      int orig_overlay_flags;
> > > +    bool should_complete;
> > > +    bool ready;
> > 
> > Why introduce the ready state when the active layer is being committed?
> > 
> > There is no documentation update that mentions the job will not complete
> > by itself if the top image is active.
> > 
> > > +    for (;;) {
> > > +        int64_t cnt = bdrv_get_dirty_count(s->top);
> > > +        if (cnt == 0) {
> > > +            if (!s->overlay && !s->ready) {
> > > +                s->ready = true;
> > > +                block_job_ready(&s->common);
> > >              }
> > > -            ret = commit_populate(top, base, sector_num, n, buf);
> > > -            bytes_written += n * BDRV_SECTOR_SIZE;
> > > +            /* We can complete if user called complete job or the job is
> > > +             * committing non-active image */
> > > +            if (s->should_complete || s->overlay) {
> > > +                break;
> > 
> > This termination condition is not safe:
> > 
> > A write request only marks the dirty bitmap upon completion.  A guest
> > write request could still be in flight so we get cnt == 0 but we
> > actually have not copied all data into the base.
> 
> Can we mark the dirty bitmap immediately upon getting guest write request?

No, because then the bit might get cleared before the request completes.
The actual request might not hit the disk straight away - it could yield
on an image format coroutine mutex.
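
The ordering problem can be shown with a toy model.  This is only a
sketch under simplifying assumptions: a single sector, a single dirty
bit, and a hypothetical Sim structure that stands in for the top and
base images.  None of these names exist in QEMU.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (assumption): one sector in the top image, its copy in
 * the base, and one dirty bit covering it. */
typedef struct {
    int top;
    int base;
    bool dirty;
} Sim;

/* One pass of the commit job: copy the sector if its bit is set,
 * then clear the bit. */
void job_iteration(Sim *s)
{
    if (s->dirty) {
        s->base = s->top;
        s->dirty = false;
    }
}

/* Replay one guest write that yields between submission and
 * completion while the job runs in between.  Returns true if the base
 * ends up matching the top image. */
bool run(bool mark_on_submit)
{
    Sim s = { .top = 1, .base = 1, .dirty = false };

    /* Guest submits a write of the value 2. */
    if (mark_on_submit) {
        s.dirty = true;
    }

    /* The request yields (for example on a mutex) before touching the
     * image.  The job runs and may clear the bit too early. */
    job_iteration(&s);

    /* The request completes and the data reaches the top image. */
    s.top = 2;
    if (!mark_on_submit) {
        s.dirty = true;
    }

    /* The job runs again. */
    job_iteration(&s);

    return s.base == s.top;
}
```

In this model run(true) returns false because the bit is cleared while
the old data is copied, so the new data never reaches the base, while
run(false) returns true.  Marking on completion still leaves the
window where a request is in flight and the dirty count reads zero,
which is what the termination condition has to handle.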

We could do the equivalent of drain asynchronously: get a callback when
there are no requests.  There is also a stricter form of this with a
guarantee that the guest cannot make us wait forever: "freeze" the block
device so new requests will yield immediately until the device is
unfrozen.  Now a guest cannot stop us from completing by continuously
submitting requests.

Note that freeze has the disadvantage that the guest might time out if
we don't unfreeze the device soon.
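
A rough sketch of the freeze idea, assuming a hypothetical Dev
structure with an in-flight counter.  All names here (Dev, dev_freeze,
dev_unfreeze and so on) are made up for illustration and are not
existing block layer functions.  Parked requests are modelled as a
counter rather than as yielded coroutines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void DrainedFn(void *opaque);

/* Hypothetical device state, for illustration only. */
typedef struct {
    unsigned in_flight;      /* requests submitted but not completed */
    unsigned queued;         /* requests parked while frozen */
    bool frozen;
    DrainedFn *drained_cb;   /* called once in_flight drops to zero */
    void *drained_opaque;
} Dev;

/* Returns true if the request starts now, false if it was parked
 * because the device is frozen. */
bool dev_submit(Dev *d)
{
    if (d->frozen) {
        d->queued++;
        return false;
    }
    d->in_flight++;
    return true;
}

/* Complete one in-flight request and fire the drained callback when
 * the last one finishes. */
void dev_complete(Dev *d)
{
    assert(d->in_flight > 0);
    d->in_flight--;
    if (d->in_flight == 0 && d->drained_cb) {
        DrainedFn *cb = d->drained_cb;
        d->drained_cb = NULL;
        cb(d->drained_opaque);
    }
}

/* Stop admitting new requests and ask to be told when the requests
 * already in flight are done. */
void dev_freeze(Dev *d, DrainedFn *cb, void *opaque)
{
    d->frozen = true;
    if (d->in_flight == 0) {
        cb(opaque);
        return;
    }
    d->drained_cb = cb;
    d->drained_opaque = opaque;
}

/* Admit the parked requests again. */
void dev_unfreeze(Dev *d)
{
    d->frozen = false;
    d->in_flight += d->queued;
    d->queued = 0;
}

/* Example callback: set a bool flag. */
void set_flag(void *opaque)
{
    *(bool *)opaque = true;
}
```

Because nothing new is admitted after dev_freeze(), the in-flight
count can only go down, so the callback fires after a bounded amount
of work.  The job would do its final check from that callback and call
dev_unfreeze() promptly to keep the guest from timing out.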

Stefan
