qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: Jeff Cody <jcody@redhat.com>
Cc: stefanha@redhat.com, qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present
Date: Mon, 28 Sep 2015 17:07:52 +0200	[thread overview]
Message-ID: <20150928150752.GC18068@noname.str.redhat.com> (raw)
In-Reply-To: <c3dcba707e0d2b976d788eb228336237fd7d612c.1443410673.git.jcody@redhat.com>

Am 28.09.2015 um 05:29 hat Jeff Cody geschrieben:
> During mirror, if the target device does not have support zero
> initialization, a mirror may result in a corrupt image.

I think you want to check this sentence. ("During mirror [...], a
mirror may result [...]")

> For instance, on mirror to a host device with format = raw, whatever
> random data is on the target device will still be there for unallocated
> sectors.
> 
> This is because during the mirror, we set the dirty bitmap to copy only
> sectors allocated above 'base'.  In the case of target devices where we
> cannot assume unallocated sectors will be read as zeroes, we need to
> explicitely zero out this data.
> 
> In order to avoid zeroing out all sectors of the target device prior to
> mirroring, we do zeroing as part of the block job.  A second dirty
> bitmap cache is created, to track sectors that are unallocated above
> 'base'.  These sectors are then checked for status of BDRV_BLOCK_ZERO
> on the target - if they are not, then zeroes are explicitly written.

Why do you need a bitmap? You never change the bitmap after initialising
it, so couldn't you instead just check the allocation status when you
need it?

In fact, why do we need two passes? I would have expected that commit
dcfb3beb already does the trick, with checking allocation status and
writing zeroes during the normal single pass.

If that commit fails to solve the problem, I guess I first need to
understand why before I can continue reviewing this one...

> This only occurs under two conditions:
> 
>     1. 'mode' != "existing"
>     2. bdrv_has_zero_init(target) == NULL
> 
> We perform the mirroring through mirror_iteration() as before, except
> in two passes.  If the above two conditions are met, the first pass
> is using the bitmap tracking unallocated sectors, to write the needed
> zeroes.  Then, the second pass is performed, to mirror the actual data
> as before.
> 
> If the above two conditions are not met, then the first pass is skipped,
> and only the second pass (the one with the actual data) is performed.
> 
> Signed-off-by: Jeff Cody <jcody@redhat.com>
> ---
>  block/mirror.c            | 109 ++++++++++++++++++++++++++++++++++------------
>  blockdev.c                |   2 +-
>  include/block/block_int.h |   3 +-
>  qapi/block-core.json      |   6 ++-
>  4 files changed, 87 insertions(+), 33 deletions(-)
> 
> diff --git a/block/mirror.c b/block/mirror.c
> index 405e5c4..b599176 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -50,7 +50,9 @@ typedef struct MirrorBlockJob {
>      int64_t bdev_length;
>      unsigned long *cow_bitmap;
>      BdrvDirtyBitmap *dirty_bitmap;
> -    HBitmapIter hbi;
> +    HBitmapIter zero_hbi;
> +    HBitmapIter allocated_hbi;
> +    HBitmapIter *hbi;
>      uint8_t *buf;
>      QSIMPLEQ_HEAD(, MirrorBuffer) buf_free;
>      int buf_free_count;
> @@ -60,6 +62,8 @@ typedef struct MirrorBlockJob {
>      int sectors_in_flight;
>      int ret;
>      bool unmap;
> +    bool zero_unallocated;
> +    bool zero_cycle;
>      bool waiting_for_io;
>  } MirrorBlockJob;
>  
> @@ -166,10 +170,10 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
>      int pnum;
>      int64_t ret;
>  
> -    s->sector_num = hbitmap_iter_next(&s->hbi);
> +    s->sector_num = hbitmap_iter_next(s->hbi);
>      if (s->sector_num < 0) {
> -        bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi);
> -        s->sector_num = hbitmap_iter_next(&s->hbi);
> +        bdrv_dirty_iter_init(s->dirty_bitmap, s->hbi);
> +        s->sector_num = hbitmap_iter_next(s->hbi);
>          trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap));
>          assert(s->sector_num >= 0);
>      }
> @@ -287,7 +291,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
>           */
>          if (next_sector > hbitmap_next_sector
>              && bdrv_get_dirty(source, s->dirty_bitmap, next_sector)) {
> -            hbitmap_next_sector = hbitmap_iter_next(&s->hbi);
> +            hbitmap_next_sector = hbitmap_iter_next(s->hbi);
>          }
>  
>          next_sector += sectors_per_chunk;
> @@ -300,25 +304,34 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
>      s->sectors_in_flight += nb_sectors;
>      trace_mirror_one_iteration(s, sector_num, nb_sectors);
>  
> -    ret = bdrv_get_block_status_above(source, NULL, sector_num,
> -                                      nb_sectors, &pnum);
> -    if (ret < 0 || pnum < nb_sectors ||
> -            (ret & BDRV_BLOCK_DATA && !(ret & BDRV_BLOCK_ZERO))) {
> -        bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
> -                       mirror_read_complete, op);
> -    } else if (ret & BDRV_BLOCK_ZERO) {
> -        bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors,
> -                              s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
> -                              mirror_write_complete, op);
> +    if (s->zero_cycle) {
> +        ret = bdrv_get_block_status(s->target, sector_num, nb_sectors, &pnum);
> +        if (!(ret & BDRV_BLOCK_ZERO)) {
> +            bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors,
> +                                  s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
> +                                  mirror_write_complete, op);
> +        }

It seems to be expected that this function always involves an AIO
request and the completion event is what helps making progress. For the
BDRV_BLOCK_ZERO case, we don't do that however. I'm not sure what
exactly this means, but at least I think we are applying block job
throttling to doing nothing with some areas of the image.

>      } else {
> -        assert(!(ret & BDRV_BLOCK_DATA));
> -        bdrv_aio_discard(s->target, sector_num, op->nb_sectors,
> -                         mirror_write_complete, op);
> +        ret = bdrv_get_block_status_above(source, NULL, sector_num,
> +                                          nb_sectors, &pnum);
> +        if (ret < 0 || pnum < nb_sectors ||
> +                (ret & BDRV_BLOCK_DATA && !(ret & BDRV_BLOCK_ZERO))) {
> +            bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
> +                           mirror_read_complete, op);
> +        } else if (ret & BDRV_BLOCK_ZERO) {
> +            bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors,
> +                                  s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
> +                                  mirror_write_complete, op);
> +        } else {
> +            assert(!(ret & BDRV_BLOCK_DATA));
> +            bdrv_aio_discard(s->target, sector_num, op->nb_sectors,
> +                             mirror_write_complete, op);
> +        }
>      }
>      return delay_ns;
>  }

Kevin

  parent reply	other threads:[~2015-09-28 15:08 UTC|newest]

Thread overview: 35+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-28  3:29 [Qemu-devel] [PATCH 0/3] block: mirror - Write zeroes for unallocated sectors if no zero init Jeff Cody
2015-09-28  3:29 ` [Qemu-devel] [PATCH 1/3] block: allow creation of detached dirty bitmaps Jeff Cody
2015-09-28 14:41   ` Kevin Wolf
2015-09-28 15:13   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2015-09-28 16:38   ` Max Reitz
2015-09-28  3:29 ` [Qemu-devel] [PATCH 2/3] block: mirror - split out part of mirror_run() Jeff Cody
2015-09-28 14:17   ` Paolo Bonzini
2015-09-28 14:47   ` Kevin Wolf
2015-09-28 16:50   ` [Qemu-devel] [Qemu-block] " Max Reitz
2015-09-28  3:29 ` [Qemu-devel] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present Jeff Cody
2015-09-28 14:13   ` Paolo Bonzini
2015-09-28 20:31     ` Eric Blake
2015-09-29  8:10       ` Kevin Wolf
2015-09-29  8:42         ` Paolo Bonzini
2015-09-29  9:35           ` Kevin Wolf
2015-09-29 10:52             ` Paolo Bonzini
2015-09-30 14:43               ` Jeff Cody
2015-09-30 15:16                 ` Paolo Bonzini
2015-09-30 15:26                 ` Kevin Wolf
2015-09-30 16:02                   ` Jeff Cody
2015-09-30 16:06                     ` Paolo Bonzini
2015-10-01  8:23                       ` Kevin Wolf
2015-09-28 21:32     ` Jeff Cody
2015-09-29  2:48       ` Eric Blake
2015-09-28 15:07   ` Kevin Wolf [this message]
2015-09-28 21:57     ` Jeff Cody
2015-09-29  8:28       ` Kevin Wolf
2015-09-28 15:10   ` Kevin Wolf
2015-09-28 21:58     ` Jeff Cody
2015-09-28 15:23   ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2015-09-30 15:11     ` Jeff Cody
2015-09-30 15:28       ` Kevin Wolf
2015-09-28 17:32   ` Max Reitz
2015-09-29  8:39     ` Kevin Wolf
2015-09-29 14:47       ` [Qemu-devel] " Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150928150752.GC18068@noname.str.redhat.com \
    --to=kwolf@redhat.com \
    --cc=jcody@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).