From: Kevin Wolf <kwolf@redhat.com>
To: Jeff Cody <jcody@redhat.com>
Cc: stefanha@redhat.com, qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present
Date: Mon, 28 Sep 2015 17:07:52 +0200 [thread overview]
Message-ID: <20150928150752.GC18068@noname.str.redhat.com> (raw)
In-Reply-To: <c3dcba707e0d2b976d788eb228336237fd7d612c.1443410673.git.jcody@redhat.com>
Am 28.09.2015 um 05:29 hat Jeff Cody geschrieben:
> During mirror, if the target device does not have support zero
> initialization, a mirror may result in a corrupt image.
I think you want to check this sentence. ("During mirror [...], a
mirror may result [...]")
> For instance, on mirror to a host device with format = raw, whatever
> random data is on the target device will still be there for unallocated
> sectors.
>
> This is because during the mirror, we set the dirty bitmap to copy only
> sectors allocated above 'base'. In the case of target devices where we
> cannot assume unallocated sectors will be read as zeroes, we need to
> explicitely zero out this data.
>
> In order to avoid zeroing out all sectors of the target device prior to
> mirroring, we do zeroing as part of the block job. A second dirty
> bitmap cache is created, to track sectors that are unallocated above
> 'base'. These sectors are then checked for status of BDRV_BLOCK_ZERO
> on the target - if they are not, then zeroes are explicitly written.
Why do you need a bitmap? You never change the bitmap after initialising
it, so couldn't you instead just check the allocation status when you
need it?
In fact, why do we need two passes? I would have expected that commit
dcfb3beb already does the trick, with checking allocation status and
writing zeroes during the normal single pass.
If that commit fails to solve the problem, I guess I first need to
understand why before I can continue reviewing this one...
> This only occurs under two conditions:
>
> 1. 'mode' != "existing"
> 2. bdrv_has_zero_init(target) == NULL
>
> We perform the mirroring through mirror_iteration() as before, except
> in two passes. If the above two conditions are met, the first pass
> is using the bitmap tracking unallocated sectors, to write the needed
> zeroes. Then, the second pass is performed, to mirror the actual data
> as before.
>
> If the above two conditions are not met, then the first pass is skipped,
> and only the second pass (the one with the actual data) is performed.
>
> Signed-off-by: Jeff Cody <jcody@redhat.com>
> ---
> block/mirror.c | 109 ++++++++++++++++++++++++++++++++++------------
> blockdev.c | 2 +-
> include/block/block_int.h | 3 +-
> qapi/block-core.json | 6 ++-
> 4 files changed, 87 insertions(+), 33 deletions(-)
>
> diff --git a/block/mirror.c b/block/mirror.c
> index 405e5c4..b599176 100644
> --- a/block/mirror.c
> +++ b/block/mirror.c
> @@ -50,7 +50,9 @@ typedef struct MirrorBlockJob {
> int64_t bdev_length;
> unsigned long *cow_bitmap;
> BdrvDirtyBitmap *dirty_bitmap;
> - HBitmapIter hbi;
> + HBitmapIter zero_hbi;
> + HBitmapIter allocated_hbi;
> + HBitmapIter *hbi;
> uint8_t *buf;
> QSIMPLEQ_HEAD(, MirrorBuffer) buf_free;
> int buf_free_count;
> @@ -60,6 +62,8 @@ typedef struct MirrorBlockJob {
> int sectors_in_flight;
> int ret;
> bool unmap;
> + bool zero_unallocated;
> + bool zero_cycle;
> bool waiting_for_io;
> } MirrorBlockJob;
>
> @@ -166,10 +170,10 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
> int pnum;
> int64_t ret;
>
> - s->sector_num = hbitmap_iter_next(&s->hbi);
> + s->sector_num = hbitmap_iter_next(s->hbi);
> if (s->sector_num < 0) {
> - bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi);
> - s->sector_num = hbitmap_iter_next(&s->hbi);
> + bdrv_dirty_iter_init(s->dirty_bitmap, s->hbi);
> + s->sector_num = hbitmap_iter_next(s->hbi);
> trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bitmap));
> assert(s->sector_num >= 0);
> }
> @@ -287,7 +291,7 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
> */
> if (next_sector > hbitmap_next_sector
> && bdrv_get_dirty(source, s->dirty_bitmap, next_sector)) {
> - hbitmap_next_sector = hbitmap_iter_next(&s->hbi);
> + hbitmap_next_sector = hbitmap_iter_next(s->hbi);
> }
>
> next_sector += sectors_per_chunk;
> @@ -300,25 +304,34 @@ static uint64_t coroutine_fn mirror_iteration(MirrorBlockJob *s)
> s->sectors_in_flight += nb_sectors;
> trace_mirror_one_iteration(s, sector_num, nb_sectors);
>
> - ret = bdrv_get_block_status_above(source, NULL, sector_num,
> - nb_sectors, &pnum);
> - if (ret < 0 || pnum < nb_sectors ||
> - (ret & BDRV_BLOCK_DATA && !(ret & BDRV_BLOCK_ZERO))) {
> - bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
> - mirror_read_complete, op);
> - } else if (ret & BDRV_BLOCK_ZERO) {
> - bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors,
> - s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
> - mirror_write_complete, op);
> + if (s->zero_cycle) {
> + ret = bdrv_get_block_status(s->target, sector_num, nb_sectors, &pnum);
> + if (!(ret & BDRV_BLOCK_ZERO)) {
> + bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors,
> + s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
> + mirror_write_complete, op);
> + }
It seems to be expected that this function always involves an AIO
request and the completion event is what helps making progress. For the
BDRV_BLOCK_ZERO case, we don't do that however. I'm not sure what
exactly this means, but at least I think we are applying block job
throttling to doing nothing with some areas of the image.
> } else {
> - assert(!(ret & BDRV_BLOCK_DATA));
> - bdrv_aio_discard(s->target, sector_num, op->nb_sectors,
> - mirror_write_complete, op);
> + ret = bdrv_get_block_status_above(source, NULL, sector_num,
> + nb_sectors, &pnum);
> + if (ret < 0 || pnum < nb_sectors ||
> + (ret & BDRV_BLOCK_DATA && !(ret & BDRV_BLOCK_ZERO))) {
> + bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors,
> + mirror_read_complete, op);
> + } else if (ret & BDRV_BLOCK_ZERO) {
> + bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors,
> + s->unmap ? BDRV_REQ_MAY_UNMAP : 0,
> + mirror_write_complete, op);
> + } else {
> + assert(!(ret & BDRV_BLOCK_DATA));
> + bdrv_aio_discard(s->target, sector_num, op->nb_sectors,
> + mirror_write_complete, op);
> + }
> }
> return delay_ns;
> }
Kevin
next prev parent reply other threads:[~2015-09-28 15:08 UTC|newest]
Thread overview: 35+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-09-28 3:29 [Qemu-devel] [PATCH 0/3] block: mirror - Write zeroes for unallocated sectors if no zero init Jeff Cody
2015-09-28 3:29 ` [Qemu-devel] [PATCH 1/3] block: allow creation of detached dirty bitmaps Jeff Cody
2015-09-28 14:41 ` Kevin Wolf
2015-09-28 15:13 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2015-09-28 16:38 ` Max Reitz
2015-09-28 3:29 ` [Qemu-devel] [PATCH 2/3] block: mirror - split out part of mirror_run() Jeff Cody
2015-09-28 14:17 ` Paolo Bonzini
2015-09-28 14:47 ` Kevin Wolf
2015-09-28 16:50 ` [Qemu-devel] [Qemu-block] " Max Reitz
2015-09-28 3:29 ` [Qemu-devel] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present Jeff Cody
2015-09-28 14:13 ` Paolo Bonzini
2015-09-28 20:31 ` Eric Blake
2015-09-29 8:10 ` Kevin Wolf
2015-09-29 8:42 ` Paolo Bonzini
2015-09-29 9:35 ` Kevin Wolf
2015-09-29 10:52 ` Paolo Bonzini
2015-09-30 14:43 ` Jeff Cody
2015-09-30 15:16 ` Paolo Bonzini
2015-09-30 15:26 ` Kevin Wolf
2015-09-30 16:02 ` Jeff Cody
2015-09-30 16:06 ` Paolo Bonzini
2015-10-01 8:23 ` Kevin Wolf
2015-09-28 21:32 ` Jeff Cody
2015-09-29 2:48 ` Eric Blake
2015-09-28 15:07 ` Kevin Wolf [this message]
2015-09-28 21:57 ` Jeff Cody
2015-09-29 8:28 ` Kevin Wolf
2015-09-28 15:10 ` Kevin Wolf
2015-09-28 21:58 ` Jeff Cody
2015-09-28 15:23 ` [Qemu-devel] [Qemu-block] " Stefan Hajnoczi
2015-09-30 15:11 ` Jeff Cody
2015-09-30 15:28 ` Kevin Wolf
2015-09-28 17:32 ` Max Reitz
2015-09-29 8:39 ` Kevin Wolf
2015-09-29 14:47 ` [Qemu-devel] " Paolo Bonzini
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20150928150752.GC18068@noname.str.redhat.com \
--to=kwolf@redhat.com \
--cc=jcody@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.