From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:49855) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZgqHS-0004nw-It for qemu-devel@nongnu.org; Tue, 29 Sep 2015 04:28:59 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZgqHR-0005BW-3F for qemu-devel@nongnu.org; Tue, 29 Sep 2015 04:28:58 -0400 Date: Tue, 29 Sep 2015 10:28:47 +0200 From: Kevin Wolf Message-ID: <20150929082847.GB3930@noname.str.redhat.com> References: <20150928150752.GC18068@noname.str.redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Jeff Cody Cc: jcody@redhat.com, stefanha@redhat.com, qemu-devel@nongnu.org, qemu-block@nongnu.org Am 28.09.2015 um 23:57 hat Jeff Cody geschrieben: > On Sep 28, 2015 5:31 PM, "Kevin Wolf" wrote: > > >=20 > (Responding from mobile phone again) >=20 > > Am 28.09.2015 um 05:29 hat Jeff Cody geschrieben: > > > During mirror, if the target device does not have support zero > > > initialization, a mirror may result in a corrupt image. > > > > I think you want to check this sentence. ("During mirror [...], a > > mirror may result [...]") > > >=20 > Yes, thanks. >=20 > > > For instance, on mirror to a host device with format =3D raw, whate= ver > > > random data is on the target device will still be there for unalloc= ated > > > sectors. > > > > > > This is because during the mirror, we set the dirty bitmap to copy = only > > > sectors allocated above 'base'.=A0 In the case of target devices wh= ere we > > > cannot assume unallocated sectors will be read as zeroes, we need t= o > > > explicitely zero out this data. > > > > > > In order to avoid zeroing out all sectors of the target device prio= r to > > > mirroring, we do zeroing as part of the block job.=A0 A second dirt= y > > > bitmap cache is created, to track sectors that are unallocated abov= e > > > 'base'.=A0 These sectors are then checked for status of BDRV_BLOCK_= ZERO > > > on the target - if they are not, then zeroes are explicitly written= . > > > > Why do you need a bitmap? You never change the bitmap after initialis= ing > > it, so couldn't you instead just check the allocation status when you > > need it? >=20 > The main reason was really to maximize code reuse, and be able to use t= he same > iteration code in the mirror coroutine. >=20 > > > > In fact, why do we need two passes? I would have expected that commit > > dcfb3beb already does the trick, with checking allocation status and > > writing zeroes during the normal single pass. > > > > If that commit fails to solve the problem, I guess I first need to > > understand why before I can continue reviewing this one... > > >=20 > Responding from memory right now, but that commit only helps if the gue= st > unmaps data, changing the sectors to unallocated after the mirror begin= s. >=20 > However, before we get to this point we've already generated our bitmap= of > dirty sectors in mirror_run(), and those are explicitly only sectors th= at are > allocated above the source.=A0 Inside the iteration, we'll only pick up= the > unallocated sectors if they have been changed by the guest. So the problem is just that the sectors aren't included in the initialisation of the dirty bitmap? If so, why do we need a second bitmap and can't basically or the zero bitmap into the normal one? Is the real fix that for sync =3D=3D full, we need to set the whole bitma= p initially, regardless of the allocation status? After all, the initialisation of the bitmap is how the sync modes are defined in the QAPI documentation. Hm... Of course, you can only rely on zero initialisation as long as nobody (including an earlier mirror iteration) has written to the target before, so if you want to keep the target sparse if possible, you do need a second bitmap; a normal dirty bitmap for the target would be enough though. With this, you could probably implement something very similar to convert_write() in qemu-img. Unifying qemu-img convert and mirroring seems to be worthwhile in the long run anyway. Kevin > > > This only occurs under two conditions: > > > > > >=A0 =A0 =A01. 'mode' !=3D "existing" > > >=A0 =A0 =A02. bdrv_has_zero_init(target) =3D=3D NULL > > > > > > We perform the mirroring through mirror_iteration() as before, exce= pt > > > in two passes.=A0 If the above two conditions are met, the first pa= ss > > > is using the bitmap tracking unallocated sectors, to write the need= ed > > > zeroes.=A0 Then, the second pass is performed, to mirror the actual= data > > > as before. > > > > > > If the above two conditions are not met, then the first pass is ski= pped, > > > and only the second pass (the one with the actual data) is performe= d. > > > > > > Signed-off-by: Jeff Cody > > > --- > > >=A0 block/mirror.c=A0 =A0 =A0 =A0 =A0 =A0 | 109 > ++++++++++++++++++++++++++++++++++------------ > > >=A0 blockdev.c=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 |=A0 =A02 +- > > >=A0 include/block/block_int.h |=A0 =A03 +- > > >=A0 qapi/block-core.json=A0 =A0 =A0 |=A0 =A06 ++- > > >=A0 4 files changed, 87 insertions(+), 33 deletions(-) > > > > > > diff --git a/block/mirror.c b/block/mirror.c > > > index 405e5c4..b599176 100644 > > > --- a/block/mirror.c > > > +++ b/block/mirror.c > > > @@ -50,7 +50,9 @@ typedef struct MirrorBlockJob { > > >=A0 =A0 =A0 int64_t bdev_length; > > >=A0 =A0 =A0 unsigned long *cow_bitmap; > > >=A0 =A0 =A0 BdrvDirtyBitmap *dirty_bitmap; > > > -=A0 =A0 HBitmapIter hbi; > > > +=A0 =A0 HBitmapIter zero_hbi; > > > +=A0 =A0 HBitmapIter allocated_hbi; > > > +=A0 =A0 HBitmapIter *hbi; > > >=A0 =A0 =A0 uint8_t *buf; > > >=A0 =A0 =A0 QSIMPLEQ_HEAD(, MirrorBuffer) buf_free; > > >=A0 =A0 =A0 int buf_free_count; > > > @@ -60,6 +62,8 @@ typedef struct MirrorBlockJob { > > >=A0 =A0 =A0 int sectors_in_flight; > > >=A0 =A0 =A0 int ret; > > >=A0 =A0 =A0 bool unmap; > > > +=A0 =A0 bool zero_unallocated; > > > +=A0 =A0 bool zero_cycle; > > >=A0 =A0 =A0 bool waiting_for_io; > > >=A0 } MirrorBlockJob; > > > > > > @@ -166,10 +170,10 @@ static uint64_t coroutine_fn mirror_iteration > (MirrorBlockJob *s) > > >=A0 =A0 =A0 int pnum; > > >=A0 =A0 =A0 int64_t ret; > > > > > > -=A0 =A0 s->sector_num =3D hbitmap_iter_next(&s->hbi); > > > +=A0 =A0 s->sector_num =3D hbitmap_iter_next(s->hbi); > > >=A0 =A0 =A0 if (s->sector_num < 0) { > > > -=A0 =A0 =A0 =A0 bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi); > > > -=A0 =A0 =A0 =A0 s->sector_num =3D hbitmap_iter_next(&s->hbi); > > > +=A0 =A0 =A0 =A0 bdrv_dirty_iter_init(s->dirty_bitmap, s->hbi); > > > +=A0 =A0 =A0 =A0 s->sector_num =3D hbitmap_iter_next(s->hbi); > > >=A0 =A0 =A0 =A0 =A0 trace_mirror_restart_iter(s, bdrv_get_dirty_coun= t(s-> > dirty_bitmap)); > > >=A0 =A0 =A0 =A0 =A0 assert(s->sector_num >=3D 0); > > >=A0 =A0 =A0 } > > > @@ -287,7 +291,7 @@ static uint64_t coroutine_fn mirror_iteration > (MirrorBlockJob *s) > > >=A0 =A0 =A0 =A0 =A0 =A0*/ > > >=A0 =A0 =A0 =A0 =A0 if (next_sector > hbitmap_next_sector > > >=A0 =A0 =A0 =A0 =A0 =A0 =A0 && bdrv_get_dirty(source, s->dirty_bitma= p, next_sector)) { > > > -=A0 =A0 =A0 =A0 =A0 =A0 hbitmap_next_sector =3D hbitmap_iter_next(= &s->hbi); > > > +=A0 =A0 =A0 =A0 =A0 =A0 hbitmap_next_sector =3D hbitmap_iter_next(= s->hbi); > > >=A0 =A0 =A0 =A0 =A0 } > > > > > >=A0 =A0 =A0 =A0 =A0 next_sector +=3D sectors_per_chunk; > > > @@ -300,25 +304,34 @@ static uint64_t coroutine_fn mirror_iteration > (MirrorBlockJob *s) > > >=A0 =A0 =A0 s->sectors_in_flight +=3D nb_sectors; > > >=A0 =A0 =A0 trace_mirror_one_iteration(s, sector_num, nb_sectors); > > > > > > -=A0 =A0 ret =3D bdrv_get_block_status_above(source, NULL, sector_n= um, > > > -=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 nb_sectors, &pnum); > > > -=A0 =A0 if (ret < 0 || pnum < nb_sectors || > > > -=A0 =A0 =A0 =A0 =A0 =A0 (ret & BDRV_BLOCK_DATA && !(ret & BDRV_BLO= CK_ZERO))) { > > > -=A0 =A0 =A0 =A0 bdrv_aio_readv(source, sector_num, &op->qiov, nb_s= ectors, > > > -=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0mirror_read_complet= e, op); > > > -=A0 =A0 } else if (ret & BDRV_BLOCK_ZERO) { > > > -=A0 =A0 =A0 =A0 bdrv_aio_write_zeroes(s->target, sector_num, op->n= b_sectors, > > > -=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 s->unm= ap ? BDRV_REQ_MAY_UNMAP : 0, > > > -=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 mirror= _write_complete, op); > > > +=A0 =A0 if (s->zero_cycle) { > > > +=A0 =A0 =A0 =A0 ret =3D bdrv_get_block_status(s->target, sector_nu= m, nb_sectors, & > pnum); > > > +=A0 =A0 =A0 =A0 if (!(ret & BDRV_BLOCK_ZERO)) { > > > +=A0 =A0 =A0 =A0 =A0 =A0 bdrv_aio_write_zeroes(s->target, sector_nu= m, op->nb_sectors, > > > +=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= s->unmap ? BDRV_REQ_MAY_UNMAP : 0, > > > +=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= mirror_write_complete, op); > > > +=A0 =A0 =A0 =A0 } > > > > It seems to be expected that this function always involves an AIO > > request and the completion event is what helps making progress. For t= he > > BDRV_BLOCK_ZERO case, we don't do that however. I'm not sure what > > exactly this means, but at least I think we are applying block job > > throttling to doing nothing with some areas of the image. > > > > >=A0 =A0 =A0 } else { > > > -=A0 =A0 =A0 =A0 assert(!(ret & BDRV_BLOCK_DATA)); > > > -=A0 =A0 =A0 =A0 bdrv_aio_discard(s->target, sector_num, op->nb_sec= tors, > > > -=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0mirror_write_co= mplete, op); > > > +=A0 =A0 =A0 =A0 ret =3D bdrv_get_block_status_above(source, NULL, = sector_num, > > > +=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 =A0 =A0 nb_sectors, &pnum); > > > +=A0 =A0 =A0 =A0 if (ret < 0 || pnum < nb_sectors || > > > +=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 (ret & BDRV_BLOCK_DATA && !(ret & = BDRV_BLOCK_ZERO))) { > > > +=A0 =A0 =A0 =A0 =A0 =A0 bdrv_aio_readv(source, sector_num, &op->qi= ov, nb_sectors, > > > +=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0mirror_read= _complete, op); > > > +=A0 =A0 =A0 =A0 } else if (ret & BDRV_BLOCK_ZERO) { > > > +=A0 =A0 =A0 =A0 =A0 =A0 bdrv_aio_write_zeroes(s->target, sector_nu= m, op->nb_sectors, > > > +=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= s->unmap ? BDRV_REQ_MAY_UNMAP : 0, > > > +=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= mirror_write_complete, op); > > > +=A0 =A0 =A0 =A0 } else { > > > +=A0 =A0 =A0 =A0 =A0 =A0 assert(!(ret & BDRV_BLOCK_DATA)); > > > +=A0 =A0 =A0 =A0 =A0 =A0 bdrv_aio_discard(s->target, sector_num, op= ->nb_sectors, > > > +=A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0mirror_= write_complete, op); > > > +=A0 =A0 =A0 =A0 } > > >=A0 =A0 =A0 } > > >=A0 =A0 =A0 return delay_ns; > > >=A0 } > > > > Kevin > > >=20