From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:39574) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1ZgcHi-0007zP-2d for qemu-devel@nongnu.org; Mon, 28 Sep 2015 13:32:21 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1ZgcHd-0005Bh-2b for qemu-devel@nongnu.org; Mon, 28 Sep 2015 13:32:18 -0400 References: From: Max Reitz Message-ID: <56097990.7040007@redhat.com> Date: Mon, 28 Sep 2015 19:32:00 +0200 MIME-Version: 1.0 In-Reply-To: Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="xo61w6a5vd8jh8gGvGPi8LbfEAAEf8lKr" Subject: Re: [Qemu-devel] [Qemu-block] [PATCH 3/3] block: mirror - zero unallocated target sectors when zero init not present List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Jeff Cody , qemu-devel@nongnu.org Cc: kwolf@redhat.com, qemu-block@nongnu.org, stefanha@redhat.com This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --xo61w6a5vd8jh8gGvGPi8LbfEAAEf8lKr Content-Type: text/plain; charset=iso-8859-15 Content-Transfer-Encoding: quoted-printable On 28.09.2015 05:29, Jeff Cody wrote: > During mirror, if the target device does not have support zero > initialization, a mirror may result in a corrupt image. >=20 > For instance, on mirror to a host device with format =3D raw, whatever > random data is on the target device will still be there for unallocated= > sectors. >=20 > This is because during the mirror, we set the dirty bitmap to copy only= > sectors allocated above 'base'. In the case of target devices where we= > cannot assume unallocated sectors will be read as zeroes, we need to > explicitely zero out this data. >=20 > In order to avoid zeroing out all sectors of the target device prior to= > mirroring, we do zeroing as part of the block job. A second dirty > bitmap cache is created, to track sectors that are unallocated above > 'base'. These sectors are then checked for status of BDRV_BLOCK_ZERO > on the target - if they are not, then zeroes are explicitly written. >=20 > This only occurs under two conditions: >=20 > 1. 'mode' !=3D "existing" > 2. bdrv_has_zero_init(target) =3D=3D NULL >=20 > We perform the mirroring through mirror_iteration() as before, except > in two passes. If the above two conditions are met, the first pass > is using the bitmap tracking unallocated sectors, to write the needed > zeroes. Then, the second pass is performed, to mirror the actual data > as before. >=20 > If the above two conditions are not met, then the first pass is skipped= , > and only the second pass (the one with the actual data) is performed. >=20 > Signed-off-by: Jeff Cody > --- > block/mirror.c | 109 ++++++++++++++++++++++++++++++++++----= -------- > blockdev.c | 2 +- > include/block/block_int.h | 3 +- > qapi/block-core.json | 6 ++- > 4 files changed, 87 insertions(+), 33 deletions(-) >=20 > diff --git a/block/mirror.c b/block/mirror.c > index 405e5c4..b599176 100644 > --- a/block/mirror.c > +++ b/block/mirror.c > @@ -50,7 +50,9 @@ typedef struct MirrorBlockJob { > int64_t bdev_length; > unsigned long *cow_bitmap; > BdrvDirtyBitmap *dirty_bitmap; > - HBitmapIter hbi; > + HBitmapIter zero_hbi; > + HBitmapIter allocated_hbi; > + HBitmapIter *hbi; > uint8_t *buf; > QSIMPLEQ_HEAD(, MirrorBuffer) buf_free; > int buf_free_count; > @@ -60,6 +62,8 @@ typedef struct MirrorBlockJob { > int sectors_in_flight; > int ret; > bool unmap; > + bool zero_unallocated; > + bool zero_cycle; > bool waiting_for_io; > } MirrorBlockJob; > =20 > @@ -166,10 +170,10 @@ static uint64_t coroutine_fn mirror_iteration(Mir= rorBlockJob *s) > int pnum; > int64_t ret; > =20 > - s->sector_num =3D hbitmap_iter_next(&s->hbi); > + s->sector_num =3D hbitmap_iter_next(s->hbi); > if (s->sector_num < 0) { > - bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi); > - s->sector_num =3D hbitmap_iter_next(&s->hbi); > + bdrv_dirty_iter_init(s->dirty_bitmap, s->hbi); > + s->sector_num =3D hbitmap_iter_next(s->hbi); > trace_mirror_restart_iter(s, bdrv_get_dirty_count(s->dirty_bit= map)); > assert(s->sector_num >=3D 0); > } > @@ -287,7 +291,7 @@ static uint64_t coroutine_fn mirror_iteration(Mirro= rBlockJob *s) > */ > if (next_sector > hbitmap_next_sector > && bdrv_get_dirty(source, s->dirty_bitmap, next_sector)) {= > - hbitmap_next_sector =3D hbitmap_iter_next(&s->hbi); > + hbitmap_next_sector =3D hbitmap_iter_next(s->hbi); > } > =20 > next_sector +=3D sectors_per_chunk; > @@ -300,25 +304,34 @@ static uint64_t coroutine_fn mirror_iteration(Mir= rorBlockJob *s) > s->sectors_in_flight +=3D nb_sectors; > trace_mirror_one_iteration(s, sector_num, nb_sectors); > =20 > - ret =3D bdrv_get_block_status_above(source, NULL, sector_num, > - nb_sectors, &pnum); > - if (ret < 0 || pnum < nb_sectors || > - (ret & BDRV_BLOCK_DATA && !(ret & BDRV_BLOCK_ZERO))) { > - bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors, > - mirror_read_complete, op); > - } else if (ret & BDRV_BLOCK_ZERO) { > - bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sectors, > - s->unmap ? BDRV_REQ_MAY_UNMAP : 0, > - mirror_write_complete, op); > + if (s->zero_cycle) { > + ret =3D bdrv_get_block_status(s->target, sector_num, nb_sector= s, &pnum); > + if (!(ret & BDRV_BLOCK_ZERO)) { > + bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sector= s, > + s->unmap ? BDRV_REQ_MAY_UNMAP : 0, > + mirror_write_complete, op); > + } > } else { > - assert(!(ret & BDRV_BLOCK_DATA)); > - bdrv_aio_discard(s->target, sector_num, op->nb_sectors, > - mirror_write_complete, op); > + ret =3D bdrv_get_block_status_above(source, NULL, sector_num, > + nb_sectors, &pnum); > + if (ret < 0 || pnum < nb_sectors || > + (ret & BDRV_BLOCK_DATA && !(ret & BDRV_BLOCK_ZERO))) {= > + bdrv_aio_readv(source, sector_num, &op->qiov, nb_sectors, > + mirror_read_complete, op); > + } else if (ret & BDRV_BLOCK_ZERO) { > + bdrv_aio_write_zeroes(s->target, sector_num, op->nb_sector= s, > + s->unmap ? BDRV_REQ_MAY_UNMAP : 0, > + mirror_write_complete, op); > + } else { > + assert(!(ret & BDRV_BLOCK_DATA)); > + bdrv_aio_discard(s->target, sector_num, op->nb_sectors, > + mirror_write_complete, op); > + } > } > return delay_ns; > } > =20 > -static int mirror_do_iteration(MirrorBlockJob *s, uint64_t last_pause_= ns) > +static int mirror_do_iteration(MirrorBlockJob *s, uint64_t *last_pause= _ns) > { > int ret; > =20 > @@ -347,7 +360,7 @@ static int mirror_do_iteration(MirrorBlockJob *s, u= int64_t last_pause_ns) > * We do so every SLICE_TIME nanoseconds, or when there is an = error, > * or when the source is clean, whichever comes first. > */ > - if (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - last_pause_ns < S= LICE_TIME > + if (qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - *last_pause_ns < = SLICE_TIME > && s->common.iostatus =3D=3D BLOCK_DEVICE_IO_STATUS_OK) { > if (s->in_flight =3D=3D MAX_IN_FLIGHT || s->buf_free_count= =3D=3D 0 || > (cnt =3D=3D 0 && s->in_flight > 0)) { > @@ -371,6 +384,14 @@ static int mirror_do_iteration(MirrorBlockJob *s, = uint64_t last_pause_ns) > goto immediate_exit; > } > } else { > + > + if (s->zero_cycle) { > + /* this is not the end of the streaming cycle, > + * if we are just filling in zeroes for unallocate= d > + * sectors prior to streaming the real data */ > + goto immediate_exit; > + } > + > /* We're out of the streaming phase. From now on, if = the job > * is cancelled we will actually complete all pending = I/O and > * report completion. This way, block-job-cancel will= leave > @@ -419,7 +440,7 @@ static int mirror_do_iteration(MirrorBlockJob *s, u= int64_t last_pause_ns) > s->common.cancelled =3D false; > break; > } > - last_pause_ns =3D qemu_clock_get_ns(QEMU_CLOCK_REALTIME); > + *last_pause_ns =3D qemu_clock_get_ns(QEMU_CLOCK_REALTIME); > } > =20 > immediate_exit: > @@ -511,6 +532,15 @@ static void coroutine_fn mirror_run(void *opaque) > checking for a NULL string */ > int ret =3D 0; > int n; > + BdrvDirtyBitmap *zero_dirty_bitmap; > + BdrvDirtyBitmap *allocated_dirty_bitmap =3D s->dirty_bitmap; > + > + zero_dirty_bitmap =3D bdrv_create_dirty_bitmap(s->target, > + s->granularity, NULL,= true, > + NULL); > + if (zero_dirty_bitmap =3D=3D NULL) { > + goto immediate_exit; > + } I think I'd like the error to be reported to the user; but in any case, you have to set ret to a negative value. > =20 > if (block_job_is_cancelled(&s->common)) { > goto immediate_exit; > @@ -588,14 +618,33 @@ static void coroutine_fn mirror_run(void *opaque)= > assert(n > 0); > if (ret =3D=3D 1) { > bdrv_set_dirty_bitmap(s->dirty_bitmap, sector_num, n);= > + } else if (s->zero_unallocated) { > + bdrv_set_dirty_bitmap(zero_dirty_bitmap, sector_num, n= ); > } > sector_num +=3D n; > } > } > =20 > - bdrv_dirty_iter_init(s->dirty_bitmap, &s->hbi); > + bdrv_dirty_iter_init(s->dirty_bitmap, &s->allocated_hbi); > =20 > - ret =3D mirror_do_iteration(s, last_pause_ns); > + if (s->zero_unallocated) { > + bdrv_dirty_iter_init(zero_dirty_bitmap, &s->zero_hbi); > + s->dirty_bitmap =3D zero_dirty_bitmap; > + s->hbi =3D &s->zero_hbi; > + > + s->zero_cycle =3D true; > + ret =3D mirror_do_iteration(s, &last_pause_ns); > + if (ret < 0) { > + goto immediate_exit; > + } > + > + mirror_drain(s); > + s->zero_cycle =3D false; > + } > + > + s->dirty_bitmap =3D allocated_dirty_bitmap; > + s->hbi =3D &s->allocated_hbi; > + ret =3D mirror_do_iteration(s, &last_pause_ns); > =20 > immediate_exit: > if (s->in_flight > 0) { > @@ -611,7 +660,8 @@ immediate_exit: > qemu_vfree(s->buf); > g_free(s->cow_bitmap); > g_free(s->in_flight_bitmap); > - bdrv_release_dirty_bitmap(bs, s->dirty_bitmap); > + bdrv_release_dirty_bitmap(bs, allocated_dirty_bitmap); > + bdrv_release_dirty_bitmap(NULL, zero_dirty_bitmap); > bdrv_iostatus_disable(s->target); > =20 > data =3D g_malloc(sizeof(*data)); > @@ -702,7 +752,7 @@ static void mirror_start_job(BlockDriverState *bs, = BlockDriverState *target, > int64_t buf_size, > BlockdevOnError on_source_error, > BlockdevOnError on_target_error, > - bool unmap, > + bool unmap, bool existing, > BlockCompletionFunc *cb, > void *opaque, Error **errp, > const BlockJobDriver *driver, > @@ -737,6 +787,7 @@ static void mirror_start_job(BlockDriverState *bs, = BlockDriverState *target, > return; > } > =20 > + s->zero_unallocated =3D !existing && !bdrv_has_zero_init(target); I think this should be set only if we're doing a full mirror operation. For instance, I could do a none, top or incremental mirror to a new qcow2 file, which would give it a backing file, obviously. You're lucky in that qcow2 claims to always have zero initialization, when this is in fact not true (someone's ought to fix that...): With a backing file, an overlay file just cannot have zero initialization, it's impossible (well, unless the backing file is completely zero). So if qcow2 were to answer correctly, i.e. "No, with a backing file I do not have zero init", then this would overwrite all sectors which are supposed to be unallocated because they are present in the backing file. > s->replaces =3D g_strdup(replaces); > s->on_source_error =3D on_source_error; > s->on_target_error =3D on_target_error; > @@ -767,7 +818,7 @@ void mirror_start(BlockDriverState *bs, BlockDriver= State *target, > int64_t speed, uint32_t granularity, int64_t buf_siz= e, > MirrorSyncMode mode, BlockdevOnError on_source_error= , > BlockdevOnError on_target_error, > - bool unmap, > + bool unmap, bool existing, > BlockCompletionFunc *cb, > void *opaque, Error **errp) > { > @@ -782,8 +833,8 @@ void mirror_start(BlockDriverState *bs, BlockDriver= State *target, > base =3D mode =3D=3D MIRROR_SYNC_MODE_TOP ? bs->backing_hd : NULL;= > mirror_start_job(bs, target, replaces, > speed, granularity, buf_size, > - on_source_error, on_target_error, unmap, cb, opaq= ue, errp, > - &mirror_job_driver, is_none_mode, base); > + on_source_error, on_target_error, unmap, existing= , cb, > + opaque, errp, &mirror_job_driver, is_none_mode, b= ase); > } > =20 > void commit_active_start(BlockDriverState *bs, BlockDriverState *base,= > @@ -830,7 +881,7 @@ void commit_active_start(BlockDriverState *bs, Bloc= kDriverState *base, > =20 > bdrv_ref(base); > mirror_start_job(bs, base, NULL, speed, 0, 0, > - on_error, on_error, false, cb, opaque, &local_err= , > + on_error, on_error, false, false, cb, opaque, &lo= cal_err, This should probably be true; the commit target is already existing, after all. Also, without it being true, iotest 097 fails. > &commit_active_job_driver, false, base); > if (local_err) { > error_propagate(errp, local_err); > diff --git a/blockdev.c b/blockdev.c > index cb9f78d..c06ac60 100644 > --- a/blockdev.c > +++ b/blockdev.c > @@ -2816,7 +2816,7 @@ void qmp_drive_mirror(const char *device, const c= har *target, > has_replaces ? replaces : NULL, > speed, granularity, buf_size, sync, > on_source_error, on_target_error, > - unmap, > + unmap, mode =3D=3D NEW_IMAGE_MODE_EXISTING, > block_job_cb, bs, &local_err); > if (local_err !=3D NULL) { > bdrv_unref(target_bs); > diff --git a/include/block/block_int.h b/include/block/block_int.h > index 14ad4c3..21a8988 100644 > --- a/include/block/block_int.h > +++ b/include/block/block_int.h > @@ -614,6 +614,7 @@ void commit_active_start(BlockDriverState *bs, Bloc= kDriverState *base, > * @on_source_error: The action to take upon error reading from the so= urce. > * @on_target_error: The action to take upon error writing to the targ= et. > * @unmap: Whether to unmap target where source sectors only contain z= eroes. > + * @existing: Whether target image is an existing image prior to the Q= MP cmd. > * @cb: Completion function for the job. > * @opaque: Opaque pointer value passed to @cb. > * @errp: Error object. > @@ -628,7 +629,7 @@ void mirror_start(BlockDriverState *bs, BlockDriver= State *target, > int64_t speed, uint32_t granularity, int64_t buf_siz= e, > MirrorSyncMode mode, BlockdevOnError on_source_error= , > BlockdevOnError on_target_error, > - bool unmap, > + bool unmap, bool existing, > BlockCompletionFunc *cb, > void *opaque, Error **errp); > =20 > diff --git a/qapi/block-core.json b/qapi/block-core.json > index bb2189e..033afb4 100644 > --- a/qapi/block-core.json > +++ b/qapi/block-core.json > @@ -952,8 +952,10 @@ > # broken Quorum files. (Since 2.1) > # > # @mode: #optional whether and how QEMU should create a new image, def= ault is > -# 'absolute-paths'. > -# This empty line should stay. > +# 'absolute-paths'. If mode !=3D 'existing', and the target do= es not > +# have zero init (sparseness), then the target image will have= sectors > +# zeroed out that correspond to sectors in an unallocated stat= e in the > +# source image. As I said above, this should only happen if @sync =3D=3D 'full'. Max > # @speed: #optional the maximum speed, in bytes per second > # > # @sync: what parts of the disk image should be copied to the destinat= ion >=20 --xo61w6a5vd8jh8gGvGPi8LbfEAAEf8lKr Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 iQEcBAEBCAAGBQJWCXmQAAoJEDuxQgLoOKytoG4IAJ5wBo9NIHFQJUqjz/HL8h6u eucRVrEg7LSUZRPXl3SzfBokydbGYxpEjmfs/y7xAOsw4yPv3bVp3/MvI3rngdfu KvrjVqV7bY2D1OTnNVLD6qyzDD0QTGOGpoCminMR/D6qnLiS1QTTknmbRJ+ciutH kYXUcLhz1l9LahWIAuXUInIrOkqh3MtSn2jBszibVSlmpcNl6LlV+2g1G5yRNd+D b9Bt8AW4bwv7InOrXtpxxyx9N/7KcnvohQm2br164pYuPlAezAVkw55KOADZLw7m Pz4nzhpi+ID4qzfciIWPPdGtSClH0H2IIUQgW15kYNv9BPi67porXBoi5jtq41E= =ktP2 -----END PGP SIGNATURE----- --xo61w6a5vd8jh8gGvGPi8LbfEAAEf8lKr--