From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:39635) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bDA1x-00036y-6u for qemu-devel@nongnu.org; Wed, 15 Jun 2016 08:34:53 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bDA1u-0006uV-LN for qemu-devel@nongnu.org; Wed, 15 Jun 2016 08:34:48 -0400 References: <1465917916-22348-1-git-send-email-den@openvz.org> <1465917916-22348-5-git-send-email-den@openvz.org> <5760C4CB.6080703@redhat.com> <576115F9.8020109@openvz.org> From: Eric Blake Message-ID: <57614B5A.4030902@redhat.com> Date: Wed, 15 Jun 2016 06:34:34 -0600 MIME-Version: 1.0 In-Reply-To: <576115F9.8020109@openvz.org> Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="eCKNoDo5jXxt0u0RitrpccmUrv1LE9MnS" Subject: Re: [Qemu-devel] [PATCH 4/9] mirror: efficiently zero out target List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: "Denis V. Lunev" , qemu-devel@nongnu.org, qemu-block@nongnu.org Cc: vsementsov@virtuozzo.com, Stefan Hajnoczi , Fam Zheng , Kevin Wolf , Max Reitz , Jeff Cody This is an OpenPGP/MIME signed message (RFC 4880 and 3156) --eCKNoDo5jXxt0u0RitrpccmUrv1LE9MnS Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable On 06/15/2016 02:46 AM, Denis V. Lunev wrote: > On 06/15/2016 06:00 AM, Eric Blake wrote: >> On 06/14/2016 09:25 AM, Denis V. Lunev wrote: >>> With a bdrv_co_write_zeroes method on a target BDS zeroes will not be= >>> placed >>> into the wire. Thus the target could be very efficiently zeroed out. >>> This >>> is should be done with the largest chunk possible. >>> >> Probably nicer to track this in bytes. And do you really want a >> hard-coded arbitrary limit, or is it better to live with >> MIN_NON_ZERO(target_bs->bl.max_pwrite_zeroes, INT_MAX)? > unfortunately we should. INT_MAX is not aligned as required. > May be we should align INT_MAX properly to fullfill > write_zeroes alignment. >=20 > Hmm, may be we can align INT_MAX properly down. OK, > I'll try to do that gracefully. It's fairly easy to round a max_transfer or max_pwrite_zeroes down to an aligned value; we already have code in io.c that does that in bdrv_co_do_pwrite_zeroes(). >=20 >>> @@ -512,7 +513,8 @@ static int mirror_dirty_init(MirrorBlockJob *s) >>> end =3D s->bdev_length / BDRV_SECTOR_SIZE; >>> - if (base =3D=3D NULL && !bdrv_has_zero_init(target_bs)) { >>> + if (base =3D=3D NULL && !bdrv_has_zero_init(target_bs) && >>> + target_bs->drv->bdrv_co_write_zeroes =3D=3D NULL) { >> Indentation is off, although if checkpatch.pl doesn't complain I guess= >> it doesn't matter that much. >> >> Why should you care whether the target_bs->drv implements a callback? >> Can't you just rely on the normal bdrv_*() functions to do the dirty >> work of picking the most efficient implementation without you having t= o >> bypass the block layer? In fact, isn't that the whole goal of >> bdrv_make_zero() - why not call that instead of reimplementing it? > this is the idea of the patch actually. If the callback is not > implemented, we > will have zeroes actually written or send to the wire. In this case > there is > not much sense to do that, the amount of data actually written will be > significantly increased (some areas will be written twice - with zeroes= and > with the actual data). > But that's where bdrv_can_write_zeroes_with_unmap() comes in handy - you can use the public interface to learn whether bdrv_make_zero() will be efficient or not, without having to probe what the backend supports. > On the other hand, if callback is implemented, we will have very small > amount > of data in the wire and written actually and thus will have a benefit. = I am > trying to avoid very small chunks of data. Here (during the migration > process) > the data is sent with 10 Mb chunks and with takes a LOT of time with NB= D. > We can send chunks 1.5 Gb (currently). They occupies the same 26 bytes > of data > on the transport layer. I agree that we don't want to pre-initialize the device to zero unless write zeroes is an efficient operation, but I don't think that the existence of bs->drv->bdrv_co_[p]write_zeroes is the right way to find that out. I also think that we need to push harder on the NBD list that under the new block limits proposal, we WANT to be able to advertise when the new NBD_CMD_WRITE_ZEROES command will accept a larger size than NBD_CMD_WRITE (as currently written, the BLOCK_INFO extension proposal states that if a server advertises a max transaction size to the client, then the client must honor that size for all commands including NBD_CMD_WRITE_ZEROES, which would mean your 1.5G request [or my proposed 2G - 4k request] is invalid and would have to be a bunch of 32M requests)= =2E https://sourceforge.net/p/nbd/mailman/message/35081223/ --=20 Eric Blake eblake redhat com +1-919-301-3266 Libvirt virtualization library http://libvirt.org --eCKNoDo5jXxt0u0RitrpccmUrv1LE9MnS Content-Type: application/pgp-signature; name="signature.asc" Content-Description: OpenPGP digital signature Content-Disposition: attachment; filename="signature.asc" -----BEGIN PGP SIGNATURE----- Version: GnuPG v2 Comment: Public key at http://people.redhat.com/eblake/eblake.gpg Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBCAAGBQJXYUtaAAoJEKeha0olJ0NqfW4IAIs8oP3egMw9gvW+7+zCkfz1 VkraAuqHioO9r9WABv3RlLvYGVXi7ToMlmBsFOavj2i50SNdcA0D0K9id69L/Le1 i/gIPTijgSDQWbLn+fAvMlvJG1OPcYbJ4DKBKRwzHUoiFcxt4pvaglXKnz/WSMUt PD39jy5zcL5dUn46hTYv4q+6GaEI0LPj+9xighbzxcqELsaDOxW2URkhO77MoXce hd129f7HrKeSToyuLokuHTYxRGfyfNxA2AjAkoscNvjjYS2UISnuSC0aMFnriXyV udk1CRNP9hbFcmAEH/4MrP57CNy6EpGHqM9Xc6pUiV3h/os+x36nhZTWry4L3C8= =OZdo -----END PGP SIGNATURE----- --eCKNoDo5jXxt0u0RitrpccmUrv1LE9MnS--