From: Eric Blake <eblake@redhat.com>
To: "Denis V. Lunev" <den@openvz.org>,
qemu-devel@nongnu.org, qemu-block@nongnu.org
Cc: vsementsov@virtuozzo.com, Stefan Hajnoczi <stefanha@redhat.com>,
Fam Zheng <famz@redhat.com>, Kevin Wolf <kwolf@redhat.com>,
Max Reitz <mreitz@redhat.com>, Jeff Cody <jcody@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 4/9] mirror: efficiently zero out target
Date: Wed, 15 Jun 2016 06:34:34 -0600 [thread overview]
Message-ID: <57614B5A.4030902@redhat.com> (raw)
In-Reply-To: <576115F9.8020109@openvz.org>
[-- Attachment #1: Type: text/plain, Size: 3746 bytes --]
On 06/15/2016 02:46 AM, Denis V. Lunev wrote:
> On 06/15/2016 06:00 AM, Eric Blake wrote:
>> On 06/14/2016 09:25 AM, Denis V. Lunev wrote:
>>> With a bdrv_co_write_zeroes method on a target BDS zeroes will not be
>>> placed
>>> into the wire. Thus the target could be very efficiently zeroed out.
>>> This
>>> is should be done with the largest chunk possible.
>>>
>> Probably nicer to track this in bytes. And do you really want a
>> hard-coded arbitrary limit, or is it better to live with
>> MIN_NON_ZERO(target_bs->bl.max_pwrite_zeroes, INT_MAX)?
> unfortunately we should. INT_MAX is not aligned as required.
> May be we should align INT_MAX properly to fullfill
> write_zeroes alignment.
>
> Hmm, may be we can align INT_MAX properly down. OK,
> I'll try to do that gracefully.
It's fairly easy to round a max_transfer or max_pwrite_zeroes down to an
aligned value; we already have code in io.c that does that in
bdrv_co_do_pwrite_zeroes().
>
>>> @@ -512,7 +513,8 @@ static int mirror_dirty_init(MirrorBlockJob *s)
>>> end = s->bdev_length / BDRV_SECTOR_SIZE;
>>> - if (base == NULL && !bdrv_has_zero_init(target_bs)) {
>>> + if (base == NULL && !bdrv_has_zero_init(target_bs) &&
>>> + target_bs->drv->bdrv_co_write_zeroes == NULL) {
>> Indentation is off, although if checkpatch.pl doesn't complain I guess
>> it doesn't matter that much.
>>
>> Why should you care whether the target_bs->drv implements a callback?
>> Can't you just rely on the normal bdrv_*() functions to do the dirty
>> work of picking the most efficient implementation without you having to
>> bypass the block layer? In fact, isn't that the whole goal of
>> bdrv_make_zero() - why not call that instead of reimplementing it?
> this is the idea of the patch actually. If the callback is not
> implemented, we
> will have zeroes actually written or send to the wire. In this case
> there is
> not much sense to do that, the amount of data actually written will be
> significantly increased (some areas will be written twice - with zeroes and
> with the actual data).
>
But that's where bdrv_can_write_zeroes_with_unmap() comes in handy - you
can use the public interface to learn whether bdrv_make_zero() will be
efficient or not, without having to probe what the backend supports.
> On the other hand, if callback is implemented, we will have very small
> amount
> of data in the wire and written actually and thus will have a benefit. I am
> trying to avoid very small chunks of data. Here (during the migration
> process)
> the data is sent with 10 Mb chunks and with takes a LOT of time with NBD.
> We can send chunks 1.5 Gb (currently). They occupies the same 26 bytes
> of data
> on the transport layer.
I agree that we don't want to pre-initialize the device to zero unless
write zeroes is an efficient operation, but I don't think that the
existence of bs->drv->bdrv_co_[p]write_zeroes is the right way to find
that out.
I also think that we need to push harder on the NBD list that under the
new block limits proposal, we WANT to be able to advertise when the new
NBD_CMD_WRITE_ZEROES command will accept a larger size than
NBD_CMD_WRITE (as currently written, the BLOCK_INFO extension proposal
states that if a server advertises a max transaction size to the client,
then the client must honor that size for all commands including
NBD_CMD_WRITE_ZEROES, which would mean your 1.5G request [or my proposed
2G - 4k request] is invalid and would have to be a bunch of 32M requests).
https://sourceforge.net/p/nbd/mailman/message/35081223/
--
Eric Blake eblake redhat com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 604 bytes --]
next prev parent reply other threads:[~2016-06-15 12:34 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-06-14 15:25 [Qemu-devel] [PATCH 0/9] major rework of drive-mirror Denis V. Lunev
2016-06-14 15:25 ` [Qemu-devel] [PATCH 1/9] mirror: fix calling of blk_aio_pwritev/blk_aio_preadv Denis V. Lunev
2016-06-14 22:48 ` Eric Blake
2016-06-14 15:25 ` [Qemu-devel] [PATCH 2/9] mirror: create mirror_dirty_init helper for mirror_run Denis V. Lunev
2016-06-15 2:29 ` Eric Blake
2016-06-14 15:25 ` [Qemu-devel] [PATCH 3/9] mirror: optimize dirty bitmap filling in mirror_run a bit Denis V. Lunev
2016-06-15 2:36 ` Eric Blake
2016-06-15 8:41 ` Denis V. Lunev
2016-06-15 12:25 ` Eric Blake
2016-06-14 15:25 ` [Qemu-devel] [PATCH 4/9] mirror: efficiently zero out target Denis V. Lunev
2016-06-15 3:00 ` Eric Blake
2016-06-15 8:46 ` Denis V. Lunev
2016-06-15 12:34 ` Eric Blake [this message]
2016-06-15 13:18 ` Denis V. Lunev
2016-07-06 14:33 ` Denis V. Lunev
2016-06-14 15:25 ` [Qemu-devel] [PATCH 5/9] mirror: improve performance of mirroring of empty disk Denis V. Lunev
2016-06-15 3:20 ` Eric Blake
2016-06-15 9:19 ` Stefan Hajnoczi
2016-06-15 10:37 ` Denis V. Lunev
2016-06-16 10:10 ` Stefan Hajnoczi
2016-06-17 2:53 ` Eric Blake
2016-06-17 13:56 ` Stefan Hajnoczi
2016-06-14 15:25 ` [Qemu-devel] [PATCH 6/9] block: pass qiov into before_write notifier Denis V. Lunev
2016-06-15 4:07 ` Eric Blake
2016-06-15 9:21 ` Stefan Hajnoczi
2016-06-15 9:24 ` Denis V. Lunev
2016-06-15 9:22 ` Stefan Hajnoczi
2016-06-14 15:25 ` [Qemu-devel] [PATCH 7/9] mirror: allow to save buffer for QEMUIOVector in MirrorOp Denis V. Lunev
2016-06-15 4:11 ` Eric Blake
2016-06-14 15:25 ` [Qemu-devel] [PATCH 8/9] mirror: use synch scheme for drive mirror Denis V. Lunev
2016-06-15 4:18 ` Eric Blake
2016-06-15 8:52 ` Denis V. Lunev
2016-06-15 9:48 ` Stefan Hajnoczi
2016-06-14 15:25 ` [Qemu-devel] [PATCH 9/9] mirror: replace bdrv_dirty_bitmap with plain hbitmap Denis V. Lunev
2016-06-15 9:06 ` [Qemu-devel] [PATCH 0/9] major rework of drive-mirror Kevin Wolf
2016-06-15 9:34 ` Denis V. Lunev
2016-06-15 10:25 ` Kevin Wolf
2016-06-15 10:44 ` Denis V. Lunev
2016-06-15 9:50 ` Stefan Hajnoczi
2016-06-15 11:09 ` Dr. David Alan Gilbert
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=57614B5A.4030902@redhat.com \
--to=eblake@redhat.com \
--cc=den@openvz.org \
--cc=famz@redhat.com \
--cc=jcody@redhat.com \
--cc=kwolf@redhat.com \
--cc=mreitz@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
--cc=vsementsov@virtuozzo.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.