From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: Alex Bligh <alex@alex.org.uk>
Cc: Wouter Verhelst <w@uter.be>, Eric Blake <eblake@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	qemu block <qemu-block@nongnu.org>,
	"nbd-general@lists.sourceforge.net"
	<nbd-general@lists.sourceforge.net>,
	Kevin Wolf <kwolf@redhat.com>, "Denis V. Lunev" <den@openvz.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"Stefan stefanha@redhat. com" <stefanha@redhat.com>
Subject: Re: [Qemu-devel] write_zeroes/trim on the whole disk
Date: Sat, 24 Sep 2016 23:19:53 +0300
Message-ID: <57E6DFE9.1070300@virtuozzo.com>
In-Reply-To: <6F90A726-42D5-4B71-ADA9-63740B5048AE@alex.org.uk>

On 24.09.2016 21:24, Alex Bligh wrote:
>> On 24 Sep 2016, at 18:47, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote:
>>
>> I just wanted to say, that if we want a possibility of clearing the whole disk in one request for qcow2 we have to take 512 as granularity for such requests (with X = 9). An this is too small. 1tb will be the upper bound for the request.
> Sure. But I do not see the value in optimising these huge commands to run as single requests. If you want to do that, do it properly and have a negotiation-phase flag that supports 64 bit request lengths.

And add an additional request type, with a different magic in the first
field and a 64-bit length field? If such a solution is appropriate for NBD,
it is of course fine with me. I proposed something like this in my first
letter: "Increase length field of the request to 64bit". Changing the
existing request message type would be wrong, of course, but creating an
additional one should be OK.
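
To make the idea concrete, here is a minimal sketch of the existing request
header next to an additional request type with a 64-bit length. The existing
layout and magic follow the NBD protocol; the second magic value and the
function names are made up for illustration and are not part of any
specification.

```python
import struct

NBD_REQUEST_MAGIC = 0x25609513       # magic of the existing request header
HYPOTHETICAL_BIG_MAGIC = 0x25609514  # made-up value, for illustration only
NBD_CMD_WRITE_ZEROES = 6


def pack_request(flags, cmd, handle, offset, length):
    """Existing layout: magic, flags, type, handle, offset, 32-bit length."""
    return struct.pack(">IHHQQI", NBD_REQUEST_MAGIC,
                       flags, cmd, handle, offset, length)


def pack_big_request(flags, cmd, handle, offset, length):
    """Hypothetical layout: the same fields, but with a 64-bit length."""
    return struct.pack(">IHHQQQ", HYPOTHETICAL_BIG_MAGIC,
                       flags, cmd, handle, offset, length)
```

With this layout the existing header is 28 bytes and the hypothetical one is
32 bytes. A peer would tell the two apart by the magic, so existing requests
stay untouched.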

>
>> Full backup, for example:
>>
>> 1. target can do fast write_zeroes: clear the whole disk (great if we can do it in one request, without splitting, etc), then backup all data except zero or unallocated (save a lot of time on this skipping).
>> 2. target can not do fast write_zeroes: just backup all data. We need not clear the disk, as we will not save time by this.
>>
>> So here, we need not splitting as a general. Just clear all or not clearing at all.
> As I said, within the current protocol you cannot tell whether a target supports 'fast write zeroes', and indeed the support may be partial - for instance with a QCOW2 backend, a write that is not cluster aligned would likely only partially satisfy the command by deallocating bytes. There is no current flag for 'supports fast write zeroes' and (given the foregoing) it isn't evident to me exactly what it would mean.

I suggest adding this flag: a negotiation-phase flag that exposes
support for the whole feature (a separate command or flag for clearing the
whole disk). "Fast" here means that we can do this in one request:
write_zeroes (of any size, up to the whole disk) is fast if it does not
take more time than a usual write (which is restricted to 2G).

>
> It seems however you could support your use case by simply iterating through the backup disk, using NBD_CMD_WRITE for the areas that are allocated and non-zero, and using NBD_CMD_WRITE_ZEROES for the areas that are not allocated or zeroed. This technique would not require a protocol change (beyond the existing NBD_CMD_WRITE_ZEROES extension), works irrespective of whether the target supports write zeroes or not, works irrespective of difference in cluster allocation size between source and target, is far simpler, and has the added advantage of making the existing zeroes-but-not-holes area into holes (that is optional if you can tell the difference between zeroes and holes on the source media). It also works on a single pass. Yes, you need to split requests up, but you need to split requests up ANYWAY to cope with NBD_CMD_WRITE's 2^32-1 length limit (I strongly advise you not to use more than 2^31). And in any case, you probably want to parallelise reads and writes and have more than one write in flight in any case, all of which suggests you are going to be breaking up requests anyway.
>
This is slow; see my first letter. Iterative zeroing of qcow2 is slow.
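
For a sense of scale, here is a minimal sketch of the request splitting that
the block-based approach needs. The 2^31 cap is the conservative limit
recommended above; the function name and the 16 TiB disk size are arbitrary
choices for illustration.

```python
MAX_REQUEST = 2**31  # conservative per-request cap recommended in the thread


def split_range(offset, length, max_len=MAX_REQUEST):
    """Yield (offset, length) pieces covering the range, each <= max_len."""
    end = offset + length
    while offset < end:
        piece = min(max_len, end - offset)
        yield offset, piece
        offset += piece


pieces = list(split_range(0, 16 * 2**40))  # a 16 TiB disk, chosen arbitrarily
print(len(pieces))  # 8192
```

Each piece would be sent as a separate NBD_CMD_WRITE_ZEROES request, whereas a
whole-disk command would need only one.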

Why is a separate command/flag for clearing the whole disk better for me
than a block-based solution with request splitting? I want to clear the
whole disk, and I don't want to introduce new functionality that I
don't need for now. I need to clear the whole disk, but with a
block-based solution I get a lot of code that solves a different task
and only indirectly solves mine. I.e. instead of
simple_realisation+simple_usage+nice_solution_for_my_task I have
harder_realisation+harder_usage+ugly_solution_for_my_task.

I understand that we must take into account that such functionality
(large requests) will likely be needed in the future, so a more generic
solution is better for a protocol. So I suggest a compromise:

negotiation-phase flag NBD_FLAG_SEND_BIG_REQUEST: the command flag
NBD_CMD_FLAG_BIG_REQUEST is supported for WRITE_ZEROES and TRIM
negotiation-phase flag NBD_FLAG_SEND_BIG_REQUEST_REGION: a non-zero
length is supported for big requests

NBD_CMD_FLAG_BIG_REQUEST is set and length = 0  ->  request on the
whole disk; offset must be 0
NBD_CMD_FLAG_BIG_REQUEST is set and length > 0  ->  request on
(offset*block_size, length*block_size); length*block_size must be <=
disk_size (only if NBD_FLAG_SEND_BIG_REQUEST_REGION is negotiated)
NBD_CMD_FLAG_BIG_REQUEST is unset  ->  usual request on
(offset, length)
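
The rules above can be sketched as a function that maps a request to the byte
range it covers. This is only an illustration of the proposal: the bit value
of the flag, the function name, and passing block_size and disk_size as plain
parameters are assumptions, not part of the protocol.

```python
NBD_CMD_FLAG_BIG_REQUEST = 1 << 5  # bit position is made up for this sketch


def resolve_request(cmd_flags, offset, length, block_size, disk_size,
                    region_negotiated=False):
    """Return the (byte_offset, byte_length) covered under the proposed rules."""
    if not cmd_flags & NBD_CMD_FLAG_BIG_REQUEST:
        # Flag unset: usual request, offset and length are already in bytes.
        return offset, length
    if length == 0:
        # Flag set, length 0: the request covers the whole disk.
        if offset != 0:
            raise ValueError("whole-disk request requires offset 0")
        return 0, disk_size
    if not region_negotiated:
        raise ValueError("non-zero length needs NBD_FLAG_SEND_BIG_REQUEST_REGION")
    # Flag set, length > 0: offset and length are counted in blocks.
    byte_offset = offset * block_size
    byte_length = length * block_size
    if byte_length > disk_size:
        raise ValueError("length*block_size exceeds disk_size")
    return byte_offset, byte_length
```

For example, with block_size = 512 and a 1 TiB disk,
resolve_request(NBD_CMD_FLAG_BIG_REQUEST, 0, 0, 512, 2**40) covers the whole
disk in a single request.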

....

or a separate command/flag for clearing the whole disk, and a separate
block-based solution in the future if needed.

....

or a new request type with a 64-bit length


-- 
Best regards,
Vladimir
