From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:44347)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <vsementsov@virtuozzo.com>) id 1bntQj-0000na-93
	for qemu-devel@nongnu.org; Sat, 24 Sep 2016 16:20:14 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <vsementsov@virtuozzo.com>) id 1bntQd-0004Iz-Cx
	for qemu-devel@nongnu.org; Sat, 24 Sep 2016 16:20:12 -0400
References: <57E5752C.3080407@virtuozzo.com>
	<a3d525e9-a66e-d086-55a4-5def3824964d@redhat.com>
	<20160923212126.vo3hvb4hxojjh7s4@grep.be>
	<57E66C60.8040102@virtuozzo.com>
	<FD5348CE-88CA-4833-AB84-90D04058A9AF@alex.org.uk>
	<57E6ACDD.7080205@virtuozzo.com>
	<D47EBA1A-3256-40C2-A394-6A849F1B4B1D@alex.org.uk>
	<57E6B423.6010007@virtuozzo.com>
	<E0ACEB4D-D4AF-4CEA-BE59-4FB799C91BAA@alex.org.uk>
	<57E6BC26.70205@virtuozzo.com>
	<6F90A726-42D5-4B71-ADA9-63740B5048AE@alex.org.uk>
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-ID: <57E6DFE9.1070300@virtuozzo.com>
Date: Sat, 24 Sep 2016 23:19:53 +0300
MIME-Version: 1.0
In-Reply-To: <6F90A726-42D5-4B71-ADA9-63740B5048AE@alex.org.uk>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] write_zeroes/trim on the whole disk
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Alex Bligh <alex@alex.org.uk>
Cc: Wouter Verhelst <w@uter.be>, Eric Blake <eblake@redhat.com>, "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>, qemu block <qemu-block@nongnu.org>, "nbd-general@lists.sourceforge.net" <nbd-general@lists.sourceforge.net>, Kevin Wolf <kwolf@redhat.com>, "Denis V. Lunev" <den@openvz.org>, Paolo Bonzini <pbonzini@redhat.com>, "Stefan stefanha@redhat. com" <stefanha@redhat.com>

On 24.09.2016 21:24, Alex Bligh wrote:
>> On 24 Sep 2016, at 18:47, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote:
>>
>> I just wanted to say, that if we want a possibility of clearing the
>> whole disk in one request for qcow2 we have to take 512 as granularity
>> for such requests (with X = 9). An this is too small. 1tb will be the
>> upper bound for the request.
> Sure. But I do not see the value in optimising these huge commands to
> run as single requests. If you want to do that, do it properly and have
> a negotiation-phase flag that supports 64 bit request lengths.

And add an additional request type with a different magic in the first
field and a 64-bit length field? If such a solution is appropriate for
NBD, it is of course fine with me. I proposed something like this in my
first letter: "Increase length field of the request to 64bit". Changing
the existing request message type would of course be wrong, but creating
an additional one should be OK.
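
To make the idea concrete, here is a rough sketch of what such an
additional request type could look like next to the existing one. The
existing header layout (magic 0x25609513, 16-bit flags, 16-bit type,
64-bit handle, 64-bit offset, 32-bit length) is the current one;
BIG_REQUEST_MAGIC and the 64-bit layout are placeholders I made up for
illustration, nothing here is assigned by the NBD protocol.

```python
import struct

NBD_REQUEST_MAGIC = 0x25609513   # magic of the existing request header
NBD_CMD_WRITE_ZEROES = 6         # from the WRITE_ZEROES extension

# Hypothetical: placeholder value, not assigned by any NBD document.
BIG_REQUEST_MAGIC = 0x25609514

# Existing header: magic, flags, type, handle, offset, 32-bit length.
REQUEST = struct.Struct(">IHHQQI")
# Hypothetical header: same fields, but with a 64-bit length.
BIG_REQUEST = struct.Struct(">IHHQQQ")


def pack_request(cmd, handle, offset, length, flags=0):
    """Pack a request, using the 64-bit form only when it is needed."""
    if length < 2**32:
        return REQUEST.pack(NBD_REQUEST_MAGIC, flags, cmd,
                            handle, offset, length)
    return BIG_REQUEST.pack(BIG_REQUEST_MAGIC, flags, cmd,
                            handle, offset, length)


small = pack_request(NBD_CMD_WRITE_ZEROES, 1, 0, 2**31)
big = pack_request(NBD_CMD_WRITE_ZEROES, 2, 0, 2**40)  # 1 TiB at once
print(len(small), len(big))
```

The existing message type stays untouched; a server that did not
negotiate the new type would never see the second magic.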

>
>> Full backup, for example:
>>
>> 1. target can do fast write_zeroes: clear the whole disk (great if we
>>    can do it in one request, without splitting, etc), then backup all
>>    data except zero or unallocated (save a lot of time on this
>>    skipping).
>> 2. target can not do fast write_zeroes: just backup all data. We need
>>    not clear the disk, as we will not save time by this.
>>
>> So here, we need not splitting as a general. Just clear all or not
>> clearing at all.
> As I said, within the current protocol you cannot tell whether a target
> supports 'fast write zeroes', and indeed the support may be partial -
> for instance with a QCOW2 backend, a write that is not cluster aligned
> would likely only partially satisfy the command by deallocating bytes.
> There is no current flag for 'supports fast write zeroes' and (given
> the foregoing) it isn't evident to me exactly what it would mean.

I suggest adding this flag: a negotiation-phase flag that exposes
support for the whole feature (a separate command or flag for clearing
the whole disk). "Fast" here means that we can do this in one request:
write_zeroes (of any size, up to the whole disk) is fast if it takes no
more time than a usual write (restricted to 2G).

>
> It seems however you could support your use case by simply iterating
> through the backup disk, using NBD_CMD_WRITE for the areas that are
> allocated and non-zero, and using NBD_CMD_WRITE_ZEROES for the areas
> that are not allocated or zeroed. This technique would not require a
> protocol change (beyond the existing NBD_CMD_WRITE_ZEROES extension),
> works irrespective of whether the target supports write zeroes or not,
> works irrespective of difference in cluster allocation size between
> source and target, is far simpler, and has the added advantage of
> making the existing zeroes-but-not-holes area into holes (that is
> optional if you can tell the difference between zeroes and holes on the
> source media). It also works on a single pass. Yes, you need to split
> requests up, but you need to split requests up ANYWAY to cope with
> NBD_CMD_WRITE's 2^32-1 length limit (I strongly advise you not to use
> more than 2^31). And in any case, you probably want to parallelise
> reads and writes and have more than one write in flight in any case,
> all of which suggests you are going to be breaking up requests anyway.
>
This is slow, see my first letter. Iterative zeroing of qcow2 is slow.
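
For reference, the splitting that the quoted approach implies could be
sketched roughly like this. The 2^31 limit is the one advised above;
the 1 TiB disk size and the helper name split_extent are only
assumptions for the example. The sketch only counts requests, it says
nothing about how long each of them takes on a qcow2 target.

```python
MAX_REQUEST = 2**31   # the quoted advice: do not use more than 2^31


def split_extent(offset, length, max_len=MAX_REQUEST):
    """Yield (offset, length) pieces no longer than max_len."""
    end = offset + length
    while offset < end:
        piece = min(max_len, end - offset)
        yield offset, piece
        offset += piece


disk_size = 2**40     # 1 TiB, an example size
pieces = list(split_extent(0, disk_size))
print(len(pieces))    # number of WRITE_ZEROES requests needed
```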

Why is a separate command/flag for clearing the whole disk better for me
than a block-based solution with request splitting? I want to clear the
whole disk, and I don't want to introduce new functionality that I don't
need for now. I need to clear the whole disk, but with the block-based
solution I end up with a lot of code that solves a different task and
only indirectly solves mine. I.e. instead of
simple_realisation+simple_usage+nice_solution_for_my_task I have
harder_realisation+harder_usage+ugly_solution_for_my_task.

I understand that we must take into account that such functionality
(large requests) will likely be needed in the future, so a more generic
solution is better for the protocol. So I suggest a compromise:

negotiation-phase flag NBD_FLAG_SEND_BIG_REQUEST: command flag
    NBD_CMD_FLAG_BIG_REQUEST is supported for WRITE_ZEROES and TRIM
negotiation-phase flag NBD_FLAG_SEND_BIG_REQUEST_REGION: non-zero
    length is supported for big request

flag NBD_CMD_FLAG_BIG_REQUEST is set and length = 0  -> request on the
    whole disk, offset must be 0
flag NBD_CMD_FLAG_BIG_REQUEST is set and length > 0  -> request on
    (offset*block_size, length*block_size), length*block_size must be
    <= disk_size (only if NBD_FLAG_SEND_BIG_REQUEST_REGION is negotiated)
flag NBD_CMD_FLAG_BIG_REQUEST is unset  -> usual request on
    (offset, length)
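
A rough sketch of how a server might interpret these rules. The bit
values of the three flags are placeholders (the proposal does not assign
any), resolve_range is a made-up helper name, and block_size is assumed
to be known to both sides already.

```python
# Hypothetical bit values; the proposal does not assign any.
NBD_FLAG_SEND_BIG_REQUEST = 1 << 14         # negotiation-phase flag
NBD_FLAG_SEND_BIG_REQUEST_REGION = 1 << 15  # negotiation-phase flag
NBD_CMD_FLAG_BIG_REQUEST = 1 << 15          # command flag


def resolve_range(cmd_flags, offset, length, negotiated,
                  block_size, disk_size):
    """Return the (byte_offset, byte_length) that a request covers."""
    if not cmd_flags & NBD_CMD_FLAG_BIG_REQUEST:
        return offset, length               # usual request
    if not negotiated & NBD_FLAG_SEND_BIG_REQUEST:
        raise ValueError("big requests were not negotiated")
    if length == 0:
        if offset != 0:
            raise ValueError("offset must be 0 for the whole disk")
        return 0, disk_size                 # request on the whole disk
    if not negotiated & NBD_FLAG_SEND_BIG_REQUEST_REGION:
        raise ValueError("big request regions were not negotiated")
    # Only the check stated in the proposal is applied here.
    if length * block_size > disk_size:
        raise ValueError("region is larger than the disk")
    return offset * block_size, length * block_size


both = NBD_FLAG_SEND_BIG_REQUEST | NBD_FLAG_SEND_BIG_REQUEST_REGION
disk = 2**40
whole = resolve_range(NBD_CMD_FLAG_BIG_REQUEST, 0, 0, both, 65536, disk)
region = resolve_range(NBD_CMD_FLAG_BIG_REQUEST, 16, 32, both, 65536, disk)
usual = resolve_range(0, 4096, 512, both, 65536, disk)
print(whole, region, usual)
```

A client that only needs to clear the whole disk uses just the first
big-request case and never has to deal with block_size at all.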

....

or a separate command/flag for clearing the whole disk, and separate=20
block-based solution in future if needed.

....

or new request type with 64bit length


-- 
Best regards,
Vladimir