From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:34988)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <vsementsov@virtuozzo.com>) id 1bnqpF-0002bC-9n
	for qemu-devel@nongnu.org; Sat, 24 Sep 2016 13:33:22 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <vsementsov@virtuozzo.com>) id 1bnqpA-0006CW-8j
	for qemu-devel@nongnu.org; Sat, 24 Sep 2016 13:33:21 -0400
References: <57E5752C.3080407@virtuozzo.com>
	<a3d525e9-a66e-d086-55a4-5def3824964d@redhat.com>
	<20160923212126.vo3hvb4hxojjh7s4@grep.be>
	<57E66C60.8040102@virtuozzo.com>
	<FD5348CE-88CA-4833-AB84-90D04058A9AF@alex.org.uk>
	<57E6ACDD.7080205@virtuozzo.com>
	<D47EBA1A-3256-40C2-A394-6A849F1B4B1D@alex.org.uk>
	<57E6B423.6010007@virtuozzo.com>
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Message-ID: <57E6B8CF.6040406@virtuozzo.com>
Date: Sat, 24 Sep 2016 20:33:03 +0300
MIME-Version: 1.0
In-Reply-To: <57E6B423.6010007@virtuozzo.com>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] write_zeroes/trim on the whole disk
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel/>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Alex Bligh <alex@alex.org.uk>
Cc: Wouter Verhelst <w@uter.be>, Eric Blake <eblake@redhat.com>, "qemu-devel@nongnu.org" <qemu-devel@nongnu.org>, qemu block <qemu-block@nongnu.org>, "nbd-general@lists.sourceforge.net" <nbd-general@lists.sourceforge.net>, Kevin Wolf <kwolf@redhat.com>, "Denis V. Lunev" <den@openvz.org>, Paolo Bonzini <pbonzini@redhat.com>, "Stefan stefanha@redhat. com" <stefanha@redhat.com>

On 24.09.2016 20:13, Vladimir Sementsov-Ogievskiy wrote:
> On 24.09.2016 19:49, Alex Bligh wrote:
>>> On 24 Sep 2016, at 17:42, Vladimir Sementsov-Ogievskiy 
>>> <vsementsov@virtuozzo.com> wrote:
>>>
>>> On 24.09.2016 19:31, Alex Bligh wrote:
>>>>> On 24 Sep 2016, at 13:06, Vladimir Sementsov-Ogievskiy 
>>>>> <vsementsov@virtuozzo.com> wrote:
>>>>>
>>>>> Note: if the disk size is not aligned to X, we will have to send a 
>>>>> request larger than the disk size to clear the whole disk.
>>>> If you look at the block size extension, the size of the disk must 
>>>> be an exact multiple of the minimum block size. So that would work.
>
> This means that this extension could not be used with qcow2 disks in 
> general, as a qcow2 image may have a size not aligned to its cluster size.
>
> # qemu-img create -f qcow2 mega 1K
> Formatting 'mega', fmt=qcow2 size=1024 encryption=off 
> cluster_size=65536 lazy_refcounts=off refcount_bits=16
> # qemu-img info mega
> image: mega
> file format: qcow2
> virtual size: 1.0K (1024 bytes)
> disk size: 196K
> cluster_size: 65536
> Format specific information:
>     compat: 1.1
>     lazy refcounts: false
>     refcount bits: 16
>     corrupt: false
>
> And there is no such restriction in the documentation. Otherwise we have 
> to consider the sector size (512 bytes) as the block size for qcow2, 
> which is too small for our needs.
>
>>>>
>>> But there is no guarantee that disk_size/block_size < INT_MAX..
>> I think you mean 2^32-1, but yes there is no guarantee of that. In 
>> that case you would need to break the call up into multiple calls.
>>
>> However, being able to break the call up into multiple calls seems 
>> pretty sensible given that NBD_CMD_WRITE_ZEROES may take a large 
>> amount of
>> time, and a REALLY long time if the server doesn't support trim.
>>
>>> Maybe an additional option specifying the shift would be better, 
>>> with the convention that if offset+length exceeds the disk size, 
>>> length should be recalculated as disk_size-offset.
>> I don't think we should do that. We already have clear semantics that 
>> prevent operations beyond the end of the disk. Again, just break the 
>> command up into multiple commands. No great hardship.
>>
>
> I agree that requests larger than the disk size are ugly. But splitting 
> the request brings me back to the idea of having a separate command or 
> flag for clearing the whole disk without that dance. The server may 
> report availability of this command/flag only if the target driver 
> supports fast write_zeroes (qcow2 in our case).
>

Also, such a flag could be used to satisfy all needs:

flag BIG_REQUEST is set and length = 0
    ->  request on the whole disk; offset must be 0
flag BIG_REQUEST is set and length > 0
    ->  request on (offset*block_size, length*block_size);
        length*block_size must be <= disk_size
flag BIG_REQUEST is unset
    ->  usual request on (offset, length)
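
To make the three cases concrete, here is a rough Python sketch of how a server might turn such a request into a byte range. The function name resolve_request and its parameters are made up for illustration; nothing like this exists in the NBD spec or in qemu. It only encodes the rules listed above and leaves out any other bounds checks a real server would need.

```python
def resolve_request(offset, length, big_request, block_size, disk_size):
    """Return (byte_offset, byte_length) for a request.

    Hypothetical helper: 'big_request' stands for the proposed
    BIG_REQUEST flag; 'block_size' and 'disk_size' are in bytes.
    """
    if not big_request:
        # Flag unset: usual request, offset and length are already bytes.
        return offset, length

    if length == 0:
        # Flag set, length = 0: request on the whole disk.
        if offset != 0:
            raise ValueError("whole-disk request requires offset 0")
        return 0, disk_size

    # Flag set, length > 0: offset and length are counted in blocks.
    byte_offset = offset * block_size
    byte_length = length * block_size
    if byte_length > disk_size:
        raise ValueError("length*block_size must be <= disk_size")
    return byte_offset, byte_length
```

With the 1K qcow2 image from the example earlier in the thread (cluster_size 65536), resolve_request(0, 0, True, 65536, 1024) gives (0, 1024), so the whole disk is covered even though its size is not a multiple of the cluster size.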

-- 
Best regards,
Vladimir