qemu-devel.nongnu.org archive mirror
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: Alex Bligh <alex@alex.org.uk>
Cc: Wouter Verhelst <w@uter.be>, Eric Blake <eblake@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	qemu block <qemu-block@nongnu.org>,
	"nbd-general@lists.sourceforge.net"
	<nbd-general@lists.sourceforge.net>,
	Kevin Wolf <kwolf@redhat.com>, "Denis V. Lunev" <den@openvz.org>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"Stefan stefanha@redhat. com" <stefanha@redhat.com>
Subject: Re: [Qemu-devel] write_zeroes/trim on the whole disk
Date: Sat, 24 Sep 2016 20:47:18 +0300
Message-ID: <57E6BC26.70205@virtuozzo.com>
In-Reply-To: <E0ACEB4D-D4AF-4CEA-BE59-4FB799C91BAA@alex.org.uk>

On 24.09.2016 20:32, Alex Bligh wrote:
>> On 24 Sep 2016, at 18:13, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote:
>>
>> On 24.09.2016 19:49, Alex Bligh wrote:
>>>> On 24 Sep 2016, at 17:42, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote:
>>>>
>>>> On 24.09.2016 19:31, Alex Bligh wrote:
>>>>>> On 24 Sep 2016, at 13:06, Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> wrote:
>>>>>>
>>>>>> Note: if the disk size is not aligned to X, we will have to send a request larger than the disk size to clear the whole disk.
>>>>> If you look at the block size extension, the size of the disk must be an exact multiple of the minimum block size. So that would work.
>> This means that this extension cannot be used with arbitrary qcow2 disks, as a qcow2 image may have a size that is not aligned to its cluster size.
>>
>> # qemu-img create -f qcow2 mega 1K
>> Formatting 'mega', fmt=qcow2 size=1024 encryption=off cluster_size=65536 lazy_refcounts=off refcount_bits=16
>> # qemu-img info mega
>> image: mega
>> file format: qcow2
>> virtual size: 1.0K (1024 bytes)
>> disk size: 196K
>> cluster_size: 65536
>> Format specific information:
>>     compat: 1.1
>>     lazy refcounts: false
>>     refcount bits: 16
>>     corrupt: false
>>
>> And there is no such restriction in the documentation. Otherwise we have to consider the sector size (512 bytes) as the block size for qcow2, which is too small for our needs.
> If by "this extension" you mean the INFO extension (which reports block sizes) that's incorrect.
>
> An nbd server using a QCOW2 file as the backend would report the sector size as the minimum block size. It might report the cluster size or the sector size as the preferred block size, or anything in between.
>
> QCOW2 cluster size essentially determines the allocation unit. NBD is not bothered as to the underlying allocation unit. It does not (currently) support the concept of making holes visible to the client. If you use NBD_CMD_WRITE_ZEROES you get zeroes, which might or might not be implemented as one or more holes or 'real' zeroes (save if you specify NBD_CMD_FLAG_NO_HOLE, in which case you are guaranteed to get 'real' zeroes). If you use NBD_CMD_TRIM then the area trimmed might or might not be written with one or more holes. There is (currently) no way to detect the presence of holes separately from zeroes (though a bitmap extension was discussed).

I just wanted to say that if we want to be able to clear the whole disk
in one request for qcow2, we have to take 512 bytes as the granularity
for such requests (X = 9). And this is too small: 1 TB would be the
upper bound for a single request.
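
To make that bound concrete, here is a rough calculation in Python. It
is only a sketch: it assumes the request length is counted in units of
2^X bytes and has to fit in either a signed 32-bit value (INT_MAX) or an
unsigned one (2^32-1). The helper name max_request_bytes is made up for
illustration and is not part of any NBD implementation.

```python
def max_request_bytes(shift, max_units):
    """Largest byte count expressible as max_units blocks of 2**shift bytes."""
    return max_units << shift


INT_MAX = 2**31 - 1      # signed 32-bit limit
UINT32_MAX = 2**32 - 1   # unsigned 32-bit limit
TIB = 2**40

# X = 9 (512-byte sectors): just under 1 TiB signed, just under 2 TiB unsigned.
signed_bound = max_request_bytes(9, INT_MAX)
unsigned_bound = max_request_bytes(9, UINT32_MAX)

# X = 16 (64 KiB qcow2 clusters) would allow far larger requests.
cluster_bound = max_request_bytes(16, INT_MAX)

for name, value in [("X=9, INT_MAX", signed_bound),
                    ("X=9, 2^32-1", unsigned_bound),
                    ("X=16, INT_MAX", cluster_bound)]:
    print(f"{name}: {value} bytes (~{value / TIB:.2f} TiB)")
```

With a cluster-sized granularity the limit stops being a practical
concern, which is why 512 is the problematic case here.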

>
>>>> But there is no guarantee that disk_size/block_size < INT_MAX..
>>> I think you mean 2^32-1, but yes there is no guarantee of that. In that case you would need to break the call up into multiple calls.
>>>
>>> However, being able to break the call up into multiple calls seems pretty sensible given that NBD_CMD_WRITE_ZEROES may take a large amount of
>>> time, and a REALLY long time if the server doesn't support trim.
>>>
>>>> Maybe an additional option specifying the shift would be better, with the convention that if offset+length exceeds the disk size, length is recalculated as disk_size-offset.
>>> I don't think we should do that. We already have clear semantics that prevent operations beyond the end of the disk. Again, just break the command up into multiple commands. No great hardship.
>>>
>> I agree that requests larger than the disk size are ugly. But splitting the request brings me back to the idea of having a separate command or flag for clearing the whole disk without that dance. The server may report availability of this command/flag only if the target driver supports fast write_zeroes (qcow2 in our case).
> Why? In the general case you need to break up requests anyway (particularly with the INFO extension where there is a maximum command size), and issuing a command over a TCP connection that might take hours or days to complete with no hint of progress, and no TCP traffic to keep NAT etc. alive, sounds like bad practice. The overhead is tiny.
>
> I would be against this change.
>

Full backup, for example:

1. The target can do fast write_zeroes: clear the whole disk (ideally in
one request, without splitting, etc.), then back up all data except
zero or unallocated regions (skipping them saves a lot of time).
2. The target cannot do fast write_zeroes: just back up all data. There
is no need to clear the disk first, as that would not save any time.

So here we do not need splitting in general: either clear everything
or do not clear at all.
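
The two cases could be sketched roughly as follows, in Python. This is
only an illustration of the decision logic, not code from QEMU: the
function plan_full_backup, the operation names and the
(offset, length, is_zero) tuples are all hypothetical.

```python
def plan_full_backup(blocks, target_has_fast_zeroes):
    """Return the list of operations for a full backup.

    blocks: iterable of (offset, length, is_zero_or_unallocated) tuples
            describing the source disk (hypothetical representation).
    """
    ops = []
    if target_has_fast_zeroes:
        # Case 1: clear the whole target up front, in a single operation.
        ops.append(("zero_whole_disk",))
    for offset, length, is_zero in blocks:
        if target_has_fast_zeroes and is_zero:
            # Already zero on the target after the clear, so skip the copy.
            continue
        # Case 2 (or non-zero data in case 1): copy the block as-is.
        ops.append(("copy", offset, length))
    return ops


source = [(0, 65536, False), (65536, 65536, True), (131072, 65536, False)]
fast_plan = plan_full_backup(source, target_has_fast_zeroes=True)
slow_plan = plan_full_backup(source, target_has_fast_zeroes=False)
print(fast_plan)
print(slow_plan)
```

In neither case does the plan contain a partial clear, which is the
point: the clearing step is all or nothing.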

-- 
Best regards,
Vladimir
