From: Vladimir Sementsov-Ogievskiy
Date: Sat, 24 Sep 2016 23:19:53 +0300
Subject: Re: [Qemu-devel] write_zeroes/trim on the whole disk
To: Alex Bligh
Cc: Wouter Verhelst, Eric Blake, "qemu-devel@nongnu.org", qemu block, "nbd-general@lists.sourceforge.net", Kevin Wolf, "Denis V. Lunev", Paolo Bonzini, "Stefan stefanha@redhat. com"
Message-ID: <57E6DFE9.1070300@virtuozzo.com>
In-Reply-To: <6F90A726-42D5-4B71-ADA9-63740B5048AE@alex.org.uk>
References: <57E5752C.3080407@virtuozzo.com> <20160923212126.vo3hvb4hxojjh7s4@grep.be> <57E66C60.8040102@virtuozzo.com> <57E6ACDD.7080205@virtuozzo.com> <57E6B423.6010007@virtuozzo.com> <57E6BC26.70205@virtuozzo.com> <6F90A726-42D5-4B71-ADA9-63740B5048AE@alex.org.uk>

On 24.09.2016 21:24, Alex Bligh wrote:
>> On 24 Sep 2016, at 18:47, Vladimir Sementsov-Ogievskiy wrote:
>>
>> I just wanted to say, that if we want a possibility of clearing the whole disk in one request for qcow2 we have to take 512 as granularity for such requests (with X = 9). And this is too small. 1tb will be the upper bound for the request.
> Sure. But I do not see the value in optimising these huge commands to run as single requests. If you want to do that, do it properly and have a negotiation-phase flag that supports 64 bit request lengths.

And add an additional request type with another magic in the first field
and a 64bit length field?
If such a solution is appropriate for nbd, it is ok for me of course.
I've proposed something like this in my first letter -
"Increase length field of the request to 64bit". Changing the existing
request message type is wrong of course, but creating an additional one
should be ok.

>
>> Full backup, for example:
>>
>> 1. target can do fast write_zeroes: clear the whole disk (great if we can do it in one request, without splitting, etc), then backup all data except zero or unallocated (save a lot of time on this skipping).
>> 2. target can not do fast write_zeroes: just backup all data. We need not clear the disk, as we will not save time by this.
>>
>> So here, we do not need splitting in general. Just clear all or do not clear at all.
> As I said, within the current protocol you cannot tell whether a target supports 'fast write zeroes', and indeed the support may be partial - for instance with a QCOW2 backend, a write that is not cluster aligned would likely only partially satisfy the command by deallocating bytes. There is no current flag for 'supports fast write zeroes' and (given the foregoing) it isn't evident to me exactly what it would mean.

I suggest adding this flag - a negotiation-phase flag, exposing
support of the whole feature (separate command or flag for clearing the
whole disk). Fast here means that we can do this in one request.
write_zeroes (of any size, up to the whole disk) is fast if it will not
take more time than a usual write (restricted to 2G).

>
> It seems however you could support your use case by simply iterating through the backup disk, using NBD_CMD_WRITE for the areas that are allocated and non-zero, and using NBD_CMD_WRITE_ZEROES for the areas that are not allocated or zeroed.
> This technique would not require a protocol change (beyond the existing NBD_CMD_WRITE_ZEROES extension), works irrespective of whether the target supports write zeroes or not, works irrespective of difference in cluster allocation size between source and target, is far simpler, and has the added advantage of making the existing zeroes-but-not-holes area into holes (that is optional if you can tell the difference between zeroes and holes on the source media). It also works on a single pass. Yes, you need to split requests up, but you need to split requests up ANYWAY to cope with NBD_CMD_WRITE's 2^32-1 length limit (I strongly advise you not to use more than 2^31). And in any case, you probably want to parallelise reads and writes and have more than one write in flight in any case, all of which suggests you are going to be breaking up requests anyway.
>

This is slow, see my first letter. Iterative zeroing of qcow2 is slow.

Why is a separate command/flag for clearing the whole disk better for me
than a block-based solution with splitting requests? I want to clear the
whole disk and I don't want to introduce new functionality, which I
don't need for now. I need to clear the whole disk, but with the
block-based solution I have a lot of code, which solves another task.
And it only indirectly solves my task. I.e. instead of
simple_realisation+simple_usage+nice_solution_for_my_task I have
harder_realisation+harder_usage+ugly_solution_for_my_task.

I understand that we must take into account that such functionality
(large requests) will likely be needed in future, so a more generic
solution is better for a protocol.
And I suggest a compromise:

negotiation-phase flag NBD_FLAG_SEND_BIG_REQUEST : command flag
NBD_CMD_FLAG_BIG_REQUEST is supported for WRITE_ZEROES and TRIM
negotiation-phase flag NBD_FLAG_SEND_BIG_REQUEST_REGION : non-zero
length is supported for big request

flag NBD_CMD_FLAG_BIG_REQUEST is set and length = 0 -> request on the
whole disk, offset must be 0
flag NBD_CMD_FLAG_BIG_REQUEST is set and length > 0 -> request on
(offset*block_size, length*block_size), length*block_size must be <=
disk_size (only if NBD_FLAG_SEND_BIG_REQUEST_REGION is negotiated)
flag NBD_CMD_FLAG_BIG_REQUEST is unset -> usual request on
(offset, length)

.... or a separate command/flag for clearing the whole disk, and separate
block-based solution in future if needed.

.... or new request type with 64bit length

-- 
Best regards,
Vladimir
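
The compromise rules above could be read roughly as the following mapping
from a request to a byte range. This is only a sketch and not part of the
proposal: the bit values, the function name resolve_range and the error
handling are invented for illustration, and block_size is passed in as a
parameter because the message does not fix its value.

```python
# Hypothetical bit values: the message proposes the flag names only and
# does not assign numbers to them.
NBD_CMD_FLAG_BIG_REQUEST = 1 << 0          # per-command flag (assumed bit)
NBD_FLAG_SEND_BIG_REQUEST = 1 << 0         # negotiation flag (assumed bit)
NBD_FLAG_SEND_BIG_REQUEST_REGION = 1 << 1  # negotiation flag (assumed bit)


def resolve_range(cmd_flags, offset, length, negotiated, block_size, disk_size):
    """Return (byte_offset, byte_length) for a WRITE_ZEROES/TRIM request.

    Follows the three cases listed in the message; raising ValueError for
    invalid combinations is an assumption, not something the message says.
    """
    # Flag unset: usual request, offset and length are already in bytes.
    if not cmd_flags & NBD_CMD_FLAG_BIG_REQUEST:
        return offset, length

    if not negotiated & NBD_FLAG_SEND_BIG_REQUEST:
        raise ValueError("big requests were not negotiated")

    # Flag set, length == 0: request on the whole disk, offset must be 0.
    if length == 0:
        if offset != 0:
            raise ValueError("offset must be 0 for a whole-disk request")
        return 0, disk_size

    # Flag set, length > 0: only valid if the REGION flag was negotiated;
    # offset and length are counted in units of block_size.
    if not negotiated & NBD_FLAG_SEND_BIG_REQUEST_REGION:
        raise ValueError("big request regions were not negotiated")
    byte_length = length * block_size
    if byte_length > disk_size:
        raise ValueError("length*block_size must be <= disk_size")
    return offset * block_size, byte_length
```

With only NBD_FLAG_SEND_BIG_REQUEST negotiated, a request with the command
flag set, offset 0 and length 0 resolves to (0, disk_size), i.e. the whole
disk in one request, while any non-zero length is rejected until
NBD_FLAG_SEND_BIG_REQUEST_REGION is negotiated as well.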