From: Paolo Bonzini
Date: Thu, 31 Mar 2016 16:40:35 +0200
Subject: Re: [Qemu-devel] [PATCH v2 1/1] NBD proto: add WRITE_ZEROES extension
To: Alex Bligh
Cc: "nbd-general@lists.sourceforge.net", Kevin Wolf, "qemu-devel@nongnu.org", Pavel Borzenkov, "Stefan stefanha@redhat. com", Wouter Verhelst, "Denis V. Lunev"
Message-ID: <56FD36E3.1010402@redhat.com>
In-Reply-To: <357ECCE6-4A6F-430A-9C2C-214D775CFBFE@alex.org.uk>
References: <1459429325-16350-1-git-send-email-den@openvz.org> <24E4A85C-254F-4324-A2F4-9DACA6037381@alex.org.uk> <56FD2C6A.9030208@redhat.com> <357ECCE6-4A6F-430A-9C2C-214D775CFBFE@alex.org.uk>

On 31/03/2016 16:27, Alex Bligh wrote:
> > > IE why not always permit trimming PROVIDED the data always reads back
> > > as zero? This would be far simpler.
> >
> > Because trimming can make future operations more expensive and cause
> > fragmentation (which may not be as bad as it used to be at the media
> > level, but it is still somewhat bad at the filesystem level).
> >
> > So if you want a fully-provisioned file, the simplest way to do so is to
> > write zeroes to it, and trimming is undesirable.
>
> But isn't the server in a better position to know this than the
> client?

There are at least three possible states for a sector:

- hole (thin-provisioned)

- allocated as data (disk contains actual zeroes)

- allocated as unwritten (blocks reserved on backing storage, reads as
  zeroes but the disk may not contain actual zeroes)

It's always okay for the backend to convert a zero block to an unwritten
extent; it's generally not okay for a backend to take a request to
create an unwritten extent and instead create a hole.

It's all an "as if" situation.  The server must provide the semantics
requested by the client.  For example, writing to a hole could cause
ENOSPC, writing to an unwritten extent could not.

The server might know better, because it certainly is in a better
position to know how to fulfill the client's request.  But even if it's
just a hint, it makes sense for NBD to provide it.  It's not a
coincidence that this hint exists at all levels: SCSI has an UNMAP bit
that can be set in the WRITE SAME command (and it has UNMAP which
matches NBD's TRIM); the fallocate system call has FALLOC_FL_ZERO_RANGE
and FALLOC_FL_PUNCH_HOLE (plus Linux has the BLKDISCARD ioctl which
again matches NBD's TRIM for block devices).

> EG if the server has a back end implementation (as I suspect
> Ceph on qemu-nbd does)

Ceph doesn't, but gluster does.

> which never actually stores all zero blocks,
> it won't make a difference, and conceivably you're generating a whole
> pile of I/O to avoid sparseness when sparseness might be faster. Take
> for example a persistent memory interface, where fragmentation is
> irrelevant, and writing piles of zeroes to memory is a waste of time.

It certainly isn't a waste of time if your intention is to scrub data
belonging to a previous tenant, before giving access to someone else!

If you have a metadata layer above then you can handle the command there
(that's why we're adding it); if you haven't you do have to write the
zeroes.

Paolo
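
The "as if" rule for the three sector states can be sketched as a small model. This is only an illustration, not code from QEMU or any NBD server: the names `SectorState`, `space_reserved` and `backend_may_substitute` are hypothetical, and the model assumes that the only guarantee at stake is whether backing space is reserved (all three states read back as zeroes). It also assumes a request for a hole places no reservation requirement on the backend.

```python
from enum import Enum


class SectorState(Enum):
    """The three sector states listed in the message (names are hypothetical)."""
    HOLE = "hole"            # thin-provisioned, no backing blocks
    DATA_ZERO = "data"       # allocated, disk contains actual zeroes
    UNWRITTEN = "unwritten"  # blocks reserved, reads as zeroes


def space_reserved(state: SectorState) -> bool:
    """True if backing blocks exist, so a later write cannot hit ENOSPC."""
    return state is not SectorState.HOLE


def backend_may_substitute(requested: SectorState, actual: SectorState) -> bool:
    """Assumed 'as if' check: the backend may pick a different state only
    if it does not drop a space reservation the client asked for."""
    return space_reserved(actual) or not space_reserved(requested)


if __name__ == "__main__":
    # Print the full substitution table for the three states.
    for requested in SectorState:
        for actual in SectorState:
            verdict = "ok" if backend_may_substitute(requested, actual) else "not ok"
            print(f"requested {requested.value:9} -> actual {actual.value:9}: {verdict}")
```

Under these assumptions, turning a zero block into an unwritten extent passes the check, while answering a request for an unwritten extent with a hole does not, because the hole reintroduces the ENOSPC risk on a later write.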