From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from [140.186.70.92] (port=57298 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1P5KbR-0006gR-5c
	for qemu-devel@nongnu.org; Mon, 11 Oct 2010 11:47:54 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1P5KbP-0007XI-Nb
	for qemu-devel@nongnu.org; Mon, 11 Oct 2010 11:47:52 -0400
Received: from mx1.redhat.com ([209.132.183.28]:4084)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from ) id 1P5KbP-0007XC-HI
	for qemu-devel@nongnu.org; Mon, 11 Oct 2010 11:47:51 -0400
Message-ID: <4CB33199.407@redhat.com>
Date: Mon, 11 Oct 2010 17:47:37 +0200
From: Avi Kivity
MIME-Version: 1.0
References: <1286552914-27014-1-git-send-email-stefanha@linux.vnet.ibm.com>
	<1286552914-27014-4-git-send-email-stefanha@linux.vnet.ibm.com>
	<4CB18549.3020206@redhat.com>
	<20101011100954.GA4078@stefan-thinkpad.transitives.com>
	<4CB30B43.2040706@redhat.com>
	<20101011134241.GA5439@stefan-thinkpad.transitives.com>
	<4CB314C6.4040001@redhat.com>
	<4CB326EC.6060304@linux.vnet.ibm.com>
	<4CB32C33.4090208@redhat.com>
	<4CB33030.3030909@linux.vnet.ibm.com>
In-Reply-To: <4CB33030.3030909@linux.vnet.ibm.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: [Qemu-devel] Re: [PATCH v2 3/7] docs: Add QED image format specification
List-Id: qemu-devel.nongnu.org
To: Anthony Liguori
Cc: Kevin Wolf, Anthony Liguori, Christoph Hellwig, Stefan Hajnoczi,
	qemu-devel@nongnu.org

On 10/11/2010 05:41 PM, Anthony Liguori wrote:
> On 10/11/2010 10:24 AM, Avi Kivity wrote:
>> On 10/11/2010 05:02 PM, Anthony Liguori wrote:
>>> On 10/11/2010 08:44 AM, Avi Kivity wrote:
>>>> On 10/11/2010 03:42 PM, Stefan Hajnoczi wrote:
>>>>> >
>>>>> > A leak is acceptable (it won't grow; it's just an unused, incorrect
>>>>> > freelist), but data corruption is not.
>>>>>
>>>>> The alternative is for the freelist to be a non-compat feature bit.
>>>>> That means older QEMU binaries cannot use a QED image that has enabled
>>>>> the freelist.
>>>>
>>>> For this one feature. What about others?
>>>
>>> A compat feature is one where the feature can be completely ignored
>>> (meaning that the QEMU does not have to understand the data format).
>>>
>>> An example of a compat feature is copy-on-read. It's merely a
>>> suggestion and there is no additional metadata. If a QEMU doesn't
>>> understand it, it doesn't affect its ability to read the image.
>>>
>>> An example of a non-compat feature would be zero cluster entries.
>>> Zero cluster entries are a special L2 table entry that indicates
>>> that a cluster's on-disk data is all zeros. As long as there is at
>>> least 1 ZCE in the L2 tables, this feature bit must be set. As soon
>>> as all of the ZCE bits are cleared, the feature bit can be unset.
>>>
>>> An older QEMU will gracefully fail when presented with an image
>>> using ZCE bits. An image with no ZCEs will work on older QEMUs.
>>>
>>
>> What's the motivation behind ZCE?
>
> It's very useful for Copy-on-Read. If the cluster in the backing file
> is unallocated, then when you do a copy-on-read, you don't want to
> write out a zero cluster since you'd expand the image to its maximum
> size.
>
> It's also useful for operations like compaction in the absence of
> TRIM. The common implementation on platforms like VMware is to open a
> file and write zeros to it until it fills up the filesystem. You then
> delete the file. The result is that any unallocated data on the disk
> is written as zero and combined with zero-detection in the image
> format, you can compact the image size by marking unallocated blocks
> as ZCE.

Both make sense.
The latter is also useful with TRIM: if you have a backing
image it's better to implement TRIM with ZCE rather than exposing the
cluster from the backing file; it saves you a COW when you later
reallocate the cluster.

>
>> There is yet a third type of feature, one which is not strictly
>> needed in order to use the image, but if used, must be kept
>> synchronized. An example is the freelist. Another example is a
>> directory index for a filesystem. I can't think of another example
>> which would be relevant to QED -- metadata checksums perhaps? -- we
>> can always declare it a non-compatible feature, but of course, it
>> reduces compatibility.
>
> You're suggesting a feature that is not strictly needed, but that
> needs to be kept up to date. If it can't be kept up to date,
> something needs to happen to remove it. Let's call this a transient
> feature.
>
> Most of the transient features can be removed given some bit of code.
> For instance, ZCE can be removed by writing out zero clusters or
> writing an unallocated L2 entry if there is no backing file.
>
> I think we could add a qemu-img demote command or something like that
> that attempted to remove features when possible. That doesn't give
> you instant compatibility but I'm doubtful that you can come up with a
> generic way to remove a feature from an image without knowing anything
> about the image.
>

That should work, and in the worst case there is qemu-img convert (which
should be taught about format options).

-- 
error compiling committee.c: too many arguments to function
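
For illustration only, the feature-bit rules discussed in this thread can be sketched in a few lines of Python. Everything named here is an assumption made for the sketch: the bit value, the L2 entry markers, and the helpers can_open, refresh_zce_bit and demote_zce are hypothetical and are not taken from the QED spec or from qemu-img. The sketch shows three things: a reader refusing an image that carries an unknown non-compat bit, the ZCE bit tracking whether any ZCE exists, and a "demote" step that removes ZCEs so the bit can be cleared.

```python
# All constants below are hypothetical values chosen for this sketch.
UNALLOCATED = 0          # L2 entry: cluster not present in this image
ZERO_CLUSTER = 1         # L2 entry: zero cluster entry (ZCE)
FEATURE_ZCE = 1 << 0     # non-compat feature bit for ZCE
KNOWN_NONCOMPAT = 0      # an "older QEMU" that knows no non-compat features


def can_open(noncompat_bits, known=KNOWN_NONCOMPAT):
    """Refuse the image if any non-compat bit is set that the reader does not know.

    Compat bits are deliberately not consulted: they can be ignored.
    """
    return (noncompat_bits & ~known) == 0


def refresh_zce_bit(noncompat_bits, l2_table):
    """Set the ZCE bit while at least one ZCE exists; clear it otherwise."""
    if ZERO_CLUSTER in l2_table:
        return noncompat_bits | FEATURE_ZCE
    return noncompat_bits & ~FEATURE_ZCE


def demote_zce(l2_table, has_backing_file, write_zero_cluster):
    """Return a copy of the L2 table with every ZCE removed.

    write_zero_cluster is a caller-supplied (hypothetical) callback that
    allocates a cluster full of zeros and returns its offset.
    """
    out = []
    for entry in l2_table:
        if entry != ZERO_CLUSTER:
            out.append(entry)
        elif has_backing_file:
            # A real zero cluster is needed to hide the backing file's data.
            out.append(write_zero_cluster())
        else:
            # Without a backing file, an unallocated cluster reads as zeros.
            out.append(UNALLOCATED)
    return out
```

With a table such as [0x10000, ZERO_CLUSTER, UNALLOCATED], refresh_zce_bit sets the bit and can_open rejects the image for the older reader; after demote_zce and another refresh_zce_bit the bit is clear and the same reader accepts it. A transient feature in the sense used above (the freelist, say) would need a similar per-feature removal routine, which is why a generic removal seems hard without knowledge of the image.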