From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from [140.186.70.92] (port=53072 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1P5IfC-0001wj-Fc for qemu-devel@nongnu.org; Mon, 11 Oct 2010 09:43:42 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1P5IeP-0000D6-SI for qemu-devel@nongnu.org; Mon, 11 Oct 2010 09:42:50 -0400 Received: from mtagate3.uk.ibm.com ([194.196.100.163]:52876) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1P5IeP-0000CY-KR for qemu-devel@nongnu.org; Mon, 11 Oct 2010 09:42:49 -0400 Received: from d06nrmr1707.portsmouth.uk.ibm.com (d06nrmr1707.portsmouth.uk.ibm.com [9.149.39.225]) by mtagate3.uk.ibm.com (8.13.1/8.13.1) with ESMTP id o9BDgiHt017979 for ; Mon, 11 Oct 2010 13:42:44 GMT Received: from d06av03.portsmouth.uk.ibm.com (d06av03.portsmouth.uk.ibm.com [9.149.37.213]) by d06nrmr1707.portsmouth.uk.ibm.com (8.13.8/8.13.8/NCO v10.0) with ESMTP id o9BDggbR1425442 for ; Mon, 11 Oct 2010 14:42:44 +0100 Received: from d06av03.portsmouth.uk.ibm.com (loopback [127.0.0.1]) by d06av03.portsmouth.uk.ibm.com (8.12.11.20060308/8.13.3) with ESMTP id o9BDggbJ019627 for ; Mon, 11 Oct 2010 14:42:42 +0100 Date: Mon, 11 Oct 2010 14:42:41 +0100 From: Stefan Hajnoczi Message-ID: <20101011134241.GA5439@stefan-thinkpad.transitives.com> References: <1286552914-27014-1-git-send-email-stefanha@linux.vnet.ibm.com> <1286552914-27014-4-git-send-email-stefanha@linux.vnet.ibm.com> <4CB18549.3020206@redhat.com> <20101011100954.GA4078@stefan-thinkpad.transitives.com> <4CB30B43.2040706@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <4CB30B43.2040706@redhat.com> Subject: [Qemu-devel] Re: [PATCH v2 3/7] docs: Add QED image format specification List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Avi Kivity Cc: Kevin Wolf , Anthony Liguori , qemu-devel@nongnu.org, Christoph Hellwig On Mon, Oct 11, 2010 at 03:04:03PM +0200, Avi Kivity wrote: > On 10/11/2010 12:09 PM, Stefan Hajnoczi wrote: > >On Sun, Oct 10, 2010 at 11:20:09AM +0200, Avi Kivity wrote: > >> On 10/08/2010 05:48 PM, Stefan Hajnoczi wrote: > >> >Signed-off-by: Stefan Hajnoczi > >> >--- > >> > docs/specs/qed_spec.txt | 94 +++++++++++++++++++++++++++++++++++++++++++++++ > >> > 1 files changed, 94 insertions(+), 0 deletions(-) > >> > > >> >+Feature bits: > >> >+* QED_F_BACKING_FILE = 0x01. The image uses a backing file. > >> >+* QED_F_NEED_CHECK = 0x02. The image needs a consistency check before use. > >> > >> Great that QED_F_NEED_CHECK is now non-optional. However, suppose > >> we add a new structure (e.g. persistent freelist); it's now > >> impossible to tell whether the structure is updated or not: > >> > >> 1 new qemu opens image > >> 2 writes persistent freelist > >> 3 clears need_check > >> 4 shuts down > >> 5 old qemu opens image > >> 6 doesn't update persistent freelist > >> 7 clears need_check > >> 8 shuts down > >> > >> The image is now inconsistent, but has need_check cleared. > >> > >> We can address this by having a third feature bitmask that is > >> autocleared by guests that don't recognize various bits; so the > >> sequence becomes: > >> > >> 1 new qemu opens image > >> 2 writes persistent freelist > >> 3 clears need_check > >> 4 sets persistent_freelist > >> 5 shuts down > >> 6 old qemu opens image > >> 7 clears persistent_freelist (and any other bits it doesn't recognize) > >> 8 doesn't update persistent freelist > >> 9 clears need_check > >> 10 shuts down > >> > >> The image is now consistent, since the persistent freelist has disappeared. > > > >It is more complicated than just the feature bit. The freelist requires > >space in the image file. Clearing the persistent_freelist bit leaks the > >freelist. > > > >We can solve this by using a compat feature bit and an autoclear feature > >bit. The autoclear bit specifies whether or not the freelist is valid > >and the compat feature bit specifices whether or not the freelist > >exists. > > > >When the new qemu opens the image again it notices that the autoclear > >bit is unset but the compat bit is set. This means the freelist is > >out-of-date and its space can be reclaimed. > > > >I don't like the idea of doing these feature bit acrobatics because they > >add complexity. On the other hand your proposal ensures backward > >compatibility in the case of an optional data structure that needs to > >stay in sync with the image. I'm just not 100% convinced it's worth it. > > My scenario ends up with data corruption if we move to an old qemu > and then back again, without any aborts. > > A leak is acceptable (it won't grow; it's just an unused, incorrect > freelist), but data corruption is not. The alternative is for the freelist to be a non-compat feature bit. That means older QEMU binaries cannot use a QED image that has enabled the freelist. Stefan