From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from [140.186.70.92] (port=44916 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1P5KK0-0005Sg-PJ for qemu-devel@nongnu.org; Mon, 11 Oct 2010 11:29:57 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1P5Jpr-000855-Gr for qemu-devel@nongnu.org; Mon, 11 Oct 2010 10:58:44 -0400 Received: from mx1.redhat.com ([209.132.183.28]:51072) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1P5Jpr-00084x-9Q for qemu-devel@nongnu.org; Mon, 11 Oct 2010 10:58:43 -0400 Message-ID: <4CB32615.6030008@redhat.com> Date: Mon, 11 Oct 2010 16:58:29 +0200 From: Avi Kivity MIME-Version: 1.0 Subject: Re: [Qemu-devel] Re: [PATCH v2 3/7] docs: Add QED image format specification References: <1286552914-27014-1-git-send-email-stefanha@linux.vnet.ibm.com> <1286552914-27014-4-git-send-email-stefanha@linux.vnet.ibm.com> <4CB18549.3020206@redhat.com> <20101011100954.GA4078@stefan-thinkpad.transitives.com> <4CB30B43.2040706@redhat.com> <4CB32530.2070504@codemonkey.ws> In-Reply-To: <4CB32530.2070504@codemonkey.ws> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Anthony Liguori Cc: Kevin Wolf , Anthony Liguori , Christoph Hellwig , Stefan Hajnoczi , qemu-devel@nongnu.org On 10/11/2010 04:54 PM, Anthony Liguori wrote: > On 10/11/2010 08:04 AM, Avi Kivity wrote: >> On 10/11/2010 12:09 PM, Stefan Hajnoczi wrote: >>> On Sun, Oct 10, 2010 at 11:20:09AM +0200, Avi Kivity wrote: >>> > On 10/08/2010 05:48 PM, Stefan Hajnoczi wrote: >>> > >Signed-off-by: Stefan Hajnoczi >>> > >--- >>> > > docs/specs/qed_spec.txt | 94 >>> +++++++++++++++++++++++++++++++++++++++++++++++ >>> > > 1 files changed, 94 insertions(+), 0 deletions(-) >>> > > >>> > >+Feature bits: >>> > >+* QED_F_BACKING_FILE = 0x01. The image uses a backing file. >>> > >+* QED_F_NEED_CHECK = 0x02. The image needs a consistency check >>> before use. >>> > >>> > Great that QED_F_NEED_CHECK is now non-optional. However, suppose >>> > we add a new structure (e.g. persistent freelist); it's now >>> > impossible to tell whether the structure is updated or not: >>> > >>> > 1 new qemu opens image >>> > 2 writes persistent freelist >>> > 3 clears need_check >>> > 4 shuts down >>> > 5 old qemu opens image >>> > 6 doesn't update persistent freelist >>> > 7 clears need_check >>> > 8 shuts down >>> > >>> > The image is now inconsistent, but has need_check cleared. >>> > >>> > We can address this by having a third feature bitmask that is >>> > autocleared by guests that don't recognize various bits; so the >>> > sequence becomes: >>> > >>> > 1 new qemu opens image >>> > 2 writes persistent freelist >>> > 3 clears need_check >>> > 4 sets persistent_freelist >>> > 5 shuts down >>> > 6 old qemu opens image >>> > 7 clears persistent_freelist (and any other bits it doesn't >>> recognize) >>> > 8 doesn't update persistent freelist >>> > 9 clears need_check >>> > 10 shuts down >>> > >>> > The image is now consistent, since the persistent freelist has >>> disappeared. >>> >>> It is more complicated than just the feature bit. The freelist >>> requires >>> space in the image file. Clearing the persistent_freelist bit leaks >>> the >>> freelist. >>> >>> We can solve this by using a compat feature bit and an autoclear >>> feature >>> bit. The autoclear bit specifies whether or not the freelist is valid >>> and the compat feature bit specifices whether or not the freelist >>> exists. >>> >>> When the new qemu opens the image again it notices that the autoclear >>> bit is unset but the compat bit is set. This means the freelist is >>> out-of-date and its space can be reclaimed. >>> >>> I don't like the idea of doing these feature bit acrobatics because >>> they >>> add complexity. On the other hand your proposal ensures backward >>> compatibility in the case of an optional data structure that needs to >>> stay in sync with the image. I'm just not 100% convinced it's worth >>> it. >> >> My scenario ends up with data corruption if we move to an old qemu >> and then back again, without any aborts. >> >> A leak is acceptable (it won't grow; it's just an unused, incorrect >> freelist), but data corruption is not. > > A leak is unacceptable. It means an image can grow to an unbounded > size. If you are a server provider offering multitenancy, then a > malicious guest can potentially grow the image beyond it's allotted > size causing a Denial of Service attack against another tenant. This particular leak cannot grow, and is not controlled by the guest. > A freelist has to be a non-optional feature. When the freelist bit is > set, an older QEMU cannot read the image. If the freelist is > completed used, the freelist bit can be cleared and the image is then > usable by older QEMUs. Once we support TRIM (or detect zeros) we'll never have a clean freelist. -- error compiling committee.c: too many arguments to function