From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from [140.186.70.92] (port=46333 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1P5KKp-0005dg-Ol for qemu-devel@nongnu.org; Mon, 11 Oct 2010 11:30:47 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1P5Jm0-00071x-UR for qemu-devel@nongnu.org; Mon, 11 Oct 2010 10:54:46 -0400 Received: from mail-gw0-f45.google.com ([74.125.83.45]:41486) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1P5Jm0-00071b-Rr for qemu-devel@nongnu.org; Mon, 11 Oct 2010 10:54:44 -0400 Received: by gwj19 with SMTP id 19so782745gwj.4 for ; Mon, 11 Oct 2010 07:54:43 -0700 (PDT) Message-ID: <4CB32530.2070504@codemonkey.ws> Date: Mon, 11 Oct 2010 09:54:40 -0500 From: Anthony Liguori MIME-Version: 1.0 Subject: Re: [Qemu-devel] Re: [PATCH v2 3/7] docs: Add QED image format specification References: <1286552914-27014-1-git-send-email-stefanha@linux.vnet.ibm.com> <1286552914-27014-4-git-send-email-stefanha@linux.vnet.ibm.com> <4CB18549.3020206@redhat.com> <20101011100954.GA4078@stefan-thinkpad.transitives.com> <4CB30B43.2040706@redhat.com> In-Reply-To: <4CB30B43.2040706@redhat.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit List-Id: qemu-devel.nongnu.org List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Avi Kivity Cc: Kevin Wolf , Anthony Liguori , Christoph Hellwig , Stefan Hajnoczi , qemu-devel@nongnu.org On 10/11/2010 08:04 AM, Avi Kivity wrote: > On 10/11/2010 12:09 PM, Stefan Hajnoczi wrote: >> On Sun, Oct 10, 2010 at 11:20:09AM +0200, Avi Kivity wrote: >> > On 10/08/2010 05:48 PM, Stefan Hajnoczi wrote: >> > >Signed-off-by: Stefan Hajnoczi >> > >--- >> > > docs/specs/qed_spec.txt | 94 >> +++++++++++++++++++++++++++++++++++++++++++++++ >> > > 1 files changed, 94 insertions(+), 0 deletions(-) >> > > >> > >+Feature bits: >> > >+* QED_F_BACKING_FILE = 0x01. The image uses a backing file. >> > >+* QED_F_NEED_CHECK = 0x02. The image needs a consistency check >> before use. >> > >> > Great that QED_F_NEED_CHECK is now non-optional. However, suppose >> > we add a new structure (e.g. persistent freelist); it's now >> > impossible to tell whether the structure is updated or not: >> > >> > 1 new qemu opens image >> > 2 writes persistent freelist >> > 3 clears need_check >> > 4 shuts down >> > 5 old qemu opens image >> > 6 doesn't update persistent freelist >> > 7 clears need_check >> > 8 shuts down >> > >> > The image is now inconsistent, but has need_check cleared. >> > >> > We can address this by having a third feature bitmask that is >> > autocleared by guests that don't recognize various bits; so the >> > sequence becomes: >> > >> > 1 new qemu opens image >> > 2 writes persistent freelist >> > 3 clears need_check >> > 4 sets persistent_freelist >> > 5 shuts down >> > 6 old qemu opens image >> > 7 clears persistent_freelist (and any other bits it doesn't >> recognize) >> > 8 doesn't update persistent freelist >> > 9 clears need_check >> > 10 shuts down >> > >> > The image is now consistent, since the persistent freelist has >> disappeared. >> >> It is more complicated than just the feature bit. The freelist requires >> space in the image file. Clearing the persistent_freelist bit leaks the >> freelist. >> >> We can solve this by using a compat feature bit and an autoclear feature >> bit. The autoclear bit specifies whether or not the freelist is valid >> and the compat feature bit specifices whether or not the freelist >> exists. >> >> When the new qemu opens the image again it notices that the autoclear >> bit is unset but the compat bit is set. This means the freelist is >> out-of-date and its space can be reclaimed. >> >> I don't like the idea of doing these feature bit acrobatics because they >> add complexity. On the other hand your proposal ensures backward >> compatibility in the case of an optional data structure that needs to >> stay in sync with the image. I'm just not 100% convinced it's worth it. > > My scenario ends up with data corruption if we move to an old qemu and > then back again, without any aborts. > > A leak is acceptable (it won't grow; it's just an unused, incorrect > freelist), but data corruption is not. A leak is unacceptable. It means an image can grow to an unbounded size. If you are a server provider offering multitenancy, then a malicious guest can potentially grow the image beyond it's allotted size causing a Denial of Service attack against another tenant. A freelist has to be a non-optional feature. When the freelist bit is set, an older QEMU cannot read the image. If the freelist is completed used, the freelist bit can be cleared and the image is then usable by older QEMUs. Regards, Anthony Liguori