From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from [140.186.70.92] (port=41506 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1P5gwl-0003RU-Ot
	for qemu-devel@nongnu.org; Tue, 12 Oct 2010 11:39:24 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1P5gwk-0007cW-NW
	for qemu-devel@nongnu.org; Tue, 12 Oct 2010 11:39:23 -0400
Received: from mx1.redhat.com ([209.132.183.28]:32825)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <kwolf@redhat.com>) id 1P5gwk-0007cN-FS
	for qemu-devel@nongnu.org; Tue, 12 Oct 2010 11:39:22 -0400
Message-ID: <4CB48144.9030607@redhat.com>
Date: Tue, 12 Oct 2010 17:39:48 +0200
From: Kevin Wolf <kwolf@redhat.com>
MIME-Version: 1.0
References: <1286552914-27014-1-git-send-email-stefanha@linux.vnet.ibm.com>
	<1286552914-27014-7-git-send-email-stefanha@linux.vnet.ibm.com>
	<4CB479D2.7030901@redhat.com> <4CB47D38.3060602@linux.vnet.ibm.com>
In-Reply-To: <4CB47D38.3060602@linux.vnet.ibm.com>
Content-Type: text/plain; charset=ISO-8859-15
Content-Transfer-Encoding: 7bit
Subject: [Qemu-devel] Re: [PATCH v2 6/7] qed: Read/write support
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Anthony Liguori <aliguori@linux.vnet.ibm.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>, Avi Kivity <avi@redhat.com>, Christoph Hellwig <hch@lst.de>, Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>, qemu-devel@nongnu.org

On 12.10.2010 17:22, Anthony Liguori wrote:
> On 10/12/2010 10:08 AM, Kevin Wolf wrote:
>>   Otherwise we might destroy data that isn't
>> even touched by the guest request in case of a crash.
>>    
> 
> The failure scenarios are either that the cluster is leaked, in which
> case the old version of the data is still present, or that the cluster is
> orphaned because the L2 entry is written, in which case the old version
> of the data is present.

Hm, how does the latter case work? Or rather, what do you mean by "orphaned"?

> Are you referring to a scenario where the cluster is partially written 
> because the data is present in the write cache and the write cache isn't 
> flushed on power failure?

The case I'm referring to is copy-on-write (COW). Assume a partial write
to an unallocated cluster: we then need to do a COW in pre/postfill. Then
we do the normal write and link the new cluster into the L2 table.
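For a partial write, the prefill is the part of the new cluster before the guest's data and the postfill is the part after it; both have to be copied from the backing file. A minimal Python sketch of that range arithmetic, assuming the write stays within one cluster and using a 64 KiB cluster size only for the example; `cow_ranges` is a hypothetical helper, not a function from the QED patch:

```python
def cow_ranges(offset, length, cluster_size):
    """Hypothetical helper: for a guest write [offset, offset + length) that
    falls inside one newly allocated cluster, return the prefill and postfill
    byte ranges as (start, end) pairs in guest offsets. An empty range has
    start == end."""
    cluster_start = offset - offset % cluster_size
    cluster_end = cluster_start + cluster_size
    end = offset + length
    if length <= 0 or end > cluster_end:
        raise ValueError("write must be non-empty and stay within one cluster")
    prefill = (cluster_start, offset)  # copied from the backing file before the guest data
    postfill = (end, cluster_end)      # copied from the backing file after the guest data
    return prefill, postfill


# Example: 4 KiB write at guest offset 70 KiB, assuming 64 KiB clusters.
prefill, postfill = cow_ranges(70 * 1024, 4 * 1024, 64 * 1024)
print(prefill, postfill)  # (65536, 71680) (75776, 131072)
```

A write that starts and ends on cluster boundaries yields two empty ranges, so no COW is needed.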

Assume that the write to the L2 table is already on the disk, but the
pre/postfill data isn't yet. At this point we are in a bad state: if we
crash now, we have lost the data that should have been copied from
the backing file.
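The ordering problem can be modelled with a toy disk whose volatile write cache may persist any subset of the writes issued since the last flush. This is a hedged sketch, not QED code: the write names `cow_data` (guest data plus pre/postfill for the new cluster) and `l2_update`, and the `FLUSH` marker, are made up for illustration. It shows that without a flush between the two writes there is a crash state in which the L2 entry is durable but the copied data is not, and it also covers Anthony's leaked-cluster case (data durable, L2 entry not):

```python
from itertools import combinations


def crash_states(ops):
    """Toy write-cache model (an assumption, not QED code).

    ops is a sequence of write names and the marker "FLUSH". Writes issued
    before a completed flush are durable; on a crash, any subset of the
    writes issued since the last flush may have reached the disk. Returns
    every set of writes that could be on disk after a crash at any point.
    """
    epochs = [[]]
    for op in ops:
        if op == "FLUSH":
            epochs.append([])
        else:
            epochs[-1].append(op)

    states = set()
    durable = frozenset()
    for epoch in epochs:
        for r in range(len(epoch) + 1):
            for subset in combinations(epoch, r):
                states.add(durable | frozenset(subset))
        durable = durable | frozenset(epoch)
    return states


def is_corrupt(state):
    # The case above: the L2 entry points at the new cluster, but the
    # pre/postfill data copied from the backing file is not on disk.
    return "l2_update" in state and "cow_data" not in state


def is_leaked(state):
    # Anthony's leaked case: the cluster was written, but no L2 entry
    # references it, so the guest still sees the old data.
    return "cow_data" in state and "l2_update" not in state


unsafe = ["cow_data", "l2_update"]         # no flush between the two writes
safe = ["cow_data", "FLUSH", "l2_update"]  # data made durable before the L2 update

print(any(is_corrupt(s) for s in crash_states(unsafe)))  # True
print(any(is_corrupt(s) for s in crash_states(safe)))    # False
print(any(is_leaked(s) for s in crash_states(safe)))     # True
```

Putting a flush between the two writes is one way to rule out the bad state in this model; it is not necessarily the mechanism QED ends up using. A leak remains possible, but it only wastes space.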

If we can't guarantee that a newly allocated cluster is all zeros, the
same problem occurs even without a backing file. So as soon as we start
reusing freed clusters, we get this case for all QED images.
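Without a backing file, the problem appears when a freed cluster is reused: unlike a cluster appended at the end of the file, which reads as zeros, it still holds old data, so the pre/postfill must be written as explicit zeros, and those writes then need the same ordering as the backing-file copy. A toy Python model under simplifying assumptions (tiny 8-byte clusters, an in-memory free list, no crash simulation); `ToyImage` and its methods are hypothetical:

```python
CLUSTER = 8  # tiny cluster size in bytes, for illustration only


class ToyImage:
    """Hypothetical in-memory model of an image without a backing file.

    Unallocated guest clusters read as zeros. New physical clusters come
    from a free list if one is available (old contents still present) or
    are appended to the "file", where they read as zeros.
    """

    def __init__(self):
        self.clusters = []   # physical clusters, each a bytearray
        self.free_list = []  # indices of freed physical clusters
        self.l2 = {}         # guest cluster index -> physical cluster index

    def alloc(self):
        if self.free_list:
            return self.free_list.pop()  # reused: stale data is not cleared
        self.clusters.append(bytearray(CLUSTER))  # appended: all zeros
        return len(self.clusters) - 1

    def write(self, guest_off, data, zero_fill):
        idx, start = divmod(guest_off, CLUSTER)
        if start + len(data) > CLUSTER:
            raise ValueError("write must stay within one cluster")
        if idx not in self.l2:
            phys = self.alloc()
            if zero_fill:
                # Explicit zero prefill/postfill for the new cluster.
                self.clusters[phys][:] = bytes(CLUSTER)
            self.l2[idx] = phys
        self.clusters[self.l2[idx]][start:start + len(data)] = data

    def read(self, guest_off, length):
        idx, start = divmod(guest_off, CLUSTER)
        if idx not in self.l2:
            return bytes(length)  # unallocated and no backing file: zeros
        return bytes(self.clusters[self.l2[idx]][start:start + length])

    def discard(self, idx):
        # Drop the guest mapping and put the physical cluster on the free list.
        self.free_list.append(self.l2.pop(idx))


for zero_fill in (False, True):
    img = ToyImage()
    img.write(0, b"SECRET!!", zero_fill)      # fill guest cluster 0
    img.discard(0)                             # its physical cluster is freed
    img.write(CLUSTER + 2, b"ab", zero_fill)  # partial write reuses it
    print(zero_fill, img.read(CLUSTER, CLUSTER))
# False b'SEabET!!'
# True b'\x00\x00ab\x00\x00\x00\x00'
```

Without zero-filling, the guest sees bytes from the discarded cluster around its own write; with zero-filling, it sees what an unallocated cluster would return.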

Kevin