From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from [140.186.70.92] (port=50427 helo=eggs.gnu.org) by lists.gnu.org
	with esmtp (Exim 4.43) id 1Ontk5-00077e-Uv for qemu-devel@nongnu.org;
	Tue, 24 Aug 2010 09:40:50 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
	(envelope-from ) id 1Ontk1-0002Zl-PB for qemu-devel@nongnu.org;
	Tue, 24 Aug 2010 09:40:45 -0400
Received: from mail-iw0-f173.google.com ([209.85.214.173]:52088) by eggs.gnu.org
	with esmtp (Exim 4.69) (envelope-from ) id 1Ontk1-0002Zf-Ld
	for qemu-devel@nongnu.org; Tue, 24 Aug 2010 09:40:41 -0400
Received: by iwn38 with SMTP id 38so2919862iwn.4 for ;
	Tue, 24 Aug 2010 06:40:41 -0700 (PDT)
Message-ID: <4C73CBD6.7000900@codemonkey.ws>
Date: Tue, 24 Aug 2010 08:40:38 -0500
From: Anthony Liguori
MIME-Version: 1.0
References: <1282646430-5777-1-git-send-email-kwolf@redhat.com>
	<4C73C2BF.8050300@codemonkey.ws> <4C73C622.7080808@redhat.com>
	<4C73C926.3010901@codemonkey.ws> <4C73C9CF.7090800@redhat.com>
	<4C73CAA9.2060104@codemonkey.ws> <4C73CB85.9010306@redhat.com>
In-Reply-To: <4C73CB85.9010306@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: [Qemu-devel] Re: [RFC][STABLE 0.13] Revert "qcow2: Use
	bdrv_(p)write_sync for metadata writes"
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: ,
List-Archive: 
List-Post: 
List-Help: 
List-Subscribe: ,
To: Avi Kivity
Cc: Kevin Wolf , stefanha@gmail.com, mjt@tls.msk.ru, qemu-devel@nongnu.org,
	hch@lst.de

On 08/24/2010 08:39 AM, Avi Kivity wrote:
> On 08/24/2010 04:35 PM, Anthony Liguori wrote:
>>> It's about metadata writes. If an operation changes metadata, we
>>> must sync it to disk before writing any data or other metadata which
>>> depends on it, regardless of any promises to the guest.
>>
>> Why? If the metadata isn't synced, we lose the write.
>>
>> But that can happen anyway because we're not syncing the data.
>>
>> We need to sync the metadata in the event of a guest-initiated flush,
>> but we shouldn't need to for a normal write.
>
> 1. Allocate a cluster (increase refcount table)
>
> 2. Link cluster to L2 table
>
> 3. Second operation makes it to disk; first still in pagecache
>
> 4. Crash
>
> 5. Dangling pointer from L2 to freed cluster

Yes, we're having this discussion on IRC right now.

The problem is that we maintain a refcount table. If we didn't do
internal disk snapshots, we wouldn't have this problem.

IOW, VMDK doesn't have this problem, so the answer to my very first
question is that qcow2 is too difficult a format to get right.

Regards,

Anthony Liguori