From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 7 May 2009 10:32:37 +0300
From: Gleb Natapov
Subject: Re: [Qemu-devel] Re: [PATCH] qcow2/virtio corruption: Don't allocate the same cluster twice
Message-ID: <20090507073237.GY9795@redhat.com>
References: <1241627950-22195-1-git-send-email-kwolf@redhat.com>
 <4A01C0C6.7020902@redhat.com> <4A01C2D4.5070000@redhat.com>
 <4A01C411.3060505@redhat.com> <4A01CE6C.3000901@redhat.com>
In-Reply-To: <4A01CE6C.3000901@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
List-Id: qemu-devel.nongnu.org
To: Kevin Wolf
Cc: markmc@redhat.com, Avi Kivity, qemu-devel@nongnu.org

On Wed, May 06, 2009 at 07:52:44PM +0200, Kevin Wolf wrote:
> Avi Kivity schrieb:
> > Kevin Wolf wrote:
> >> Avi Kivity schrieb:
> >>> Also, the second request now depends on the first to update its
> >>> metadata. But if the first request fails, it will not update its
> >>> metadata, and the second request will complete without error and
> >>> also without updating its metadata.
> >>>
> >> Hm, right. Need to think about this...
> >>
> >
> > I suggest retaining the part where you use inflight l2metas to lay
> > out data contiguously, but change alloc_cluster_link_l2() not to
> > rely on n_start and nb_available but instead recompute them on
> > completion. m->nb_clusters should never be zeroed for this to work.
>
> Is there even a reason why we need to copy the unmodified sectors in
> alloc_cluster_link_l2() and cannot do that in alloc_cluster_offset()
> before we write the new data? Then the callback wouldn't need to mess
> around with figuring out which part must be overwritten and which one
> mustn't.
>
The reason we need to copy unmodified sectors in alloc_cluster_link_l2()
is exactly to handle concurrent writes into the same cluster. This is
essentially RMW. I don't see why concurrent writes should not work with
the logic in place.
There is a bug there currently, of course :) Can somebody check this patch:

diff --git a/block-qcow2.c b/block-qcow2.c
index 7840634..801d26d 100644
--- a/block-qcow2.c
+++ b/block-qcow2.c
@@ -995,8 +995,8 @@ static int alloc_cluster_link_l2(BlockDriverState *bs, uint64_t cluster_offset,
         if(l2_table[l2_index + i] != 0)
             old_cluster[j++] = l2_table[l2_index + i];

-        l2_table[l2_index + i] = cpu_to_be64((cluster_offset +
-                    (i << s->cluster_bits)) | QCOW_OFLAG_COPIED);
+        l2_table[l2_index + i] = cpu_to_be64(((cluster_offset +
+                    (i << s->cluster_bits)) | QCOW_OFLAG_COPIED));
     }

     if (bdrv_pwrite(s->hd, l2_offset + l2_index * sizeof(uint64_t),
@@ -1005,7 +1005,8 @@ static int alloc_cluster_link_l2(BlockDriverState *bs, uint64_t cluster_offset,
         goto err;

     for (i = 0; i < j; i++)
-        free_any_clusters(bs, be64_to_cpu(old_cluster[i]), 1);
+        free_any_clusters(bs, be64_to_cpu(old_cluster[i]) & ~QCOW_OFLAG_COPIED,
+            1);

     ret = 0;
 err:

--
			Gleb.