From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from mailman by lists.gnu.org with tmda-scanned (Exim 4.43)
	id 1M1cb5-0001w4-RX
	for qemu-devel@nongnu.org; Wed, 06 May 2009 04:35:23 -0400
Received: from exim by lists.gnu.org with spam-scanned (Exim 4.43)
	id 1M1cb1-0001sE-Mu
	for qemu-devel@nongnu.org; Wed, 06 May 2009 04:35:23 -0400
Received: from [199.232.76.173] (port=44895 helo=monty-python.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1M1cb1-0001rz-Ir
	for qemu-devel@nongnu.org; Wed, 06 May 2009 04:35:19 -0400
Received: from mx2.redhat.com ([66.187.237.31]:34951)
	by monty-python.gnu.org with esmtp (Exim 4.60)
	(envelope-from <avi@redhat.com>) id 1M1cb1-0008T7-4J
	for qemu-devel@nongnu.org; Wed, 06 May 2009 04:35:19 -0400
Received: from int-mx2.corp.redhat.com (int-mx2.corp.redhat.com [172.16.27.26])
	by mx2.redhat.com (8.13.8/8.13.8) with ESMTP id n468ZI5f006980
	for <qemu-devel@nongnu.org>; Wed, 6 May 2009 04:35:18 -0400
Message-ID: <4A014BA3.2080009@redhat.com>
Date: Wed, 06 May 2009 11:34:43 +0300
From: Avi Kivity <avi@redhat.com>
MIME-Version: 1.0
Subject: Re: [Qemu-devel] Strange virtio regression on mainline and stable-0.10
References: <4A000C74.5020907@redhat.com> <4A0066D9.6030008@redhat.com>
	<4A0146E3.2090909@redhat.com>
In-Reply-To: <4A0146E3.2090909@redhat.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/pipermail/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <http://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel <qemu-devel@nongnu.org>

Kevin Wolf wrote:
> Avi Kivity wrote:
>   
>> Avi Kivity wrote:
>>     
>>> Running the Fedora 10 installer on a virtio disk on current master and 
>>> on v0.10.3 will cause the installer to complain when mounting the 
>>> freshly formatted filesystems.
>>>       
>> The problem is that qcow2 does a read-modify-write on 
>> non-cluster-aligned writes.  So the following sequence triggers the bug:
>> [...]
>>
>> This could be solved by maintaining a hash table of refcounted RMW 
>> copies for the disk.  When reading for a RMW, look up the hash table, if 
>> there's a copy there, use it instead of reading it yourself.
>>
>> We should also avoid the RMW for non-compressed, non-encrypted clusters, 
>> as virtually ALL writes will be misaligned.
>>     
>
> I don't think there is a RMW except for the COW case which is
> unavoidable and obviously happens only once for each cluster. Do you see
> any other places where this happens?
>   

No, I misread the code.  I think the real problem is two parallel writes 
to the same cluster (but different sectors) that are started 
concurrently, so get_cluster_offset() places them in different clusters.  
When the second write completes we get unexpected results, since the 
metadata now contains a block where it was unallocated at the start of 
the operation.
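
To make that sequence concrete, here is a toy Python model.  It is a
sketch only, not the qcow2 code: ToyImage, write_data(), complete(),
read() and the 8-sector cluster size are made up for illustration, and
only the name get_cluster_offset() comes from the real source.  It
assumes the new mapping is published in the metadata when the write
completes, not when the cluster is picked.

```python
SECTORS_PER_CLUSTER = 8  # assumption: arbitrary size for the toy model


class ToyImage:
    """Hypothetical stand-in for an image with a cluster mapping table."""

    def __init__(self):
        self.l2 = {}        # guest cluster index -> host cluster index
        self.host = {}      # host cluster index -> list of sector payloads
        self.next_free = 0  # next unallocated host cluster

    def get_cluster_offset(self, guest_cluster):
        # Return the existing mapping, or allocate a fresh host cluster.
        # The fresh cluster is NOT recorded in self.l2 yet.
        if guest_cluster in self.l2:
            return self.l2[guest_cluster]
        host_cluster = self.next_free
        self.next_free += 1
        self.host[host_cluster] = [None] * SECTORS_PER_CLUSTER
        return host_cluster

    def write_data(self, host_cluster, sector, payload):
        self.host[host_cluster][sector] = payload

    def complete(self, guest_cluster, host_cluster):
        # Publish the mapping once the data write has finished.
        self.l2[guest_cluster] = host_cluster

    def read(self, guest_cluster, sector):
        host_cluster = self.l2.get(guest_cluster)
        if host_cluster is None:
            return None
        return self.host[host_cluster][sector]


def racy_pair():
    """Two writes to guest cluster 0, both started before either completes."""
    img = ToyImage()
    a = img.get_cluster_offset(0)  # write A: cluster 0 looks unallocated
    b = img.get_cluster_offset(0)  # write B: still looks unallocated
    img.write_data(a, 0, "A")      # sector 0 goes to host cluster a
    img.write_data(b, 1, "B")      # sector 1 goes to host cluster b
    img.complete(0, a)
    img.complete(0, b)             # overwrites the mapping made for A
    return img, a, b
```

In this model the two writes end up in different host clusters, and
after the second completion the guest can read sector 1 but no longer
sees the data written to sector 0.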

We could probably get away with serializing only writes that hit the 
same cluster.  A better approach may be to try to place parallel 
sequential writes contiguously in the allocating case.
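
As a rough illustration of the first option, the sketch below queues a
write behind any write already in flight for the same guest cluster,
while writes to other clusters proceed independently.  Everything in it
(SerializedImage, submit(), complete(), the 8-sector cluster size) is
hypothetical and only models the idea; it is not how qcow2 is
structured, and it again assumes the mapping is published on
completion.

```python
from collections import defaultdict, deque


class SerializedImage:
    """Toy model: at most one in-flight write per guest cluster."""

    def __init__(self, sectors_per_cluster=8):
        self.sectors_per_cluster = sectors_per_cluster
        self.l2 = {}          # guest cluster -> host cluster (published)
        self.host = {}        # host cluster -> list of sector payloads
        self.next_free = 0    # next unallocated host cluster
        self.in_flight = {}   # guest cluster -> host cluster being written
        self.waiting = defaultdict(deque)  # guest cluster -> queued writes

    def _start(self, guest_cluster, sector, payload):
        # Look up or allocate, then issue the data write.
        host_cluster = self.l2.get(guest_cluster)
        if host_cluster is None:
            host_cluster = self.next_free
            self.next_free += 1
            self.host[host_cluster] = [None] * self.sectors_per_cluster
        self.host[host_cluster][sector] = payload
        self.in_flight[guest_cluster] = host_cluster

    def submit(self, guest_cluster, sector, payload):
        # Only writes that hit a busy cluster are held back.
        if guest_cluster in self.in_flight:
            self.waiting[guest_cluster].append((sector, payload))
        else:
            self._start(guest_cluster, sector, payload)

    def complete(self, guest_cluster):
        # Publish the mapping, then start the next queued write, which
        # now sees the cluster as allocated and reuses it.
        host_cluster = self.in_flight.pop(guest_cluster)
        self.l2[guest_cluster] = host_cluster
        if self.waiting[guest_cluster]:
            sector, payload = self.waiting[guest_cluster].popleft()
            self._start(guest_cluster, sector, payload)

    def read(self, guest_cluster, sector):
        host_cluster = self.l2.get(guest_cluster)
        if host_cluster is None:
            return None
        return self.host[host_cluster][sector]
```

With two writes submitted to the same cluster, the second one is not
started until the first has completed, so it finds the cluster already
allocated and both sectors land in the same host cluster.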

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.