Message-ID: <4A014BA3.2080009@redhat.com>
Date: Wed, 06 May 2009 11:34:43 +0300
From: Avi Kivity
Subject: Re: [Qemu-devel] Strange virtio regression on mainline and stable-0.10
References: <4A000C74.5020907@redhat.com> <4A0066D9.6030008@redhat.com> <4A0146E3.2090909@redhat.com>
In-Reply-To: <4A0146E3.2090909@redhat.com>
To: Kevin Wolf
Cc: qemu-devel

Kevin Wolf wrote:
> Avi Kivity wrote:
>> Avi Kivity wrote:
>>> Running the Fedora 10 installer on a virtio disk on current master and
>>> on v0.10.3 will cause the installer to complain when mounting the
>>> freshly formatted filesystems.
>>
>> The problem is that qcow2 does a read-modify-write on
>> non-cluster-aligned writes. So the following sequence triggers the bug:
>> [...]
>>
>> This could be solved by maintaining a hash table of refcounted RMW
>> copies for the disk.
>> When reading for a RMW, look up the hash table, if
>> there's a copy there, use it instead of reading it yourself.
>>
>> We should also avoid the RMW for non-compressed, non-encrypted clusters,
>> as virtually ALL writes will be misaligned.
>
> I don't think there is a RMW except for the COW case which is
> unavoidable and obviously happens only once for each cluster. Do you see
> any other places where this happens?

No, I misread the code.

I think the real problem is two writes to the same cluster (but
different sectors) that are started concurrently, so
get_cluster_offset() places them in different clusters. When the second
write completes we get unexpected results, since the metadata now
contains a block that was unallocated at the start of the operation.

We could probably get away with serializing only writes that hit the
same cluster. A better approach may be to try to place parallel
sequential writes contiguously in the allocating case.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
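
To make the "serialize only writes that hit the same cluster" idea concrete, here is a minimal sketch in C. It is not code from the qemu tree: InFlightWrites, begin_write(), end_write() and MAX_IN_FLIGHT are hypothetical names, and 512-byte sectors plus a small fixed-size table are assumptions made to keep the example self-contained. A write whose clusters are already being written is refused, and the caller would be expected to queue it and retry after the earlier write finishes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_BITS 9        /* assumption: 512-byte sectors */
#define MAX_IN_FLIGHT 16     /* hypothetical table size for this sketch */

/* Hypothetical table of clusters that currently have a write in flight. */
typedef struct {
    uint64_t cluster[MAX_IN_FLIGHT];
    int count;
} InFlightWrites;

/* Index of the cluster that contains the given sector. */
static uint64_t cluster_of_sector(uint64_t sector, unsigned cluster_bits)
{
    assert(cluster_bits >= SECTOR_BITS && cluster_bits < 64);
    return sector >> (cluster_bits - SECTOR_BITS);
}

/* True if [sector, sector + nb_sectors) touches a cluster already in flight. */
static bool must_wait(const InFlightWrites *w, uint64_t sector,
                      uint64_t nb_sectors, unsigned cluster_bits)
{
    assert(nb_sectors > 0);
    uint64_t first = cluster_of_sector(sector, cluster_bits);
    uint64_t last = cluster_of_sector(sector + nb_sectors - 1, cluster_bits);

    for (int i = 0; i < w->count; i++) {
        if (w->cluster[i] >= first && w->cluster[i] <= last) {
            return true;
        }
    }
    return false;
}

/*
 * Try to start a write. Returns false if the request overlaps an in-flight
 * cluster (or the table is full); the caller should then queue the request.
 */
static bool begin_write(InFlightWrites *w, uint64_t sector,
                        uint64_t nb_sectors, unsigned cluster_bits)
{
    if (must_wait(w, sector, nb_sectors, cluster_bits)) {
        return false;
    }
    uint64_t first = cluster_of_sector(sector, cluster_bits);
    uint64_t last = cluster_of_sector(sector + nb_sectors - 1, cluster_bits);

    if (last - first + 1 > (uint64_t)(MAX_IN_FLIGHT - w->count)) {
        return false;
    }
    for (uint64_t c = first; c <= last; c++) {
        w->cluster[w->count++] = c;
    }
    return true;
}

/* Mark a write as finished, releasing every cluster it covered. */
static void end_write(InFlightWrites *w, uint64_t sector,
                      uint64_t nb_sectors, unsigned cluster_bits)
{
    uint64_t first = cluster_of_sector(sector, cluster_bits);
    uint64_t last = cluster_of_sector(sector + nb_sectors - 1, cluster_bits);
    int kept = 0;

    for (int i = 0; i < w->count; i++) {
        if (w->cluster[i] < first || w->cluster[i] > last) {
            w->cluster[kept++] = w->cluster[i];
        }
    }
    w->count = kept;
}
```

With cluster_bits = 16 (64 KiB clusters, chosen only for illustration), a write to sectors 0..7 and a write to sectors 64..71 fall in the same cluster, so the second is held back until the first calls end_write(); a write starting at sector 128 lands in the next cluster and proceeds immediately. The sketch only covers the bookkeeping; it does no locking and does not address the alternative of placing parallel sequential writes contiguously.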