From: Anthony Liguori
To: Kevin Wolf
Cc: stefanha@gmail.com, avi@redhat.com, mjt@tls.msk.ru, qemu-devel@nongnu.org, hch@lst.de
Date: Tue, 24 Aug 2010 08:29:10 -0500
Subject: [Qemu-devel] Re: [RFC][STABLE 0.13] Revert "qcow2: Use bdrv_(p)write_sync for metadata writes"
Message-ID: <4C73C926.3010901@codemonkey.ws>
In-Reply-To: <4C73C622.7080808@redhat.com>

On 08/24/2010 08:16 AM, Kevin Wolf wrote:
> On 24.08.2010 15:01, Anthony Liguori wrote:
>
>> On 08/24/2010 05:40 AM, Kevin Wolf wrote:
>>
>>> This reverts commit 8b3b720620a1137a1b794fc3ed64734236f94e06.
>>>
>>> This fix has caused severe slowdowns on recent kernels that actually do flush
>>> when they are told to. Reverting this patch hurts correctness and means that we
>>> could get corrupted images in case of a host crash. This means that qcow2 might
>>> not be an option for some people without this fix.
>>> On the other hand, I get reports that the slowdown is so massive that not
>>> reverting it would mean that people can't use it either because it just
>>> takes ages to complete stuff. It probably can be fixed, but not in time
>>> for 0.13.0.
>>>
>>> Usually, if there's a possible tradeoff between correctness and performance, I
>>> tend to choose correctness, but I'm not so sure in this case. I'm not sure
>>> about reverting either, which is why I post this as an RFC only.
>>>
>>> I hope to get some more comments on how to proceed here for 0.13.
>>>
>>>
>> How fundamental of an issue is this? Is this something we think we know
>> how to fix and we just don't think there's time to fix it for 0.13?
>>
> I think we can improve things basically by trying to batch metadata
> writes and do them in parallel while already processing the next requests.
>
> I'm not sure what the numbers are going to look like with something like
> this in place, I need to try it. It's definitely not something that I
> want to go into 0.13 at this point.
>

I'm not sure this patch is needed in the first place. If you have a
sequence of operations like:

0) receive guest write request Z
1) submit write A
2) write A completes
3) submit write B
4) write B completes
5) report guest write Z complete

You're adding a:

4.5) sync write B

which is ultimately unnecessary if what you care about is avoiding
reordering of steps (2) and (4). When a write() request completes, you're
guaranteed that a subsequent read() request will return the written data.
That's always true. If I could do a write(A) followed by a write(B) and
then read()=A, no software would actually function correctly.

It's important to make sure that you don't get image corruption if (2)
happens but not (4). But I think that's okay in qcow2 today.

Regards,

Anthony Liguori

>> And with this fix in place, how much confidence do we have in qcow2 with
>> respect to data integrity on power loss?
>>
> How do you measure confidence? :-)
>
> There are power failure tests and I don't have any bugs open in that
> respect. I'm not sure how intensively it's tested, though.
>
>
>> We've shipped every version of QEMU since qcow2 was introduced with
>> known data corruptions. It sucks big time. I think it's either that
>> building an image format is a really hard problem akin to making a file
>> system and we shouldn't be in that business or that qcow2 is bad as an
>> image format which makes this all harder than it should be.
>>
> I tend to say that it's just hard to get right. Most of the problems
> that were fixed in qcow2 over the last year are probably present in our
> VMDK implementation as well, just to pick one example.
>
> Kevin
>