From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from [140.186.70.92] (port=47789 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1P61CJ-0005mb-BR
	for qemu-devel@nongnu.org; Wed, 13 Oct 2010 09:16:53 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from ) id 1P612l-0004w6-DP
	for qemu-devel@nongnu.org; Wed, 13 Oct 2010 09:06:56 -0400
Received: from mx1.redhat.com ([209.132.183.28]:1163)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from ) id 1P612l-0004vf-7G
	for qemu-devel@nongnu.org; Wed, 13 Oct 2010 09:06:55 -0400
Message-ID: <4CB5AF0D.9000800@redhat.com>
Date: Wed, 13 Oct 2010 15:07:25 +0200
From: Kevin Wolf
MIME-Version: 1.0
Subject: Re: [Qemu-devel] Re: [PATCH v2 6/7] qed: Read/write support
References: <1286552914-27014-1-git-send-email-stefanha@linux.vnet.ibm.com>
	<1286552914-27014-7-git-send-email-stefanha@linux.vnet.ibm.com>
	<4CB479D2.7030901@redhat.com>
	<4CB47D38.3060602@linux.vnet.ibm.com>
	<4CB48144.9030607@redhat.com>
	<20101012155953.GA13872@stefan-thinkpad.transitives.com>
	<4CB489D1.3050204@linux.vnet.ibm.com>
	<20101013121328.GB8998@stefan-thinkpad.transitives.com>
In-Reply-To: <20101013121328.GB8998@stefan-thinkpad.transitives.com>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
To: Stefan Hajnoczi
Cc: Anthony Liguori , Christoph Hellwig , Avi Kivity , qemu-devel@nongnu.org

On 13.10.2010 14:13, Stefan Hajnoczi wrote:
> On Tue, Oct 12, 2010 at 11:16:17AM -0500, Anthony Liguori wrote:
>> On 10/12/2010 10:59 AM, Stefan Hajnoczi wrote:
>>> On Tue, Oct 12, 2010 at 05:39:48PM +0200, Kevin Wolf wrote:
>>>> On 12.10.2010 17:22, Anthony Liguori wrote:
>>>>> On 10/12/2010 10:08 AM, Kevin Wolf wrote:
>>>>>> Otherwise we might destroy data that isn't
>>>>>> even touched by the guest request in case of a crash.
>>>>>>
>>>>> The failure scenarios are either that the cluster is leaked in which
>>>>> case, the old version of the data is still present or the cluster is
>>>>> orphaned because the L2 entry is written, in which case the old version
>>>>> of the data is present.
>>>> Hm, how does the latter case work? Or rather, what do you mean by "orphaned"?
>>>>
>>>>> Are you referring to a scenario where the cluster is partially written
>>>>> because the data is present in the write cache and the write cache isn't
>>>>> flushed on power failure?
>>>> The case I'm referring to is a COW. So let's assume a partial write to
>>>> an unallocated cluster, we then need to do a COW in pre/postfill. Then
>>>> we do a normal write and link the new cluster in the L2 table.
>>>>
>>>> Assume that the write to the L2 table is already on the disk, but the
>>>> pre/postfill data isn't yet. At this point we have a bad state because
>>>> if we crash now we have lost the data that should have been copied from
>>>> the backing file.
>>> In this case QED_F_NEED_CHECK is set and the invalid cluster offset
>>> should be reset to zero on open.
>>>
>>> However, I think we can get into a state where the pre/postfill data
>>> isn't on the disk yet but another allocation has increased the file
>>> size, making the unwritten cluster "valid". This fools consistency
>>> check into thinking the data cluster (which was never written to on
>>> disk) is valid.
>>>
>>> Will think about this more tonight.
>>
>> It's fairly simple to add a sync to this path. It's probably worth
>> checking the prefill/postfill for zeros and avoiding the write/sync
>> if that's the case. That should optimize the common cases of
>> allocating new space within a file.
>>
>> My intuition is that we can avoid the sync entirely but we'll need
>> to think about it further.
>
> We can avoid it when a backing image is not used.
> Your idea to check
> for zeroes in the backing image is neat too, it may well reduce the
> common case even for backing images.

The additional requirement is that we're extending the file and not
reusing an old cluster. (And bdrv_has_zero_init() == true, but QED
doesn't work on host_devices anyway)

Kevin
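
To make the conditions discussed in this thread concrete, here is a minimal sketch in C of a predicate that decides whether the write/sync of the pre/postfill data could be skipped. It is not taken from the QED patch series: the struct, its fields, and both function names are hypothetical, and the has_zero_init flag merely stands in for the result of bdrv_has_zero_init().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if every byte in buf[0..len) is zero. */
static bool buffer_is_zero_sketch(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0) {
            return false;
        }
    }
    return true;
}

/* Hypothetical summary of the allocation state; not a QED structure. */
struct alloc_ctx_sketch {
    bool has_backing_file; /* image has a backing file */
    bool extends_file;     /* new cluster grows the file, no old cluster reused */
    bool has_zero_init;    /* stands in for bdrv_has_zero_init() == true */
};

/*
 * Skipping is only considered when the file is being extended on storage
 * that reads back zeros. Then either there is no backing file, or the
 * pre/postfill data read from the backing file is all zeros.
 */
static bool can_skip_cow_sync_sketch(const struct alloc_ctx_sketch *ctx,
                                     const uint8_t *prefill, size_t prefill_len,
                                     const uint8_t *postfill, size_t postfill_len)
{
    if (!ctx->extends_file || !ctx->has_zero_init) {
        return false;
    }
    if (!ctx->has_backing_file) {
        return true;
    }
    return buffer_is_zero_sketch(prefill, prefill_len) &&
           buffer_is_zero_sketch(postfill, postfill_len);
}
```

In every other case (a reused cluster, storage without zero initialization, or non-zero backing data) the sketch falls back to writing and syncing the pre/postfill data before the L2 table update.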