From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4C7510C1.8080305@codemonkey.ws>
Date: Wed, 25 Aug 2010 07:46:57 -0500
From: Anthony Liguori
MIME-Version: 1.0
References: <1282646430-5777-1-git-send-email-kwolf@redhat.com>
 <4C73C2BF.8050300@codemonkey.ws> <4C73C622.7080808@redhat.com>
 <4C73C926.3010901@codemonkey.ws> <4C73C9CF.7090800@redhat.com>
 <4C73CAA9.2060104@codemonkey.ws> <4C73CB85.9010306@redhat.com>
 <4C73CBD6.7000900@codemonkey.ws> <4C73CCCB.6050704@redhat.com>
 <4C73CF8D.5060405@codemonkey.ws> <4C74C2F3.9050506@redhat.com>
In-Reply-To: <4C74C2F3.9050506@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: [Qemu-devel] Re: [RFC][STABLE 0.13] Revert "qcow2: Use bdrv_(p)write_sync for metadata writes"
List-Id: qemu-devel.nongnu.org
To: Avi Kivity
Cc: Kevin Wolf, stefanha@gmail.com, mjt@tls.msk.ru, qemu-devel@nongnu.org, hch@lst.de

On 08/25/2010 02:14 AM, Avi Kivity wrote:
>> If (c) happens before (b), then we've created an extent that's
>> attached to a table with a zero reference count. This is a corrupt
>> image.
>>
>
>
> If the only issue is new block allocation, it can be easily solved.

Technically, I believe there are similar issues around creating
snapshots but I don't think we care.

> Instead of allocating exactly the needed amount of blocks, allocate
> a large extent and hold them in memory.

So you're suggesting that we allocate a bunch of blocks, update the ref
count table so that they are seen as allocated even though they aren't
attached to an l1 table?

> The next allocation can then be filled from memory, so the
> allocation sync is amortized over many blocks. A power fail will leak
> the preallocated blocks, losing some megabytes of address space, but
> not real disk space.

It's a clever idea, but it would lose real disk space which is probably
not a huge issue.

>> Let's consider if we eliminate the reference count table which means
>> eliminating internal snapshots.
>>
>> 1) guest submits write request
>> 2) allocate extent
>> 3) write data to disk (a)
>> 4) write (a) completes
>> 5) write extent table (c)
>> 6) write (c) completes
>> 7) complete guest write request
>>
>> If this all happens in order and we lose power, we just leak a
>> block. It means we need a periodic fsck.
>>
>> If (c) completes before (a), then it means that the image is not
>> corrupted but data gets lost. This is okay based on the guest contract.
>>
>> And that's it. There is no scenario where the disk is corrupted.
>
> _if_ that's the only failure mode.

If we had another disk format that only supported growth and metadata
for a backing file, can you think of another failure scenario?

Regards,

Anthony Liguori
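
The refcount-free sequence quoted above can be checked with a toy
model. This is only a sketch under stated assumptions, not QEMU code:
the dict-based "image", the names replay/classify/outcomes and the
placeholder strings are all hypothetical, and extent allocation is
assumed to be in-memory only. It enumerates a power loss after every
prefix of both completion orders of the data write (a) and the extent
table write (c).

```python
from itertools import permutations

EXTENT = "new-extent"  # hypothetical name for the freshly allocated extent


def replay(completed):
    """Build the on-disk image given the writes that landed before power loss.

    'a' is the data write and 'c' is the extent table write, using the
    labels from the thread. Allocation is assumed to touch memory only.
    """
    image = {"data": {}, "table": {}}
    if "a" in completed:
        image["data"][EXTENT] = "guest-data"
    if "c" in completed:
        image["table"]["guest-offset-0"] = EXTENT
    return image


def classify(image):
    """Name the state a toy fsck would report for the image."""
    mapped = set(image["table"].values())
    written = set(image["data"])
    if mapped - written:
        return "data lost"  # table points at an extent whose data never landed
    if written - mapped:
        return "leak"       # data landed but nothing references it
    return "clean"


def outcomes():
    """Classify every crash point for both completion orders of (a) and (c)."""
    results = {}
    for order in permutations("ac"):
        for n in range(len(order) + 1):
            results[(order, n)] = classify(replay(order[:n]))
    return results


if __name__ == "__main__":
    for (order, n), state in sorted(outcomes().items()):
        print("order=%s landed=%d -> %s" % ("".join(order), n, state))
```

In this model the only states reachable are "clean", "leak" and
"data lost", and the "data lost" case can only occur before step 7, so
it affects a write the guest was never told had completed. The model
says nothing about failure modes outside these two writes.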