From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from [140.186.70.92] (port=52639 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1OoHnS-0002SG-0u
	for qemu-devel@nongnu.org; Wed, 25 Aug 2010 11:21:51 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
	(envelope-from ) id 1OoHnQ-0002T7-QY
	for qemu-devel@nongnu.org; Wed, 25 Aug 2010 11:21:49 -0400
Received: from mail-qy0-f180.google.com ([209.85.216.180]:55059)
	by eggs.gnu.org with esmtp (Exim 4.69)
	(envelope-from ) id 1OoHnQ-0002T1-NM
	for qemu-devel@nongnu.org; Wed, 25 Aug 2010 11:21:48 -0400
Received: by qyk31 with SMTP id 31so680916qyk.4
	for ; Wed, 25 Aug 2010 08:21:48 -0700 (PDT)
Message-ID: <4C75350B.9010501@codemonkey.ws>
Date: Wed, 25 Aug 2010 10:21:47 -0500
From: Anthony Liguori
MIME-Version: 1.0
References: <1282646430-5777-1-git-send-email-kwolf@redhat.com>
	<4C73C2BF.8050300@codemonkey.ws> <4C73C622.7080808@redhat.com>
	<4C73C926.3010901@codemonkey.ws> <4C73C9CF.7090800@redhat.com>
	<4C73CAA9.2060104@codemonkey.ws> <4C73CB85.9010306@redhat.com>
	<4C73CBD6.7000900@codemonkey.ws> <4C73CCCB.6050704@redhat.com>
	<4C73CF8D.5060405@codemonkey.ws> <4C74C2F3.9050506@redhat.com>
	<4C7510C1.8080305@codemonkey.ws> <4C75195A.8050508@redhat.com>
	<4C751DBB.8060101@codemonkey.ws> <4C752211.5010600@redhat.com>
	<4C75252F.6040002@codemonkey.ws> <4C752A56.6060609@redhat.com>
	<4C753171.2050405@codemonkey.ws> <4C7533A7.7090404@redhat.com>
In-Reply-To: <4C7533A7.7090404@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: [Qemu-devel] Re: [RFC][STABLE 0.13] Revert "qcow2: Use bdrv_(p)write_sync for metadata writes"
List-Id: qemu-devel.nongnu.org
To: Avi Kivity
Cc: Kevin Wolf, stefanha@gmail.com, mjt@tls.msk.ru, qemu-devel@nongnu.org, hch@lst.de

On 08/25/2010 10:15 AM, Avi Kivity wrote:
> On 08/25/2010 06:06 PM, Anthony Liguori wrote:
>> On 08/25/2010 09:36 AM, Avi Kivity wrote:
>>>>
>>>> If you tried to maintain a free list, then you would need to sync
>>>> on TRIM/DISCARD which is potentially a fast path. While a
>>>> background task may be less efficient in the short term, it's just
>>>> as efficient in the long term and it has the advantage of keeping
>>>> any fast path fast.
>>>>
>>>
>>> You only need to sync when the free list size grows beyond the
>>> amount of space you're prepared to lose on power fail. And you may
>>> be able to defer the background task indefinitely by satisfying new
>>> allocations from the free list.
>>
>> Free does not mean free. If you immediately punch a hole in the l2
>> without doing a sync, then you're never sure whether the hole is
>> there on disk or not. So if you then allocate that block and put it
>> somewhere else in another l2 table, you need to sync the previous l2
>> change before you update the new l2.
>>
>> Otherwise you can have two l2 entries pointing to the same block
>> after a power failure. That's not a leak, that's a data corruption.
>
> L2 certainly needs to be updated before the block is reused. But
> that's not different from a file format without a free list.
>
> The batching I was referring to was only for free list management,
> same as the allocation issue which started this thread.

Okay, yes, you can orphan blocks pro-actively and then use them for
future allocations instead of doing active defragmentation.

I'm still not sure a free list stored in the format is all that useful
but you could certainly implement it lazily.

Regards,

Anthony Liguori
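
For readers following the ordering argument in this thread, here is a rough Python sketch of it. It is a toy model, not QEMU or qcow2 code: `ToyImage`, `apply_writes`, `has_double_mapping`, and the `sync_before_reuse` flag are hypothetical names made up for illustration, and "L2" is reduced to a flat guest-cluster to host-cluster map. The model assumes that, on power failure, any subset of the unsynced L2 writes may have reached the disk.

```python
from itertools import combinations


def apply_writes(view, writes):
    """Apply (guest, host) L2 writes in order; host=None punches a hole."""
    for guest, host in writes:
        if host is None:
            view.pop(guest, None)
        else:
            view[guest] = host
    return view


def has_double_mapping(view):
    """True if two guest clusters point at the same host cluster."""
    hosts = list(view.values())
    return len(hosts) != len(set(hosts))


class ToyImage:
    """Toy cluster map with a lazily maintained free list (illustrative only)."""

    def __init__(self, mapping, sync_before_reuse=True):
        self.disk = dict(mapping)   # L2 entries known to be on stable storage
        self.pending = []           # L2 writes issued but not yet synced
        self.lazy_free = []         # freed, but the hole punch is not synced
        self.free = []              # freed and safe to hand out again
        self.next_host = max(mapping.values(), default=-1) + 1
        self.sync_before_reuse = sync_before_reuse
        self.syncs = 0

    def current(self):
        """The mapping as the running process sees it."""
        return apply_writes(dict(self.disk), self.pending)

    def discard(self, guest):
        """Fast path: punch the hole and remember the cluster, no sync."""
        host = self.current().get(guest)
        if host is not None:
            self.pending.append((guest, None))
            self.lazy_free.append(host)

    def sync(self):
        """Make all pending L2 writes durable; lazily freed clusters become reusable."""
        self.disk = self.current()
        self.pending.clear()
        self.free.extend(self.lazy_free)
        self.lazy_free.clear()
        self.syncs += 1

    def allocate(self, guest):
        """Map a guest cluster, preferring previously freed host clusters."""
        if not self.free and self.lazy_free:
            if self.sync_before_reuse:
                self.sync()  # the old L2 change must be on disk first
            else:
                # Deliberately unsafe variant, kept only to show the failure.
                self.free.extend(self.lazy_free)
                self.lazy_free.clear()
        if self.free:
            host = self.free.pop()
        else:
            host = self.next_host
            self.next_host += 1
        self.pending.append((guest, host))
        return host

    def crash_states(self):
        """Yield every on-disk mapping possible if power failed right now."""
        for count in range(len(self.pending) + 1):
            for subset in combinations(self.pending, count):
                yield apply_writes(dict(self.disk), subset)


if __name__ == "__main__":
    for flag in (True, False):
        img = ToyImage({0: 10, 1: 11}, sync_before_reuse=flag)
        img.discard(0)
        img.allocate(2)
        corrupt = any(has_double_mapping(v) for v in img.crash_states())
        print(f"sync_before_reuse={flag}: syncs={img.syncs}, corruption possible={corrupt}")
```

In this model the discard itself never syncs; the single sync is deferred until a lazily freed cluster is actually about to be reused. With the flag turned off, one of the possible crash states has the new L2 entry on disk while the hole punch is not, so two guest clusters share one host cluster.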