From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from [140.186.70.92] (port=39497 helo=eggs.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.43) id 1OoH5Q-0007FI-Ep
	for qemu-devel@nongnu.org; Wed, 25 Aug 2010 10:36:21 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69)
	(envelope-from ) id 1OoH5P-0002pM-DD
	for qemu-devel@nongnu.org; Wed, 25 Aug 2010 10:36:20 -0400
Received: from mx1.redhat.com ([209.132.183.28]:36016)
	by eggs.gnu.org with esmtp (Exim 4.69)
	(envelope-from ) id 1OoH5P-0002p6-59
	for qemu-devel@nongnu.org; Wed, 25 Aug 2010 10:36:19 -0400
Message-ID: <4C752A56.6060609@redhat.com>
Date: Wed, 25 Aug 2010 17:36:06 +0300
From: Avi Kivity
MIME-Version: 1.0
References: <1282646430-5777-1-git-send-email-kwolf@redhat.com>
	<4C73C2BF.8050300@codemonkey.ws> <4C73C622.7080808@redhat.com>
	<4C73C926.3010901@codemonkey.ws> <4C73C9CF.7090800@redhat.com>
	<4C73CAA9.2060104@codemonkey.ws> <4C73CB85.9010306@redhat.com>
	<4C73CBD6.7000900@codemonkey.ws> <4C73CCCB.6050704@redhat.com>
	<4C73CF8D.5060405@codemonkey.ws> <4C74C2F3.9050506@redhat.com>
	<4C7510C1.8080305@codemonkey.ws> <4C75195A.8050508@redhat.com>
	<4C751DBB.8060101@codemonkey.ws> <4C752211.5010600@redhat.com>
	<4C75252F.6040002@codemonkey.ws>
In-Reply-To: <4C75252F.6040002@codemonkey.ws>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: [Qemu-devel] Re: [RFC][STABLE 0.13] Revert "qcow2: Use bdrv_(p)write_sync for metadata writes"
List-Id: qemu-devel.nongnu.org
To: Anthony Liguori
Cc: Kevin Wolf , stefanha@gmail.com, mjt@tls.msk.ru,
	qemu-devel@nongnu.org, hch@lst.de

On 08/25/2010 05:14 PM, Anthony Liguori wrote:
>>
>>> At a high level, I don't think online compaction requires any
>>> specific support from an image format.
>>>
>>
>> You need to know that the block is free and can be reallocated.
>
>
> Semantically, TRIM/DISCARD means that "I don't care about the contents
> of the block anymore until I do another write."  Behind the scenes, we
> can keep track of which blocks have been discarded in an in-memory
> list whereas the first write to the block causes it to be evicted from
> the discarded list.
>
> A background task would attempt to detect idle I/O and copy a block
> from the end of the file to a location on the discarded list.  When
> the copy has completed, you can then remove the L2 entry for the
> discarded block (effectively punching a hole in the image), sync, and
> then update the l2 entry for the block at the end of file location to
> point to the new block location.  You can then ftruncate to reduce
> overall file size.

That should work.

>
> If you tried to maintain a free list, then you would need to sync on
> TRIM/DISCARD which is potentially a fast path.  While a background
> task may be less efficient in the short term, it's just as efficient
> in the long term and it has the advantage of keeping any fast path fast.
>

You only need to sync when the free list size grows beyond the amount of
space you're prepared to lose on power fail.  And you may be able to
defer the background task indefinitely by satisfying new allocations
from the free list.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
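
To make the free-list argument above concrete, here is a toy Python model. It is only a sketch under stated assumptions, not qcow2 code: the class `FreeList`, its methods and the `max_unsynced` parameter are all hypothetical names. It shows two things: a sync is issued only once the number of unpersisted free clusters exceeds what we are prepared to leak on power failure, and new allocations are served from the free list before the file is grown. It deliberately ignores the write ordering a real format would need when a persisted free entry is reused.

```python
class FreeList:
    """Toy model of a cluster free list with deferred syncs (hypothetical names)."""

    def __init__(self, max_unsynced, file_clusters):
        self.max_unsynced = max_unsynced    # clusters we accept losing on power failure
        self.file_clusters = file_clusters  # current image size, in clusters
        self.synced = []                    # free clusters recorded durably
        self.unsynced = []                  # free clusters known only in memory
        self.sync_count = 0                 # number of (simulated) syncs issued

    def discard(self, cluster):
        """TRIM/DISCARD: remember the cluster; sync only past the threshold."""
        self.unsynced.append(cluster)
        if len(self.unsynced) > self.max_unsynced:
            self._sync()

    def _sync(self):
        # Stand-in for writing the free list to disk and flushing it.
        self.synced.extend(self.unsynced)
        self.unsynced.clear()
        self.sync_count += 1

    def allocate(self):
        """Reuse a free cluster if one exists, otherwise grow the file."""
        if self.unsynced:
            return self.unsynced.pop()
        if self.synced:
            return self.synced.pop()
        cluster = self.file_clusters
        self.file_clusters += 1
        return cluster

    def power_fail(self):
        """Drop in-memory state; return how many free clusters were leaked."""
        lost = len(self.unsynced)
        self.unsynced.clear()
        return lost


if __name__ == "__main__":
    fl = FreeList(max_unsynced=2, file_clusters=10)
    for c in (3, 5, 7):
        fl.discard(c)
    print("syncs issued:", fl.sync_count)
    print("allocated cluster:", fl.allocate())
```

With `max_unsynced=2`, the first two discards cost no sync at all; the third one triggers a single sync covering all three clusters. A power failure can leak at most `max_unsynced` clusters, which is the bound the reply describes.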