From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from [140.186.70.92] (port=36450 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1Ov8wo-0006sq-6w for qemu-devel@nongnu.org; Mon, 13 Sep 2010 09:19:51 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.69) (envelope-from ) id 1Ov8wn-0001Yv-91 for qemu-devel@nongnu.org; Mon, 13 Sep 2010 09:19:50 -0400
Received: from mail-yw0-f45.google.com ([209.85.213.45]:36765) by eggs.gnu.org with esmtp (Exim 4.69) (envelope-from ) id 1Ov8wn-0001Yq-5P for qemu-devel@nongnu.org; Mon, 13 Sep 2010 09:19:49 -0400
Received: by ywg4 with SMTP id 4so2369530ywg.4 for ; Mon, 13 Sep 2010 06:19:48 -0700 (PDT)
Message-ID: <4C8E24F1.7000402@codemonkey.ws>
Date: Mon, 13 Sep 2010 08:19:45 -0500
From: Anthony Liguori
MIME-Version: 1.0
Subject: Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format
References: <1283767478-16740-1-git-send-email-stefanha@linux.vnet.ibm.com>
 <4C84E738.3020802@codemonkey.ws> <4C865187.6090508@redhat.com>
 <4C865CFE.7010508@codemonkey.ws> <4C8663C4.1090508@redhat.com>
 <4C866773.2030103@codemonkey.ws> <4C86BC6B.5010809@codemonkey.ws>
 <4C874812.9090807@redhat.com> <4C87860A.3060904@codemonkey.ws>
 <4C888287.8020209@redhat.com> <4C88D7CC.5000806@codemonkey.ws>
 <4C8A1311.8070903@redhat.com> <4C8A2F40.7000509@codemonkey.ws>
 <4C8A36D4.5050001@redhat.com> <4C8A4707.7080705@codemonkey.ws>
 <4C8A5391.2030601@redhat.com> <4C8A65BB.9010602@codemonkey.ws>
 <4C8CD47E.4060309@redhat.com> <4C8CEE14.4020501@codemonkey.ws>
 <4C8CF812.4020203@redhat.com> <4C8D094E.4060507@codemonkey.ws>
 <4C8E0AF2.2090107@redhat.com> <4C8E0C4C.2060002@redhat.com>
 <4C8E0FA1.9060604@redhat.com>
In-Reply-To: <4C8E0FA1.9060604@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
List-Id: qemu-devel.nongnu.org
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
To: Kevin Wolf
Cc: qemu-devel@nongnu.org, Avi Kivity, Stefan Hajnoczi

On 09/13/2010 06:48
AM, Kevin Wolf wrote:
> On 13.09.2010 13:34, Avi Kivity wrote:
>> On 09/13/2010 01:28 PM, Kevin Wolf wrote:
>>>> Anytime you grow the freelist with qcow2, you have to write a brand new
>>>> freelist table and update the metadata synchronously to point to a new
>>>> version of it. That means for a 1TB image, you're potentially writing
>>>> out 128MB of data just to allocate a new cluster.
>>>>
>>> No. qcow2 has two-level tables.
>>>
>>> File size: 1 TB
>>> Number of clusters: 1 TB / 64 kB = 16 M
>>> Number of refcount blocks: (16 M * 2 B) / 64kB = 512
>>> Total size of all refcount blocks: 512 * 64kB = 32 MB
>>> Size of refcount table: 512 * 8 B = 4 kB
>>>
>>> When we grow an image file, the refcount blocks can stay where they are,
>>> only the refcount table needs to be rewritten. So we have to copy a
>>> total of 4 kB for growing the image file when it's 1 TB in size (all
>>> assuming 64k clusters).
>>>
>>> The other result of this calculation is that we need to grow the
>>> refcount table each time we cross a 16 TB boundary. So additionally to
>>> being a small amount of data, it doesn't happen in practice anyway.
>>>
>> Interesting, I misremembered it as 8 bytes per cluster, not 2. So it's
>> actually fairly dense (though still not as dense as a bitmap).
>>
> Yes, refcounts are 16 bit. Just checked it with the code once again to
> be 100% sure. But if it was only that, it would be just a small factor.
> The important part is that it's a two-level structure, so Anthony's
> numbers are completely off.
>

A two-level structure makes growth more efficient. However, searching
for a free cluster is still an expensive operation on large disk images.
This is an important point because, without snapshots, the argument for
a refcount table is that it supports UNMAP, and efficient UNMAP support
in qcow2 looks like it will require an additional structure.

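To make the two costs concrete, here is a rough Python sketch. It is not
qcow2 code: the function names and the flat in-memory list of refcounts
are assumptions made purely for illustration. The first function
reproduces the arithmetic quoted above for 64 kB clusters and 16-bit
refcounts; the second shows why finding a free cluster by walking
refcounts is linear in the number of clusters in the worst case.

```python
KIB = 1024
TIB = 1024 ** 4


def refcount_layout(file_size, cluster_size=64 * KIB,
                    refcount_bytes=2, table_entry_bytes=8):
    """Sizes of a two-level refcount structure (illustrative only)."""
    clusters = file_size // cluster_size
    # Each refcount block is one cluster full of fixed-width refcounts.
    entries_per_block = cluster_size // refcount_bytes
    # Ceiling division: a partially used block still occupies a cluster.
    refcount_blocks = -(-clusters // entries_per_block)
    return {
        "clusters": clusters,
        "refcount_blocks": refcount_blocks,
        "blocks_bytes": refcount_blocks * cluster_size,
        "table_bytes": refcount_blocks * table_entry_bytes,
    }


def find_free_cluster(refcounts, start=0):
    """Linear scan for the first cluster whose refcount is zero.

    Returns (index, entries_examined); index is None if nothing is free.
    """
    examined = 0
    for index in range(start, len(refcounts)):
        examined += 1
        if refcounts[index] == 0:
            return index, examined
    return None, examined


if __name__ == "__main__":
    layout = refcount_layout(TIB)
    print(layout)

    # Hypothetical image: 1000 allocated clusters, then one free cluster.
    refcounts = [1] * 1000 + [0]
    print(find_free_cluster(refcounts))
```

Under these assumptions, growing a 1 TB image only rewrites the 4 kB
table, but a scan on a nearly full image may have to walk most of the
16 M refcount entries (32 MB of refcount blocks) before it finds a zero.
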
One of the troubles with qcow2 as a format is that the metadata on disk
is redundant, and the refcount table is already defined as
authoritative. So while in QED we can define the L1/L2 tables as the
only authoritative source of information and treat a freelist as an
optimization, the refcount table must remain authoritative in qcow2 in
order to remain backwards compatible.

You could rewrite the header to be qcow3 in order to relax this
restriction, but then you lose image mobility to older versions, which
really negates the advantage of not introducing a new format.

Regards,

Anthony Liguori

> Kevin
>