Message-ID: <4C8E0AF2.2090107@redhat.com>
Date: Mon, 13 Sep 2010 13:28:50 +0200
From: Kevin Wolf
Subject: Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format
In-Reply-To: <4C8D094E.4060507@codemonkey.ws>
To: Anthony Liguori
Cc: qemu-devel@nongnu.org, Avi Kivity, Stefan Hajnoczi

On 12.09.2010 19:09, Anthony Liguori wrote:
> For a 1PB disk image with qcow2, the reference count table is 128GB.
> For a 1TB image, the reference count table is 128MB. For a 128GB
> image, the reference table is 16MB which is why we get away with it today.

This is physical size. If you have a 1 PB disk, you're probably okay with
using 128 GB of it for metadata (and I think it's less than that, see below).

> Anytime you grow the freelist with qcow2, you have to write a brand new
> freelist table and update the metadata synchronously to point to a new
> version of it. That means for a 1TB image, you're potentially writing
> out 128MB of data just to allocate a new cluster.

No. qcow2 has two-level tables.

File size: 1 TB
Number of clusters: 1 TB / 64 kB = 16 M
Number of refcount blocks: (16 M * 2 B) / 64 kB = 512
Total size of all refcount blocks: 512 * 64 kB = 32 MB
Size of refcount table: 512 * 8 B = 4 kB

When we grow an image file, the refcount blocks can stay where they are;
only the refcount table needs to be rewritten. So we have to copy a total
of 4 kB to grow the image file when it's 1 TB in size (all assuming 64k
clusters).

The other result of this calculation is that we only need to grow the
refcount table each time we cross a 16 TB boundary. So in addition to
being a small amount of data, it doesn't happen in practice anyway.

Kevin
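
The arithmetic above can be checked with a small Python sketch. It is not
taken from the QEMU sources: the constant and function names are made up
for illustration, and it assumes 64 kB clusters, 2-byte refcounts and
8-byte refcount table entries, as in the calculation above. For the 16 TB
figure it additionally assumes that the refcount table occupies a single
cluster, which is one reading of where that boundary comes from.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB
TB = 1024 * GB

# Assumptions taken from the calculation in the mail (hypothetical names).
CLUSTER_SIZE = 64 * KB       # bytes per cluster
REFCOUNT_ENTRY_SIZE = 2      # bytes per refcount entry
TABLE_ENTRY_SIZE = 8         # bytes per refcount table entry


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def refcount_metadata(file_size):
    """Return the two-level refcount metadata sizes for a file of file_size bytes."""
    clusters = ceil_div(file_size, CLUSTER_SIZE)
    refcounts_per_block = CLUSTER_SIZE // REFCOUNT_ENTRY_SIZE
    blocks = ceil_div(clusters, refcounts_per_block)
    return {
        "clusters": clusters,
        "refcount_blocks": blocks,
        "refcount_blocks_bytes": blocks * CLUSTER_SIZE,
        # One table entry per refcount block; this is the part that is
        # rewritten when the table has to grow.
        "refcount_table_bytes": blocks * TABLE_ENTRY_SIZE,
    }


def bytes_covered_by_one_table_cluster():
    """File size addressable before a one-cluster refcount table is full."""
    table_entries = CLUSTER_SIZE // TABLE_ENTRY_SIZE
    clusters_per_block = CLUSTER_SIZE // REFCOUNT_ENTRY_SIZE
    return table_entries * clusters_per_block * CLUSTER_SIZE


if __name__ == "__main__":
    for name, value in refcount_metadata(1 * TB).items():
        print(f"{name}: {value}")
    print("covered by one table cluster (TB):",
          bytes_covered_by_one_table_cluster() // TB)
```

Under the same assumptions, refcount_metadata(1024 * TB) gives 32 GB of
refcount blocks and a 4 MB refcount table for a 1 PB file.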