From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4C8E2348.7020100@codemonkey.ws>
Date: Mon, 13 Sep 2010 08:12:40 -0500
From: Anthony Liguori
Subject: Re: [Qemu-devel] [RFC] qed: Add QEMU Enhanced Disk format
References: <1283767478-16740-1-git-send-email-stefanha@linux.vnet.ibm.com>
 <4C84E738.3020802@codemonkey.ws> <4C865187.6090508@redhat.com>
 <4C865CFE.7010508@codemonkey.ws> <4C8663C4.1090508@redhat.com>
 <4C866773.2030103@codemonkey.ws> <4C86BC6B.5010809@codemonkey.ws>
 <4C874812.9090807@redhat.com> <4C87860A.3060904@codemonkey.ws>
 <4C888287.8020209@redhat.com> <4C88D7CC.5000806@codemonkey.ws>
 <4C8A1311.8070903@redhat.com> <4C8A2F40.7000509@codemonkey.ws>
 <4C8A36D4.5050001@redhat.com> <4C8A4707.7080705@codemonkey.ws>
 <4C8A5391.2030601@redhat.com> <4C8A65BB.9010602@codemonkey.ws>
 <4C8CD47E.4060309@redhat.com> <4C8CEE14.4020501@codemonkey.ws>
 <4C8CF812.4020203@redhat.com> <4C8D094E.4060507@codemonkey.ws>
 <4C8E0AF2.2090107@redhat.com>
In-Reply-To: <4C8E0AF2.2090107@redhat.com>
List-Id: qemu-devel.nongnu.org
To: Kevin Wolf
Cc: qemu-devel@nongnu.org, Avi Kivity, Stefan Hajnoczi

On 09/13/2010 06:28 AM, Kevin Wolf wrote:
>> Anytime you grow the freelist with
>> qcow2, you have to write a brand new
>> freelist table and update the metadata synchronously to point to a new
>> version of it. That means for a 1TB image, you're potentially writing
>> out 128MB of data just to allocate a new cluster.
>>
> No. qcow2 has two-level tables.
>
> File size: 1 TB
> Number of clusters: 1 TB / 64 kB = 16 M
> Number of refcount blocks: (16 M * 2 B) / 64 kB = 512
> Total size of all refcount blocks: 512 * 64 kB = 32 MB
> Size of refcount table: 512 * 8 B = 4 kB
>
> When we grow an image file, the refcount blocks can stay where they are,
> only the refcount table needs to be rewritten. So we have to copy a
> total of 4 kB for growing the image file when it's 1 TB in size (all
> assuming 64k clusters).
>

Yes, I misread the code. It is a two-level table. Even though the
refcount metadata is 4x smaller than I previously stated (32 MB rather
than 128 MB), it's still quite large, and finding a free block is an
O(n) operation where n is the physical file size.

An fsck() on qed is also an O(n) operation where n is the physical file
size, so I still contend the two are similar in cost.

Regards,

Anthony Liguori

> The other result of this calculation is that we need to grow the
> refcount table each time we cross a 16 TB boundary. So additionally to
> being a small amount of data, it doesn't happen in practice anyway.
>
> Kevin
>
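
For anyone who wants to re-check the refcount arithmetic quoted above, here is a minimal sketch in Python. It is not taken from the qcow2 code: the function names are hypothetical, and it assumes the figures used in the thread (64 kB clusters, 2-byte refcounts, 8-byte refcount table entries) with kB/MB/TB read as binary units. The 16 TB figure is reproduced on the further assumption that the refcount table grows one whole cluster at a time.

```python
KiB = 1024
MiB = 1024 * KiB
TiB = 1024 ** 4


def refcount_layout(file_size, cluster_size=64 * KiB,
                    refcount_bytes=2, table_entry_bytes=8):
    """Sizes of the two refcount levels for an image file of file_size bytes.

    Hypothetical helper; parameter defaults follow the numbers in the thread.
    """
    clusters = file_size // cluster_size
    # Each refcount block is one cluster full of fixed-size refcounts.
    refcounts_per_block = cluster_size // refcount_bytes
    # Ceiling division: a partly used block still occupies a whole cluster.
    blocks = -(-clusters // refcounts_per_block)
    return {
        "clusters": clusters,
        "refcount_blocks": blocks,
        "refcount_blocks_bytes": blocks * cluster_size,
        # The top-level table holds one entry per refcount block.
        "refcount_table_bytes": blocks * table_entry_bytes,
    }


def table_cluster_coverage(cluster_size=64 * KiB,
                           refcount_bytes=2, table_entry_bytes=8):
    """File bytes covered by one cluster's worth of refcount table entries.

    Assumption: the table is extended in whole-cluster steps.
    """
    entries_per_table_cluster = cluster_size // table_entry_bytes
    bytes_per_block = (cluster_size // refcount_bytes) * cluster_size
    return entries_per_table_cluster * bytes_per_block


if __name__ == "__main__":
    print(refcount_layout(1 * TiB))
    print(table_cluster_coverage() // TiB, "TiB per refcount-table cluster")
```

Under these assumptions a 1 TB file has 512 refcount blocks (32 MB in total) indexed by a 4 kB table, and one 64 kB cluster of table entries covers 16 TB of file.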