From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
In-Reply-To: <1309187514-26562-1-git-send-email-kwolf@redhat.com>
References: <1309187514-26562-1-git-send-email-kwolf@redhat.com>
Date: Tue, 28 Jun 2011 11:38:35 +0200
From: Frediano Ziglio
Content-Type: text/plain; charset=UTF-8
Subject: Re: [Qemu-devel] [RFC PATCH v2] Specification for qcow2 version 3
To: Kevin Wolf
Cc: ctang@us.ibm.com, stefanha@gmail.com, hch@lst.de, qemu-devel@nongnu.org, avi@redhat.com

2011/6/27 Kevin Wolf:
> This is the second draft for what I think could be added when we increase qcow2's
> version number to 3. This includes points that have been made by several people
> over the past few months. We're probably not going to implement this next week,
> but I think it's important to get discussions started early, so here it is.
>
> Changes implemented in this RFC:
>
> - Added compatible/incompatible/auto-clear feature bits plus an optional
>  feature name table to allow useful error messages even if an older version
>  doesn't know some feature at all.
>
> - Added a dirty flag which tells that the refcount may not be accurate ("QED
>  mode").
>  This means that we can save writes to the refcount table with
>  cache=writethrough, but isn't really useful otherwise since Qcow2Cache.
>
> - Configurable refcount width. If you don't want to use internal snapshots,
>  make refcounts one bit and save cache space and I/O.
>
> - Added subclusters. This separate the COW size (one subcluster, I'm thinking
>  of 64k default size here) from the allocation size (one cluster, 2M). Less
>  fragmentation, less metadata, but still reasonable COW granularity.
>
>  This also allows to preallocate clusters, but none of their subclusters. You
>  can have an image that is like raw + COW metadata, and you can also
>  preallocate metadata for images with backing files.
>
> - Zero cluster flags. This allows discard even with a backing file that doesn't
>  contain zeros. It is also useful for copy-on-read/image streaming, as you'll
>  want to keep sparseness without accessing the remote image for an unallocated
>  cluster all the time.
>
> - Fixed internal snapshot metadata to use 64 bit VM state size. You can't save
>  a snapshot of a VM with >= 4 GB RAM today.
>
> Possible future additions:
>
> - Add per-L2-table dirty flag to L1?
> - Add per-refcount-block full flag to refcount table?

Hi,

thinking about image improvements, I would add:

- GUID for image and backing file
- relative path for backing file

This would help find images in a distributed environment or when files are
moved, e.g. gfs/nfs/ocfs mounted at different mount points, or a backing file
used as a template in a different images directory that later gets moved
somewhere else. With a GUID, a higher-level tool could also maintain a
GUID <-> image file database.

I was also thinking about a "backing file length" field to support resizing,
but this can probably be implemented with zero clusters.
Assume you have an image of 5 GB and create a new image with the first image
as its backing file. Now resize the second image from 5 GB to 3 GB, then
resize it again (after some work) to 10 GB: the part from 3 GB to 5 GB should
not be read from the backing file.

Also, a bit in the L2 offset to say "there is no L2 table" because all
clusters in that L2 are contiguous, so we avoid the L2 table entirely.
Obviously this requires an optimization step to detect or create such a
condition.

For check, perhaps it would be helpful to save not only a flag but also a
size up to which the data are OK (for instance already allocated and with
refcounts saved correctly).

A possible optimization for refcounts would be to initialize them to 1
instead of 0. When clusters are allocated at end-of-file this would not
require a refcount change, and it would be easy to check the file size to see
which clusters are marked as allocated but not present.

Fields for sectors and heads to support old CHS systems??

This mail sounds quite strange to me; I thought qed would be the future of
qcow2, but I must be really wrong.

I think a big limit of the current qed and qcow2 implementations is the
serialization of metadata updates (qcow2 uses synchronous operations while
qed uses a queue). I used the bonnie++ program to test speed, and performance
when allocating data is about 15-20% of performance on already allocated
data. I'm working (in the little spare time I have) on improving it.
VirtualBox and ESX use large clusters (1 MB) to mitigate the
allocation/metadata problem. Perhaps raising the default cluster size would
help change the widespread idea of bad qemu I/O performance.

Regards
  Frediano Ziglio
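
P.S. To make the resize example above concrete, here is a minimal sketch of
the zero-cluster variant. It is only a toy model, not qcow2 code: the names
`Overlay`, `State`, `resize` and `read_source` are hypothetical, sizes are
counted in whole clusters (think of one cluster as 1 GB), and no real on-disk
format is implied.

```python
from enum import Enum


class State(Enum):
    UNALLOCATED = 0  # read falls through to the backing file
    ZERO = 1         # read returns zeros without touching the backing file
    DATA = 2         # read returns the overlay's own data


class Overlay:
    """Toy overlay image; all sizes are in clusters (hypothetical model)."""

    def __init__(self, backing_clusters):
        self.backing_clusters = backing_clusters
        self.size = backing_clusters
        self.table = {}  # cluster index -> State; missing means UNALLOCATED

    def resize(self, new_size):
        if new_size < self.size:
            # Shrinking: forget mappings beyond the new end of the image.
            for idx in [i for i in self.table if i >= new_size]:
                del self.table[idx]
        else:
            # Growing: clusters still covered by the backing file must be
            # flagged as zero, or old backing data would become visible.
            for idx in range(self.size, min(new_size, self.backing_clusters)):
                self.table[idx] = State.ZERO
        self.size = new_size

    def read_source(self, idx):
        """Say where a read of cluster idx would be served from."""
        if idx >= self.size:
            raise IndexError("read beyond end of image")
        state = self.table.get(idx, State.UNALLOCATED)
        if state is State.DATA:
            return "overlay"
        if state is State.UNALLOCATED and idx < self.backing_clusters:
            return "backing"
        return "zeros"


img = Overlay(5)   # backing file of 5 clusters
img.resize(3)      # shrink to 3
img.resize(10)     # grow to 10
print(img.read_source(2))  # backing
print(img.read_source(4))  # zeros (the 3..5 range hidden by zero clusters)
print(img.read_source(7))  # zeros (beyond the end of the backing file)
```

In this model the zero flags do the job that a separate "backing file length"
field would otherwise do, at the cost of touching the cluster table on grow.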