Message-ID: <56B20471.6090804@virtuozzo.com>
Date: Wed, 3 Feb 2016 16:45:21 +0300
From: Vladimir Sementsov-Ogievskiy
References: <1454394900-3586-1-git-send-email-vsementsov@virtuozzo.com> <20160203080401.GB25746@ad.usersys.redhat.com>
In-Reply-To: <20160203080401.GB25746@ad.usersys.redhat.com>
Subject: Re: [Qemu-devel] [PATCH v9] spec: add qcow2 bitmaps extension specification
To: Fam Zheng
Cc: kwolf@redhat.com, qemu-devel@nongnu.org, stefanha@redhat.com, den@openvz.org, jsnow@redhat.com

On 03.02.2016 11:04, Fam Zheng wrote:
> On Tue, 02/02 09:35, Vladimir Sementsov-Ogievskiy wrote:
>> The new feature for qcow2: storing bitmaps.
>>
>> This patch adds new header extension to qcow2 - Bitmaps Extension. It
>> provides an ability to store virtual disk related bitmaps in a qcow2
>> image. For now there is only one type of such bitmaps: Dirty Tracking
>> Bitmap, which just tracks virtual disk changes from some moment.
>>
>> Note: Only bitmaps, relative to the virtual disk, stored in qcow2 file,
>> should be stored in this qcow2 file. The size of each bitmap
>> (considering its granularity) is equal to virtual disk size.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy
>> ---
>>
>> v9
>> - rewordings, thanks to Max
>>
>> v8
>> - rewordings
>> - bitmap_directory_size: 4b -> 8b
>> - add more descriptive description in == Bitmaps == section
>> - add paragraph "Dirty tracking bitmaps"
>>
>> Bitmap directory entry:
>> - extra data should not allocate additional clusters
>> - padding must be all-bytes-zero
>> - add extra_data_compatible flag (now behavior in case of unknown
>>   extra data is defined by this flag)
>>
>> v7:
>>
>> - Rewordings, grammar.
>>   Max, Eric, John, thank you very much.
>>
>> - add last paragraph: remaining bits in bitmap data clusters must be
>>   zero.
>>
>> - s/Bitmap Directory/bitmap directory/ and other names like this at
>>   the request of Max.
>>
>> v6:
>>
>> - reword bitmap_directory_size description
>> - bitmap type: make 0 reserved
>> - extra_data_size: resize to 4bytes
>>   Also, I've marked this field as "must be zero". We can always change
>>   it, if we decide allowing managing app to specify any extra data, by
>>   defining some magic value as a top of user extra data.. So, for now
>>   non zeor extra_data_size should be considered as an error.
>> - swap name and extra_data to give good alignment to extra_data.
>>
>>
>> v5:
>>
>> - 'Dirty bitmaps' renamed to 'Bitmaps', as we may have several types of
>>   bitmaps.
>> - rewordings
>> - move upper bounds to "Notes about Qemu limits"
>> - s/should/must somewhere. (but not everywhere)
>> - move name_size field closer to name itself in bitmap header
>> - add extra data area to bitmap header
>> - move bitmap data description to separate section
>>
>>
>> docs/specs/qcow2.txt | 223 ++++++++++++++++++++++++++++++++++++++++++++++++++-
>> 1 file changed, 222 insertions(+), 1 deletion(-)
>>
>> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
>> index f236d8c..db5e666 100644
>> --- a/docs/specs/qcow2.txt
>> +++ b/docs/specs/qcow2.txt
>> @@ -103,7 +103,18 @@ in the description of a field.
>>                    write to an image with unknown auto-clear features if it
>>                    clears the respective bits from this field first.
>>
>> -                  Bits 0-63:  Reserved (set to 0)
>> +                  Bit 0:      Bitmaps extension bit
>> +                              This bit indicates consistency for the bitmaps
>> +                              extension data.
>> +
>> +                              It is an error if this bit is set without the
>> +                              bitmaps extension present.
>> +
>> +                              If the bitmaps extension is present but this
>> +                              bit is unset, the bitmaps extension data must be
>> +                              considered inconsistent.
>> +
>> +                  Bits 1-63:  Reserved (set to 0)
>>
>>         96 - 99:   refcount_order
>>                    Describes the width of a reference count block entry (width
>> @@ -123,6 +134,7 @@ be stored. Each extension has a structure like the following:
>>                        0x00000000 - End of the header extension area
>>                        0xE2792ACA - Backing file format name
>>                        0x6803f857 - Feature name table
>> +                      0x23852875 - Bitmaps extension
>>                        other      - Unknown header extension, can be safely
>>                                     ignored
>>
>> @@ -166,6 +178,36 @@ the header extension data. Each entry look like this:
>>                    terminated if it has full length)
>>
>>
>> +== Bitmaps extension ==
>> +
>> +The bitmaps extension is an optional header extension. It provides the ability
>> +to store bitmaps related to a virtual disk. For now, there is only one bitmap
>> +type: the dirty tracking bitmap, which tracks virtual disk changes from some
>> +point in time.
>> +
>> +The data of the extension should be considered consistent only if the
>> +corresponding auto-clear feature bit is set, see autoclear_features above.
>> +
>> +The fields of the bitmaps extension are:
>> +
>> +    Byte 0 - 3:   nb_bitmaps
>> +                  The number of bitmaps contained in the image. Must be
>> +                  greater than or equal to 1.
>> +
>> +                  Note: Qemu currently only supports up to 65535 bitmaps per
>> +                  image.
>> +
>> +         4 - 7:   Reserved, must be zero.
>> +
>> +        8 - 15:   bitmap_directory_size
>> +                  Size of the bitmap directory in bytes. It is the cumulative
>> +                  size of all (nb_bitmaps) bitmap headers.
>> +
>> +       16 - 23:   bitmap_directory_offset
>> +                  Offset into the image file at which the bitmap directory
>> +                  starts. Must be aligned to a cluster boundary.
>> +
>> +
>>  == Host cluster management ==
>>
>>  qcow2 manages the allocation of host clusters by maintaining a reference count
>> @@ -360,3 +402,182 @@ Snapshot table entry:
>>
>>          variable:   Padding to round up the snapshot table entry size to the
>>                      next multiple of 8.
>> +
>> +
>> +== Bitmaps ==
>> +
>> +As mentioned above, the bitmaps extension provides the ability to store bitmaps
>> +related to a virtual disk. This section describes how these bitmaps are stored.
>> +
>> +All stored bitmaps are related to the virtual disk stored in the same image, so
>> +each bitmap size is equal to the virtual disk size.
>> +
>> +Each bit of the bitmap is responsible for strictly defined range of the virtual
>> +disk. For bit number bit_nr the corresponding range (in bytes) will be:
>> +
>> +    [bit_nr * bitmap_granularity .. (bit_nr + 1) * bitmap_granularity - 1]
>> +
>> +Granularity is a property of the concrete bitmap, see below.
>> +
>> +
>> +=== Bitmap directory ===
>> +
>> +Each bitmap saved in the image is described in a bitmap directory entry. The
>> +bitmap directory is a contiguous area in the image file, whose starting offset
>> +and length are given by the header extension fields bitmap_directory_offset and
>> +bitmap_directory_size. The entries of the bitmap directory have variable
>> +length, depending on the length of the bitmap name and extra data. These
> s/length/lengths/ ?

ok

>
>> +entries are also called bitmap headers.
>> +
>> +Structure of a bitmap directory entry:
>> +
>> +    Byte 0 - 7:   bitmap_table_offset
>> +                  Offset into the image file at which the bitmap table
>> +                  (described below) for the bitmap starts. Must be aligned to
>> +                  a cluster boundary.
>> +
>> +        8 - 11:   bitmap_table_size
>> +                  Number of entries in the bitmap table of the bitmap.
>> +
>> +       12 - 15:   flags
>> +                  Bit
>> +                    0: in_use
>> +                       The bitmap was not saved correctly and may be
>> +                       inconsistent.
>> +
>> +                    1: auto
>> +                       The bitmap must reflect all changes of the virtual
>> +                       disk by any application that would write to this qcow2
>> +                       file (including writes, snapshot switching, etc.). The
>> +                       type of this bitmap must be 'dirty tracking bitmap'.
>> +
>> +                    2: extra_data_compatible
>> +                       This flags is meaningful when the extra data is
>> +                       unknown to the software (currently any extra data is
>> +                       unknown to Qemu).
>> +                       If it is set, the bitmap may be used as expected, extra
>> +                       data must be left as is.
>> +                       If it is not set, the bitmap must not be used, but
>> +                       both it and its extra data be left as is.
>> +
>> +                  Bits 3 - 31 are reserved and must be 0.
>> +
>> +            16:   type
>> +                  This field describes the sort of the bitmap.
>> +                  Values:
>> +                    1: Dirty tracking bitmap
>> +
>> +                  Values 0, 2 - 255 are reserved.
>> +
>> +            17:   granularity_bits
>> +                  Granularity bits. Valid values: 0 - 63.
>> +
>> +                  Note: Qemu currently doesn't support granularity_bits
>> +                  greater than 31.
>> +
>> +                  Granularity is calculated as
>> +                      granularity = 1 << granularity_bits
>> +
>> +                  A bitmap's granularity is how many bytes of the image
>> +                  accounts for one bit of the bitmap.
>> +
>> +       18 - 19:   name_size
>> +                  Size of the bitmap name. Must be non-zero.
>> +
>> +                  Note: Qemu currently doesn't support values greater than
>> +                  1023.
>> +
>> +       20 - 23:   extra_data_size
>> +                  Size of type-specific extra data.
>> +
>> +                  For now, as no extra data is defined, extra_data_size is
>> +                  reserved and should be zero. If it is non-zero the
>> +                  behavior is defined by extra_data_compatible flag.
>> +
>> +      variable:   extra_data
>> +                  Extra data for the bitmap, occupying extra_data_size bytes.
>> +                  Extra data must never contain references to clusters or in
>> +                  some other way allocate additional clusters.
>> +
>> +      variable:   name
>> +                  The name of the bitmap (not null terminated), occupying
>> +                  name_size bytes. Must be unique among all bitmap names
>> +                  within the bitmaps extension.
>> +
>> +      variable:   Padding to round up the bitmap directory entry size to the
>> +                  next multiple of 8. All bytes of the padding must be zero.
> Isn't it clearer to find the next entry, if you add an "entry_size" in the
> beginning, before bitmap_table_offset in each record?

Hmm, I'm not sure. Having both extra_data_size and entry_size is a bad idea,
because one of them is redundant. Also, what about padding? If entry_size
includes it (which is what one would expect), then we will not know the exact
size of extra_data. Is that bad? Also, the current scheme follows the one used
for snapshots.

>
>> +
>> +
>> +=== Bitmap table ===
>> +
>> +Bitmaps are stored using a one-level structure (as opposed to two-level
>> +structure like for refcounts and guest clusters mapping) for the mapping of
> s/structure/structures/
>
>> +bitmap data to host clusters. This structure is called the bitmap table.
>> +
>> +Each bitmap table has a variable size (stored in the bitmap directory entry)
>> +and may use multiple clusters, however, it must be contiguous in the image
>> +file.
>> +
>> +Structure of a bitmap table entry:
>> +
>> +    Bit      0:   Reserved and must be zero if bits 9 - 55 are non-zero.
>> +                  If bits 9 - 55 are zero:
>> +                    0: Cluster should be read as all zeros.
>> +                    1: Cluster should be read as all ones.
> Once bits 9 - 55 are non-zero, this bit goes useless? That doesn't make much
> sense to me. In which case bit 0 is set but 9-55 are zero?

In the case "1: Cluster should be read as all ones."

>
>> +
>> +         1 - 8:   Reserved and must be zero.
>> +
>> +        9 - 55:   Bits 9 - 55 of the host cluster offset. Must be aligned to
>> +                  a cluster boundary. If the offset is 0, the cluster is
>> +                  unallocated; in that case, bit 0 determines how this
>> +                  cluster should be treated during reads.
>> +
>> +       56 - 63:   Reserved and must be zero.
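
To make the bit 0 question concrete, here is a rough sketch of how a reader could interpret one bitmap table entry as described in the quoted text. It is only an illustration, not QEMU code: the function name, the returned tuples and the masks are made up for this example, and it assumes bit 0 is the least significant bit of the 64-bit entry.

```python
# Hypothetical helper, not part of QEMU. Assumes bit 0 is the least
# significant bit of the 64-bit bitmap table entry.
OFFSET_MASK = (1 << 56) - (1 << 9)                     # bits 9 - 55
RESERVED_MASK = 0xFFFFFFFFFFFFFFFF & ~(OFFSET_MASK | 1)  # bits 1 - 8, 56 - 63


def decode_bitmap_table_entry(entry, cluster_size):
    """Return (kind, host_offset) for one bitmap table entry."""
    if entry & RESERVED_MASK:
        raise ValueError("reserved bits 1 - 8 or 56 - 63 are set")

    offset = entry & OFFSET_MASK
    if offset == 0:
        # Unallocated cluster: bit 0 tells how it reads back.
        return ("all_ones" if entry & 1 else "all_zeros", None)

    if entry & 1:
        raise ValueError("bit 0 must be zero when bits 9 - 55 are non-zero")
    if offset % cluster_size:
        raise ValueError("host cluster offset is not cluster aligned")
    return ("allocated", offset)
```

So bit 0 only carries meaning for unallocated clusters, where it stands in for a whole cluster of zeros or ones without any cluster being allocated for it.
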
>> +
>> +
>> +=== Bitmap data ===
>> +
>> +As noted above, bitmap data is stored in separate clusters, described by the
>> +bitmap table. Given an offset (in bytes) into the bitmap data, the offset into
>> +the image file can be obtained as follows:
>> +
>> +    image_offset =
>> +        bitmap_table[bitmap_data_offset / cluster_size] +
>> +            (bitmap_data_offset % cluster_size)
> In this pseudo code, image_offset looks like an variable, but...
>
>> +
>> +This offset is not defined if bits 9 - 55 of bitmap table entry are zero (see
>> +above).
>> +
>> +Given an offset byte_nr into the virtual disk and the bitmap's granularity, the
>> +bit offset into the bitmap can be calculated like this:
>> +
>> +    bit_offset =
>> +        image_offset(byte_nr / granularity / 8) * 8 +
>> +            (byte_nr / granularity) % 8
> ... here it looks like a function. Could you make it consistent?

ok, will do

>
>> +
>> +If the size of the bitmap data is not a multiple of the cluster size then the
>> +last cluster of the bitmap data contains some unused tail bits. These bits must
>> +be zero.
> What defines the size of the bitmap data?

The bitmap size is equal to the virtual disk size (considering the
granularity), so the size of the bitmap data follows from it.

>
>> +
>> +
>> +=== Dirty tracking bitmaps ===
>> +
>> +Bitmaps with 'type' field equal to one are dirty tracking bitmaps.
>> +
>> +When the virtual disk is in use dirty tracking bitmap may be 'enabled' or
>> +'disabled'. While the bitmap is 'enabled', all writes to the virtual disk
>> +should be reflected in the bitmap. A set bit in the bitmap means that the
>> +corresponding range of the virtual disk (see above) was written to while the
>> +bitmap was 'enabled'. An unset bit means that this range was not written to.
>> +
>> +The software should not sync the bitmap in the image file with its
>> +representation in RAM after each write. Flag 'in_use' should be set while the
>> +bitmap is not synced.
> I think this is an implementation detail. IMO a software *can* keep the bitmap
> synced, "should not" is an obsecure and unnecessary constraint.

s/should not/doesn't have to/, ok?

>
>> +
>> +In the image file the 'enabled' state is reflected by the 'auto' flag. If this
>> +flag is set, the software must consider the bitmap as 'enabled' and start
>> +tracking virtual disk changes to this bitmap from the first write to the
>> +virtual disk. If this flag is not set then the bitmap is disabled.
>> +
>> +To maintain bitmap consistency, the only software which is allowed to change
>> +the value of the 'auto' flag is the one which has created the bitmap.
> How does one software know if the image is created by it or not?

I understand that this is not a very good point for a spec. I can drop it. The
idea is that "change this flag, do some writes, change it back" may do great
damage to the backup tool that created the bitmap.

>
> Fam

-- 
Best regards,
Vladimir
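
P.S. On the question of what defines the size of the bitmap data, a small sketch of the arithmetic may help. It is only an illustration under assumptions that the patch text does not spell out: that the number of bits is the virtual disk size divided by the granularity, rounded up, and that one bitmap table entry is needed per cluster of bitmap data. All function names here are hypothetical.

```python
# Hypothetical helpers illustrating the formulas from the quoted patch.
# Assumption: sizes are rounded up to whole bits, bytes and clusters.

def bitmap_layout(virtual_size, granularity_bits, cluster_size):
    """Return (bits, bytes of bitmap data, clusters of bitmap data)."""
    granularity = 1 << granularity_bits
    nb_bits = -(-virtual_size // granularity)   # ceiling division
    nb_bytes = -(-nb_bits // 8)
    nb_clusters = -(-nb_bytes // cluster_size)
    return nb_bits, nb_bytes, nb_clusters


def bit_range(bit_nr, granularity_bits):
    """Byte range of the virtual disk covered by bit number bit_nr."""
    granularity = 1 << granularity_bits
    return bit_nr * granularity, (bit_nr + 1) * granularity - 1


def locate_bit(byte_nr, granularity_bits, cluster_size):
    """Return (bitmap table index, byte offset in cluster, bit in byte)."""
    bit_nr = byte_nr >> granularity_bits
    data_offset = bit_nr // 8                   # offset into the bitmap data
    return (data_offset // cluster_size,
            data_offset % cluster_size,
            bit_nr % 8)
```

For a 1 GiB virtual disk with granularity_bits = 16 and 64 KiB clusters, bitmap_layout(1 << 30, 16, 65536) gives 16384 bits, 2048 bytes and a single cluster, so most of that cluster is the unused tail that must be zero. locate_bit() only returns indices; which physical bit of the byte corresponds to "bit in byte" is not something this sketch decides.
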