From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: famz@redhat.com, qemu-devel@nongnu.org, mreitz@redhat.com,
stefanha@redhat.com, den@openvz.org, John Snow <jsnow@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v7] spec: add qcow2 bitmaps extension specification
Date: Thu, 21 Jan 2016 13:44:02 +0300 [thread overview]
Message-ID: <56A0B672.2050907@virtuozzo.com> (raw)
In-Reply-To: <20160121095335.GB4490@noname.redhat.com>
On 21.01.2016 12:53, Kevin Wolf wrote:
> Am 21.01.2016 um 09:22 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> On 21.01.2016 00:22, John Snow wrote:
>>> On 01/20/2016 07:34 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>> On 19.01.2016 20:48, Kevin Wolf wrote:
>>>>> Am 11.01.2016 um 14:05 hat Vladimir Sementsov-Ogievskiy geschrieben:
>>>>>> The new feature for qcow2: storing bitmaps.
>>>>>>
>>>>>> This patch adds new header extension to qcow2 - Bitmaps Extension. It
>>>>>> provides an ability to store virtual disk related bitmaps in a qcow2
>>>>>> image. For now there is only one type of such bitmaps: Dirty Tracking
>>>>>> Bitmap, which just tracks virtual disk changes from some moment.
>>>>>>
>>>>>> Note: Only bitmaps, relative to the virtual disk, stored in qcow2 file,
>>>>>> should be stored in this qcow2 file. The size of each bitmap
>>>>>> (considering its granularity) is equal to virtual disk size.
>>>>>>
>>>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>>> ---
>>>>>>
>>>>>> v7:
>>>>>>
>>>>>> - Rewordings, grammar.
>>>>>> Max, Eric, John, thank you very much.
>>>>>>
>>>>>> - add last paragraph: remaining bits in bitmap data clusters must be
>>>>>> zero.
>>>>>>
>>>>>> - s/Bitmap Directory/bitmap directory/ and other names like this at
>>>>>> the request of Max.
>>>>>>
>>>>>> v6:
>>>>>>
>>>>>> - reword bitmap_directory_size description
>>>>>> - bitmap type: make 0 reserved
>>>>>> - extra_data_size: resize to 4bytes
>>>>>> Also, I've marked this field as "must be zero". We can always change
>>>>>> it, if we decide allowing managing app to specify any extra data, by
>>>>>> defining some magic value as a top of user extra data.. So, for now
>>>>>> non zeor extra_data_size should be considered as an error.
>>>>>> - swap name and extra_data to give good alignment to extra_data.
>>>>>>
>>>>>>
>>>>>> v5:
>>>>>>
>>>>>> - 'Dirty bitmaps' renamed to 'Bitmaps', as we may have several types of
>>>>>> bitmaps.
>>>>>> - rewordings
>>>>>> - move upper bounds to "Notes about Qemu limits"
>>>>>> - s/should/must somewhere. (but not everywhere)
>>>>>> - move name_size field closer to name itself in bitmap header
>>>>>> - add extra data area to bitmap header
>>>>>> - move bitmap data description to separate section
>>>>>>
>>>>>> docs/specs/qcow2.txt | 172
>>>>>> ++++++++++++++++++++++++++++++++++++++++++++++++++-
>>>>>> 1 file changed, 171 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
>>>>>> index 121dfc8..997239d 100644
>>>>>> --- a/docs/specs/qcow2.txt
>>>>>> +++ b/docs/specs/qcow2.txt
>>>>>> @@ -103,7 +103,18 @@ in the description of a field.
>>>>>> write to an image with unknown auto-clear
>>>>>> features if it
>>>>>> clears the respective bits from this field first.
>>>>>> - Bits 0-63: Reserved (set to 0)
>>>>>> + Bit 0: Bitmaps extension bit
>>>>>> + This bit indicates consistency for
>>>>>> the bitmaps
>>>>>> + extension data.
>>>>>> +
>>>>>> + It is an error if this bit is set
>>>>>> without the
>>>>>> + bitmaps extension present.
>>>>>> +
>>>>>> + If the bitmaps extension is present
>>>>>> but this
>>>>>> + bit is unset, the bitmaps extension
>>>>>> data is
>>>>>> + inconsistent.
>>>>> It may as well be consistent, but we don't know.
>>>>>
>>>>> Perhaps something like "must be considered inconsistent" or "is
>>>>> potentially inconsistent".
>>>>>
>>>>>> +
>>>>>> + Bits 1-63: Reserved (set to 0)
>>>>>> 96 - 99: refcount_order
>>>>>> Describes the width of a reference count block
>>>>>> entry (width
>>>>>> @@ -123,6 +134,7 @@ be stored. Each extension has a structure like
>>>>>> the following:
>>>>>> 0x00000000 - End of the header extension area
>>>>>> 0xE2792ACA - Backing file format name
>>>>>> 0x6803f857 - Feature name table
>>>>>> + 0x23852875 - Bitmaps extension
>>>>>> other - Unknown header extension, can
>>>>>> be safely
>>>>>> ignored
>>>>>> @@ -166,6 +178,34 @@ the header extension data. Each entry look
>>>>>> like this:
>>>>>> terminated if it has full length)
>>>>>> +== Bitmaps extension ==
>>>>>> +
>>>>>> +The bitmaps extension is an optional header extension. It provides
>>>>>> the ability
>>>>>> +to store bitmaps related to a virtual disk. For now, there is only
>>>>>> one bitmap
>>>>>> +type: the dirty tracking bitmap, which tracks virtual disk changes
>>>>> >from some
>>>>>> +point in time.
>>>>> I have one major problem with this patch, and it starts here.
>>>>>
>>>>> The spec talks about dirty tracking bitmaps all the way, but it never
>>>>> really defines what a dirty tracking bitmap even contains. It has a few
>>>>> hints here and there, but they aren't consistent.
>>>>>
>>>>> Here's the first hint: They track "virtual disk changes", which implies
>>>>> they track guest clusters rather than host clusters.
>>>>>
>>>>>> +The data of the extension should be considered consistent only if the
>>>>>> +corresponding auto-clear feature bit is set, see autoclear_features
>>>>>> above.
>>>>>> +
>>>>>> +The fields of the bitmaps extension are:
>>>>>> +
>>>>>> + 0 - 3: nb_bitmaps
>>>>>> + The number of bitmaps contained in the image.
>>>>>> Must be
>>>>>> + greater than or equal to 1.
>>>>>> +
>>>>>> + Note: Qemu currently only supports up to 65535
>>>>>> bitmaps per
>>>>>> + image.
>>>>>> +
>>>>>> + 4 - 7: bitmap_directory_size
>>>>>> + Size of the bitmap directory in bytes. It is the
>>>>>> cumulative
>>>>>> + size of all (nb_bitmaps) bitmap headers.
>>>>>> +
>>>>>> + 8 - 15: bitmap_directory_offset
>>>>>> + Offset into the image file at which the bitmap
>>>>>> directory
>>>>>> + starts. Must be aligned to a cluster boundary.
>>>>>> +
>>>>>> +
>>>>>> == Host cluster management ==
>>>>>> qcow2 manages the allocation of host clusters by maintaining a
>>>>>> reference count
>>>>>> @@ -360,3 +400,133 @@ Snapshot table entry:
>>>>>> variable: Padding to round up the snapshot table entry
>>>>>> size to the
>>>>>> next multiple of 8.
>>>>>> +
>>>>>> +
>>>>>> +== Bitmaps ==
>>>>>> +
>>>>>> +As mentioned above, the bitmaps extension provides the ability to
>>>>>> store bitmaps
>>>>>> +related a virtual disk. This section describes how these bitmaps are
>>>>>> stored.
>>>>> s/related/related to/
>>>>>
>>>>>> +Note: all bitmaps are related to the virtual disk stored in this image.
>>>>>> +
>>>>>> +=== Bitmap directory ===
>>>>>> +
>>>>>> +Each bitmap saved in the image is described in a bitmap directory
>>>>>> entry. The
>>>>>> +bitmap directory is a contiguous area in the image file, whose
>>>>>> starting offset
>>>>>> +and length are given by the header extension fields
>>>>>> bitmap_directory_offset and
>>>>>> +bitmap_directory_size. The entries of the bitmap directory have
>>>>>> variable
>>>>>> +length, depending on the length of the bitmap name and extra data.
>>>>>> These
>>>>>> +entries are also called bitmap headers.
>>>>>> +
>>>>>> +Structure of a bitmap directory entry:
>>>>>> +
>>>>>> + Byte 0 - 7: bitmap_table_offset
>>>>>> + Offset into the image file at which the bitmap
>>>>>> table
>>>>>> + (described below) for the bitmap starts. Must be
>>>>>> aligned to
>>>>>> + a cluster boundary.
>>>>>> +
>>>>>> + 8 - 11: bitmap_table_size
>>>>>> + Number of entries in the bitmap table of the
>>>>>> bitmap.
>>>>>> +
>>>>>> + 12 - 15: flags
>>>>>> + Bit
>>>>>> + 0: in_use
>>>>>> + The bitmap was not saved correctly and may be
>>>>>> + inconsistent.
>>>>>> +
>>>>>> + 1: auto
>>>>>> + The bitmap must reflect all changes of the
>>>>>> virtual
>>>>>> + disk by any application that would write to
>>>>>> this qcow2
>>>>>> + file (including writes, snapshot switching,
>>>>>> etc.). The
>>>>>> + type of this bitmap must be 'dirty tracking
>>>>>> bitmap'.
>>>>> This suggests that we can represent snapshot switching in a dirty
>>>>> tracking bitmap. Which is almost impossible if we track guest clusters,
>>>>> except if loading a snapshot should mean that all bits in the bitmap are
>>>>> set. However, that feels a bit useless and dirty tracking across
>>>>> snapshots doesn't seem to make a whole lot of sense anyway.
>>>>>
>>>>> Maybe what would make more sense is that a bitmap is tied to a specific
>>>>> snapshot or bitmaps are included in a snapshot, so that you can revert
>>>>> to the dirty status as it was when the snapshot was taken.
>>>>>
>>>>> Or, if you really meant, that the tracking words on a host cluster
>>>>> level, what clusters would be included in the bitmap? Would only the
>>>>> virtual disk be included? VM state? Any metadata?
>>>>>
>>>>> It seems we definitely need a new section on dirty tracking bitmaps that
>>>>> describes what the bitmap actually means and how it's supposed to work
>>>>> with snapshots. I guess we could also talk about how it works with other
>>>>> changes to the image like resizing.
>>>> Bitmaps track guest clusters.
>>>> Backup is not related to snapshots, so, I think set all bits in the
>>>> dirty bitmap on snapshot switch is ok. Of course we can't just ignore
>>>> snapshot switch, as it changes how user (and backup) sees the virtual
>>>> disk. Deleting the bitmap on snapshot switch is not good option too I
>>>> think.
>>>>
>>>> Bitmap size is defined to be equal to virtual disk size. So on resize,
>>>> the one who resize must resize the bitmap too, to maintain accordance of
>>>> the image file and the spec.
>>>>
>>>> I'll think about such section, ok.
>>>>
>>>>>> + Bits 2 - 31 are reserved and must be 0.
>>>>>> +
>>>>>> + 16: type
>>>>>> + This field describes the sort of the bitmap.
>>>>>> + Values:
>>>>>> + 1: Dirty tracking bitmap
>>>>>> +
>>>>>> + Values 0, 2 - 255 are reserved.
>>>>>> +
>>>>>> + 17: granularity_bits
>>>>>> + Granularity bits. Valid values: 0 - 63.
>>>>>> +
>>>>>> + Note: Qemu currently doesn't support
>>>>>> granularity_bits
>>>>>> + greater than 31.
>>>>>> +
>>>>>> + Granularity is calculated as
>>>>>> + granularity = 1 << granularity_bits
>>>>>> +
>>>>>> + A bitmap's granularity is how many bytes of the
>>>>>> image
>>>>>> + accounts for one bit of the bitmap.
>>>>>> +
>>>>>> + 18 - 19: name_size
>>>>>> + Size of the bitmap name. Must be non-zero.
>>>>>> +
>>>>>> + Note: Qemu currently doesn't support values
>>>>>> greater than
>>>>>> + 1023.
>>>>>> +
>>>>>> + 20 - 23: extra_data_size
>>>>>> + Size of type-specific extra data.
>>>>>> +
>>>>>> + For now, as no extra data is defined,
>>>>>> extra_data_size is
>>>>>> + reserved and must be zero.
>>>>>> +
>>>>>> + variable: Type-specific extra data for the bitmap.
>>>>> We talked about this in the other subthread.
>>>>>
>>>>>> + variable: The name of the bitmap (not null terminated).
>>>>>> Must be
>>>>>> + unique among all bitmap names within the bitmaps
>>>>>> extension.
>>>>>> +
>>>>>> + variable: Padding to round up the bitmap directory entry
>>>>>> size to the
>>>>>> + next multiple of 8.
>>>>>> +
>>>>>> +=== Bitmap table ===
>>>>>> +
>>>>>> +Bitmaps are stored using a one-level structure (as opposed to two-level
>>>>>> +structure like for refcounts and guest clusters mapping) for the
>>>>>> mapping of
>>>>>> +bitmap data to host clusters. This structure is called the bitmap
>>>>>> table.
>>>>>> +
>>>>>> +Each bitmap table has a variable size (stored in the bitmap
>>>>>> directory Entry)
>>>>>> +and may use multiple clusters, however, it must be contiguous in the
>>>>>> image
>>>>>> +file.
>>>>>> +
>>>>>> +Structure of a bitmap table entry:
>>>>>> +
>>>>>> + Bit 0: Reserved and must be zero if bits 9 - 55 are
>>>>>> non-zero.
>>>>>> + If bits 9 - 55 are zero:
>>>>>> + 0: Cluster should be read as all zeros.
>>>>>> + 1: Cluster should be read as all ones.
>>>>>> +
>>>>>> + 1 - 8: Reserved and must be zero.
>>>>>> +
>>>>>> + 9 - 55: Bits 9 - 55 of the host cluster offset. Must be
>>>>>> aligned to
>>>>>> + a cluster boundary. If the offset is 0, the
>>>>>> cluster is
>>>>>> + unallocated; in that case, bit 0 determines how
>>>>>> this
>>>>>> + cluster should be treated when read from.
>>>>>> +
>>>>>> + 56 - 63: Reserved and must be zero.
>>>>>> +
>>>>>> +=== Bitmap data ===
>>>>>> +
>>>>>> +As noted above, bitmap data is stored in separate clusters,
>>>>>> described by the
>>>>>> +bitmap table. Given an offset (in bytes) into the bitmap data, the
>>>>>> offset into
>>>>>> +the image file can be obtained as follows:
>>>>>> +
>>>>>> + image_offset =
>>>>>> + bitmap_table[bitmap_data_offset / cluster_size] +
>>>>>> + (bitmap_data_offset % cluster_size)
>>>>>> +
>>>>>> +This offset is not defined if bits 9 - 55 of bitmap table entry are
>>>>>> zero (see
>>>>>> +above).
>>>>>> +
>>>>>> +Given an offset byte_nr into the virtual disk and the bitmap's
>>>>>> granularity, the
>>>>>> +bit offset into the bitmap can be calculated like this:
>>>>>> +
>>>>>> + bit_offset =
>>>>>> + image_offset(byte_nr / granularity / 8) * 8 +
>>>>>> + (byte_nr / granularity) % 8
>>>>>> +
>>>>>> +If the size of the bitmap data is not a multiply of cluster size
>>>>>> then the last
>>>>>> +cluster of the bitmap data contains some unused tail bits. These
>>>>>> bits must be
>>>>>> +zero.
>>>>> In which order are the bits stored in the bitmap?
>>>> What do you mean?
>>> He means BE or LE. You can intuit it from the formula, but it's not
>>> explicitly stated, I think
>> byte ordering? But it make sense when we deal with integers, which
>> occupies more then 1 byte. Here is plain byte sequence. Bitmap. Like
>> a string.. What should I write here additionally?
> I meant sub-byte ordering. Whether bit 0 is the LSB or the MSB. I assume
> you intend to use the LSB for bit 0, but it's not written anywhere.
>
> Actually, that's an assumption that is made for bitmaps in the whole
> spec (including e.g. feature flags), even though it is only explicitly
> spelt out for the refcount blocks. I guess we should move it to the
> 'General' section.
Understand now. Then it should be separate patch.
>
> Kevin
--
Best regards,
Vladimir
* now, @virtuozzo.com instead of @parallels.com. Sorry for this inconvenience.
prev parent reply other threads:[~2016-01-21 10:44 UTC|newest]
Thread overview: 22+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-01-11 13:05 [Qemu-devel] [PATCH v7] spec: add qcow2 bitmaps extension specification Vladimir Sementsov-Ogievskiy
2016-01-12 0:30 ` John Snow
2016-01-14 11:35 ` Denis V. Lunev
2016-01-14 16:42 ` John Snow
2016-01-14 22:08 ` Eric Blake
2016-01-14 23:26 ` John Snow
2016-01-16 14:06 ` Vladimir Sementsov-Ogievskiy
2016-01-18 16:54 ` John Snow
2016-01-18 21:16 ` Eric Blake
2016-01-19 8:57 ` Vladimir Sementsov-Ogievskiy
2016-01-19 17:29 ` Kevin Wolf
2016-01-25 10:15 ` Vladimir Sementsov-Ogievskiy
2016-01-25 11:09 ` Kevin Wolf
2016-01-25 12:27 ` Vladimir Sementsov-Ogievskiy
2016-01-19 17:27 ` Kevin Wolf
2016-01-25 10:22 ` Vladimir Sementsov-Ogievskiy
2016-01-19 17:48 ` Kevin Wolf
2016-01-20 12:34 ` Vladimir Sementsov-Ogievskiy
2016-01-20 21:22 ` John Snow
2016-01-21 8:22 ` Vladimir Sementsov-Ogievskiy
2016-01-21 9:53 ` Kevin Wolf
2016-01-21 10:44 ` Vladimir Sementsov-Ogievskiy [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=56A0B672.2050907@virtuozzo.com \
--to=vsementsov@virtuozzo.com \
--cc=den@openvz.org \
--cc=famz@redhat.com \
--cc=jsnow@redhat.com \
--cc=kwolf@redhat.com \
--cc=mreitz@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).