[Qemu-devel] Re: [PATCH v2 3/7] docs: Add QED image format specification

From: Kevin Wolf <kwolf@redhat.com>
To: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
Cc: Anthony Liguori <aliguori@us.ibm.com>,
	Avi Kivity <avi@redhat.com>,
	qemu-devel@nongnu.org, Christoph Hellwig <hch@lst.de>
Subject: [Qemu-devel] Re: [PATCH v2 3/7] docs: Add QED image format specification
Date: Mon, 11 Oct 2010 15:58:07 +0200
Message-ID: <4CB317EF.7070504@redhat.com>
In-Reply-To: <1286552914-27014-4-git-send-email-stefanha@linux.vnet.ibm.com>

On 08.10.2010 17:48, Stefan Hajnoczi wrote:
> Signed-off-by: Stefan Hajnoczi <stefanha@linux.vnet.ibm.com>
> ---
>  docs/specs/qed_spec.txt |   94 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 94 insertions(+), 0 deletions(-)
>  create mode 100644 docs/specs/qed_spec.txt
> 
> diff --git a/docs/specs/qed_spec.txt b/docs/specs/qed_spec.txt
> new file mode 100644
> index 0000000..c942b8e
> --- /dev/null
> +++ b/docs/specs/qed_spec.txt
> @@ -0,0 +1,94 @@
> +=Specification=
> +
> +The file format looks like this:
> +
> + +----------+----------+----------+-----+
> + | cluster0 | cluster1 | cluster2 | ... |
> + +----------+----------+----------+-----+
> +
> +The first cluster begins with the '''header'''.  The header contains information about where regular clusters start; this allows the header to be extensible and store extra information about the image file.  A regular cluster may be a '''data cluster''', an '''L2''', or an '''L1 table'''.  L1 and L2 tables are composed of one or more contiguous clusters.
> +
> +Normally the file size will be a multiple of the cluster size.  If the file size is not a multiple, extra information after the last cluster may not be preserved if data is written.  Legitimate extra information should use space between the header and the first regular cluster.
> +
> +All fields are little-endian.
> +
> +==Header==
> + Header {
> +     uint32_t magic;               /* QED\0 */
> + 
> +     uint32_t cluster_size;        /* in bytes */
> +     uint32_t table_size;          /* for L1 and L2 tables, in clusters */
> +     uint32_t header_size;         /* in clusters */
> + 
> +     uint64_t features;            /* format feature bits */
> +     uint64_t compat_features;     /* compat feature bits */
> +     uint64_t l1_table_offset;     /* in bytes */
> +     uint64_t image_size;          /* total logical image size, in bytes */
> + 
> +     /* if (features & QED_F_BACKING_FILE) */
> +     uint32_t backing_filename_offset; /* in bytes from start of header */
> +     uint32_t backing_filename_size;   /* in bytes */
> + 
> +     /* if (compat_features & QED_CF_BACKING_FORMAT) */
> +     uint32_t backing_fmt_offset;  /* in bytes from start of header */
> +     uint32_t backing_fmt_size;    /* in bytes */

It was discussed before, but I don't think we came to a conclusion. Are
there any circumstances under which you don't want to set the
QED_CF_BACKING_FORMAT flag?

Also, it's unclear what this "if" actually means. If the flag isn't set,
are the fields zero, are they undefined, or are they missing entirely, so
that the offsets of the following fields must be adjusted?
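
To make the question concrete, here is a minimal sketch of one possible
reading: the fields are always present at fixed offsets and are simply
ignored when the flag is clear. This reading is an assumption, not
something the spec states; parse_header, HEADER_FMT and FIELDS are names
made up for the sketch.

```python
import struct

# Assumed fixed layout: 4 x uint32, 4 x uint64, 4 x uint32, little-endian,
# no padding. The backing fields are present regardless of the flags.
HEADER_FMT = "<4I4Q4I"
HEADER_SIZE = struct.calcsize(HEADER_FMT)

QED_MAGIC = int.from_bytes(b"QED\0", "little")
QED_F_BACKING_FILE = 0x01
QED_CF_BACKING_FORMAT = 0x01

FIELDS = (
    "magic", "cluster_size", "table_size", "header_size",
    "features", "compat_features", "l1_table_offset", "image_size",
    "backing_filename_offset", "backing_filename_size",
    "backing_fmt_offset", "backing_fmt_size",
)


def parse_header(buf):
    """Unpack the header; ignore backing fields whose flag is not set."""
    hdr = dict(zip(FIELDS, struct.unpack_from(HEADER_FMT, buf)))
    if hdr["magic"] != QED_MAGIC:
        raise ValueError("not a QED image")
    # Assumption: a clear flag means "ignore these fields", so report zero.
    if not hdr["features"] & QED_F_BACKING_FILE:
        hdr["backing_filename_offset"] = hdr["backing_filename_size"] = 0
    if not hdr["compat_features"] & QED_CF_BACKING_FORMAT:
        hdr["backing_fmt_offset"] = hdr["backing_fmt_size"] = 0
    return hdr
```

Under the other reading (fields missing when the flag is clear), the
format string itself would have to depend on the flag bits.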

> + }
> +
> +Field descriptions:
> +* cluster_size must be a power of 2 in range [2^12, 2^26].
> +* table_size must be a power of 2 in range [1, 16].

Is there a reason why this must be a power of two?
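
For reference, the only place where a power of two obviously helps is the
index split shown further down in the patch: with a power-of-two number of
table entries the L1 and L2 indices fall out of shifts and masks. A sketch
with plain division, which would also work for other table sizes (the
function names are made up here, and the 8 is sizeof(uint64_t)):

```python
def split_divmod(pos, cluster_size, table_size):
    """Split a logical offset using division; works for any table_size."""
    noffsets = table_size * cluster_size // 8
    cluster_index, byte_offset = divmod(pos, cluster_size)
    l1_index, l2_index = divmod(cluster_index, noffsets)
    return l1_index, l2_index, byte_offset


def split_shift(pos, cluster_size, table_size):
    """Same split with shifts and masks; needs power-of-two sizes."""
    noffsets = table_size * cluster_size // 8
    cluster_bits = cluster_size.bit_length() - 1
    table_bits = noffsets.bit_length() - 1
    byte_offset = pos & (cluster_size - 1)
    l2_index = (pos >> cluster_bits) & (noffsets - 1)
    l1_index = pos >> (cluster_bits + table_bits)
    return l1_index, l2_index, byte_offset
```

Both give the same answer when table_size is a power of two; only the
first is correct otherwise.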

> +* header_size is the number of clusters used by the header and any additional information stored before regular clusters.
> +* features and compat_features are bitmaps where active file format features can be selectively enabled.  The difference between the two is that an image file that uses unknown compat_features bits can be safely opened without knowing how to interpret those bits.  If an image file has an unsupported features bit set then it is not possible to open that image (the image is not backwards-compatible).
> +* l1_table_offset must be a multiple of cluster_size.

And it is the offset of the first byte of the L1 table in the image file.

> +* image_size is the block device size seen by the guest and must be a multiple of cluster_size.

So there are image sizes that can't be accurately represented in QED? I
think that's a bad idea. Even more so because I can't see how it greatly
simplifies implementation (you save the operation for rounding up on
open/create, that's it) - it looks like a completely arbitrary restriction.
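
As a rough illustration of how little is involved, the rounding that the
restriction saves could look like this (round_up and clusters_needed are
hypothetical helpers; the header would keep the exact image_size):

```python
def round_up(value, multiple):
    """Round value up to the next multiple of `multiple`."""
    return (value + multiple - 1) // multiple * multiple


def clusters_needed(image_size, cluster_size):
    """Number of clusters needed to cover an arbitrary image size."""
    return round_up(image_size, cluster_size) // cluster_size
```

The guest would still see the exact size; only the last cluster would be
partially used.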

> +* backing_filename and backing_fmt are both strings in (byte offset, byte size) form.  They are not NUL-terminated and do not have alignment constraints.

A description of the meaning of these strings is missing.

> +
> +Feature bits:
> +* QED_F_BACKING_FILE = 0x01.  The image uses a backing file.
> +* QED_F_NEED_CHECK = 0x02.  The image needs a consistency check before use.
> +* QED_CF_BACKING_FORMAT = 0x01.  The image has a specific backing file format stored.

I suggest adding a headline "Compatibility Feature Bits". Seeing 0x01
twice is confusing at first sight.

> +
> +==Tables==
> +
> +Tables provide the translation from logical offsets in the block device to cluster offsets in the file.
> +
> + #define TABLE_NOFFSETS (table_size * cluster_size / sizeof(uint64_t))
> +  
> + Table {
> +     uint64_t offsets[TABLE_NOFFSETS];
> + }
> +
> +The tables are organized as follows:
> +
> +                    +----------+
> +                    | L1 table |
> +                    +----------+
> +               ,------'  |  '------.
> +          +----------+   |    +----------+
> +          | L2 table |  ...   | L2 table |
> +          +----------+        +----------+
> +      ,------'  |  '------.
> + +----------+   |    +----------+
> + |   Data   |  ...   |   Data   |
> + +----------+        +----------+
> +
> +A table is made up of one or more contiguous clusters.  The table_size header field determines table size for an image file.  For example, cluster_size=64 KB and table_size=4 results in 256 KB tables.
> +
> +The logical image size must be less than or equal to the maximum possible size of clusters rooted by the L1 table:
> + header.image_size <= TABLE_NOFFSETS * TABLE_NOFFSETS * header.cluster_size
> +
> +Logical offsets are translated into cluster offsets as follows:
> +
> +  table_bits table_bits    cluster_bits
> +  <--------> <--------> <--------------->
> + +----------+----------+-----------------+
> + | L1 index | L2 index |     byte offset |
> + +----------+----------+-----------------+
> + 
> +       Structure of a logical offset
> +
> + def logical_to_cluster_offset(l1_index, l2_index, byte_offset):
> +   l2_offset = l1_table[l1_index]
> +   l2_table = load_table(l2_offset)
> +   cluster_offset = l2_table[l2_index]
> +   return cluster_offset + byte_offset

Should we reserve some bits in the table entries in case we need some
flags later? Also, I suppose all table entries must be cluster aligned?
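
The two points are related: if every entry has to be cluster aligned and
cluster_size is at least 2^12, the low 12 bits of an entry are always zero
and could carry flags later. A sketch of that idea, which is not part of
the spec (FLAG_MASK, split_entry and is_valid_entry are made-up names):

```python
MIN_CLUSTER_BITS = 12                      # smallest allowed cluster_size is 2^12
FLAG_MASK = (1 << MIN_CLUSTER_BITS) - 1    # low bits that alignment leaves free
OFFSET_MASK = 0xFFFFFFFFFFFFFFFF & ~FLAG_MASK


def split_entry(entry):
    """Separate a 64-bit table entry into (offset, flags)."""
    return entry & OFFSET_MASK, entry & FLAG_MASK


def is_valid_entry(entry, cluster_size):
    """Check that the offset part of an entry is cluster aligned."""
    offset, _flags = split_entry(entry)
    return offset % cluster_size == 0
```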

What happened to the other sections that older versions of the spec
contained? For example, this version no longer specifies what the
semantics of unallocated clusters and backing files are.

Kevin
