qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH V12 0/6] add-cow file format
@ 2012-08-10 15:39 Dong Xu Wang
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 1/6] docs: document for " Dong Xu Wang
                   ` (6 more replies)
  0 siblings, 7 replies; 25+ messages in thread
From: Dong Xu Wang @ 2012-08-10 15:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Dong Xu Wang

This series introduces a new file format: add-cow.

add-cow can reuse existing helpers such as path_has_protocol and
qed_read_string, so this series makes them public.

add-cow still uses QEMUOptionParameter rather than QemuOpts; I will send a
separate patch series to convert it.

snapshot_blkdev is not supported for add-cow yet; I will add the related code
after converting QEMUOptionParameter to QemuOpts.


v11->v12:
1) Removed unused feature bit.
2) Share cache code with qcow2.c.
3) Removed snapshot_blkdev support; it will be added in another patch.
5) The COW bitmap field in an add-cow file is now a multiple of 65536.
6) Fixed grammar and typos.

Dong Xu Wang (6):
  docs: document for add cow file format
  make path_has_protocol non-static
  qed_read_string to bdrv_read_string
  rename qcow2-cache.c to block-cache.c
  add-cow file format
  qemu-iotests

 block.c                      |   29 ++-
 block.h                      |    6 +
 block/Makefile.objs          |    4 +-
 block/add-cow.c              |  613 ++++++++++++++++++++++++++++++++++++++++++
 block/add-cow.h              |   85 ++++++
 block/qcow2-cache.c          |  323 ----------------------
 block/qcow2-cluster.c        |   66 +++--
 block/qcow2-refcount.c       |   66 +++--
 block/qcow2.c                |   36 ++--
 block/qcow2.h                |   24 +--
 block/qed.c                  |   29 +--
 block_int.h                  |    2 +
 docs/specs/add-cow.txt       |  123 +++++++++
 tests/qemu-iotests/017       |    2 +-
 tests/qemu-iotests/020       |    2 +-
 tests/qemu-iotests/check     |    4 +-
 tests/qemu-iotests/common    |    6 +
 tests/qemu-iotests/common.rc |   19 ++
 trace-events                 |   13 +-
 19 files changed, 994 insertions(+), 458 deletions(-)
 create mode 100644 block/add-cow.c
 create mode 100644 block/add-cow.h
 delete mode 100644 block/qcow2-cache.c
 create mode 100644 docs/specs/add-cow.txt


* [Qemu-devel] [PATCH V12 1/6] docs: document for add-cow file format
  2012-08-10 15:39 [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
@ 2012-08-10 15:39 ` Dong Xu Wang
  2012-09-06 17:27   ` Michael Roth
  2012-09-10 15:23   ` Kevin Wolf
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 2/6] make path_has_protocol non-static Dong Xu Wang
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 25+ messages in thread
From: Dong Xu Wang @ 2012-08-10 15:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Dong Xu Wang

Add documentation for the add-cow format, covering its usage and specification.

Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
---
 docs/specs/add-cow.txt |  123 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 123 insertions(+), 0 deletions(-)
 create mode 100644 docs/specs/add-cow.txt
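
Illustration only, not part of the patch: the fixed header fields documented
in this spec (bytes 0 - 79, little endian) could be decoded roughly as follows.
This is a sketch under stated assumptions. AddCowHeaderSketch,
add_cow_parse_header, le32 and le64 are hypothetical names that do not come
from this series, and only the magic and version rules from the spec are
enforced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ADD_COW_FIXED_HEADER 80u    /* bytes 0 - 79 hold the fixed fields */
#define ADD_COW_PAGE_SIZE    4096u  /* one header page, per the spec */

/* Hypothetical in-memory view of the header; not the on-disk layout. */
typedef struct AddCowHeaderSketch {
    uint32_t version;
    uint32_t backing_name_offset;
    uint32_t backing_name_size;
    uint32_t image_name_offset;
    uint32_t image_name_size;
    uint64_t features;
    uint64_t optional_features;
    uint32_t header_pages_size;
    uint64_t header_size;           /* 4096 * header_pages_size */
    char     backing_format[17];    /* 16 on-disk bytes plus a terminator */
    char     image_format[17];
} AddCowHeaderSketch;

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Read a little-endian 64-bit value from a byte buffer. */
static uint64_t le64(const uint8_t *p)
{
    return (uint64_t)le32(p) | ((uint64_t)le32(p + 4) << 32);
}

/* Returns 0 on success, -1 if the buffer is too short or fails the checks. */
static int add_cow_parse_header(const uint8_t *buf, size_t len,
                                AddCowHeaderSketch *h)
{
    if (len < ADD_COW_FIXED_HEADER || memcmp(buf, "ADD_COW\xff", 8) != 0) {
        return -1;
    }
    memset(h, 0, sizeof(*h));
    h->version = le32(buf + 8);
    if (h->version != 1) {          /* only valid value according to the spec */
        return -1;
    }
    h->backing_name_offset = le32(buf + 12);
    h->backing_name_size   = le32(buf + 16);
    h->image_name_offset   = le32(buf + 20);
    h->image_name_size     = le32(buf + 24);
    h->features            = le64(buf + 28);
    h->optional_features   = le64(buf + 36);
    h->header_pages_size   = le32(buf + 44);
    h->header_size = (uint64_t)ADD_COW_PAGE_SIZE * h->header_pages_size;
    /* Format strings are not NUL-terminated on disk; the zeroed
     * 17-byte buffers provide the terminator. */
    memcpy(h->backing_format, buf + 48, 16);
    memcpy(h->image_format, buf + 64, 16);
    return 0;
}
```

Reading field by field instead of casting the buffer to a packed struct keeps
the sketch independent of host endianness and compiler padding.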

diff --git a/docs/specs/add-cow.txt b/docs/specs/add-cow.txt
new file mode 100644
index 0000000..d5a7a68
--- /dev/null
+++ b/docs/specs/add-cow.txt
@@ -0,0 +1,123 @@
+== General ==
+
+The raw file format does not support backing files or copy-on-write.
+The add-cow image format makes it possible to use backing files with raw
+image by keeping a separate .add-cow metadata file. Once all sectors
+have been written into the raw image it is safe to discard the .add-cow
+and backing files, then we can use the raw image directly.
+
+An example usage of add-cow would look like:
+(ubuntu.img is a disk image with an OS already installed.)
+    1)  Create a raw image with the same size as ubuntu.img
+            qemu-img create -f raw test.raw 8G
+    2)  Create an add-cow image which will store dirty bitmap
+            qemu-img create -f add-cow test.add-cow \
+                -o backing_file=ubuntu.img,image_file=test.raw
+    3)  Run qemu with add-cow image
+            qemu -drive if=virtio,file=test.add-cow
+
+test.raw may be larger than ubuntu.img; in that case, the size of test.add-cow
+is calculated from the size of test.raw.
+
+=Specification=
+
+The file format looks like this:
+
+ +---------------+-------------+-----------------+
+ |     Header    |   Reserved  |    COW bitmap   |
+ +---------------+-------------+-----------------+
+
+All numbers in add-cow are stored in Little Endian byte order.
+
+== Header ==
+
+The Header is included in the first bytes:
+(#define HEADER_SIZE (4096 * header_pages_size))
+    Byte    0 -  7:     magic
+                        add-cow magic string ("ADD_COW\xff").
+
+            8 -  11:    version
+                        Version number (only valid value is 1 now).
+
+            12 - 15:    backing file name offset
+                        Offset in the add-cow file at which the backing file
+                        name is stored (NB: The string is not NUL-terminated).
+                        If the backing file name does NOT exist, this field is
+                        0. Otherwise it must be between 80 and [HEADER_SIZE - 2]
+                        (a file name must be at least 1 byte).
+
+            16 - 19:    backing file name size
+                        Length of the backing file name in bytes. It will be 0
+                        if the backing file name offset is 0. If backing file
+                        name offset is non-zero, then it must be non-zero. Must
+                        be less than [HEADER_SIZE - 80] to fit in the reserved
+                        part of the header.
+
+            20 - 23:    image file name offset
+                        Offset in the add-cow file at which the image file name
+                        is stored (NB: The string is not NUL-terminated). It
+                        must be between 80 and [HEADER_SIZE - 2].
+
+            24 - 27:    image file name size
+                        Length of the image file name in bytes.
+                        Must be less than [HEADER_SIZE - 80] to fit in the reserved
+                        part of the header.
+
+            28 - 35:    features
+                        Currently only 1 feature bit is used:
+                        Feature bits:
+                            * ADD_COW_F_All_ALLOCATED   = 0x01.
+
+            36 - 43:    optional features
+                        Not used now. Reserved for future use. It must be set to 0.
+
+            44 - 47:    header pages size
+                        The header field is variable-sized. This field indicates
+                        how many pages (4k) are used to store the add-cow header.
+                        In add-cow v1, it is fixed to 1, so the header size will
+                        be 4k * 1 = 4096 bytes.
+
+            48 - 63:    backing file format
+                        format of backing file. It will be filled with 0 if
+                        backing file name offset is 0. If backing file name
+                        offset is non-zero, it must be non-zero. It is coded
+                        in free-form ASCII, and is not NUL-terminated.
+
+            64 - 79:    image file format
+                        format of image file. It must be non-zero. It is coded
+                        in free-form ASCII, and is not NUL-terminated.
+
+            80 - [HEADER_SIZE - 1]:
+                        This area pads the header so that the COW bitmap field
+                        starts at byte HEADER_SIZE; the backing file name and
+                        image file name are stored here. Bytes not occupied by
+                        the backing file and image file names are set to 0.
+
+== COW bitmap ==
+
+The "COW bitmap" field starts at offset HEADER_SIZE and stores a bitmap related
+to the backing file and image file. The bitmap tracks whether each cluster has
+been written to the image file (dirty) or not.
+
+Each bit in the bitmap indicates one cluster's status. One cluster includes 128
+sectors, so each bit covers 512 * 128 = 64k bytes. The size of the bitmap is
+calculated from the virtual size of the image file and must also be a multiple
+of 65536; unused bits are set to 0. Within each byte, the least
+significant bit covers the first cluster. The bit order within one byte is:
+ +----+----+----+----+----+----+----+----+
+ | b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0 |
+ +----+----+----+----+----+----+----+----+
+
+If a bit is 0, the cluster has not been allocated in the image file and reads
+are served from the backing file; if the bit is 1, the cluster is dirty and
+reads are served from the image file.
+Writing to a sector causes the corresponding bit to be set to 1.
+
+If the raw image size is not a multiple of the cluster size, bits that
+correspond to bytes beyond the raw file size in add-cow will be 0.
+
+Image file name and backing file name must NOT be the same; this is prevented
+when creating add-cow files.
+
+Image file and backing file are interpreted relative to the add-cow file, not
+to the current working directory of the process that opened the add-cow file.
-- 
1.7.1
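
Illustration only, not part of the patch: the COW bitmap rules in the spec
above (one bit per 128-sector cluster, least significant bit first, bitmap size
rounded up to a multiple of 65536) can be sketched as below. All names are
hypothetical, and the sketch assumes that "multiple of 65536" refers to the
bitmap size in bytes, which the spec does not state explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE          512u
#define SECTORS_PER_CLUSTER  128u
#define CLUSTER_SIZE         (SECTOR_SIZE * SECTORS_PER_CLUSTER) /* 64k */
#define BITMAP_ALIGN         65536u  /* assumption: alignment is in bytes */

/* Bitmap size in bytes for a given virtual size, rounded up to the alignment. */
static uint64_t add_cow_bitmap_bytes(uint64_t virtual_size)
{
    uint64_t clusters = (virtual_size + CLUSTER_SIZE - 1) / CLUSTER_SIZE;
    uint64_t bytes = (clusters + 7) / 8;

    return (bytes + BITMAP_ALIGN - 1) / BITMAP_ALIGN * BITMAP_ALIGN;
}

/* True if the cluster containing sector_num is marked in the bitmap,
 * i.e. reads should go to the image file rather than the backing file. */
static bool add_cow_is_allocated(const uint8_t *bitmap, uint64_t sector_num)
{
    uint64_t cluster = sector_num / SECTORS_PER_CLUSTER;

    return (bitmap[cluster / 8] >> (cluster % 8)) & 1;
}

/* Set the bit of the cluster containing sector_num, as a write would. */
static void add_cow_mark_allocated(uint8_t *bitmap, uint64_t sector_num)
{
    uint64_t cluster = sector_num / SECTORS_PER_CLUSTER;

    bitmap[cluster / 8] |= (uint8_t)(1u << (cluster % 8));
}
```

With these assumptions an 8G image has 131072 clusters, which need 16384
bitmap bytes and are padded to a single 65536-byte unit.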


* [Qemu-devel] [PATCH V12 2/6] make path_has_protocol non-static
  2012-08-10 15:39 [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 1/6] docs: document for " Dong Xu Wang
@ 2012-08-10 15:39 ` Dong Xu Wang
  2012-09-06 17:27   ` Michael Roth
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 3/6] qed_read_string to bdrv_read_string Dong Xu Wang
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Dong Xu Wang @ 2012-08-10 15:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Dong Xu Wang

We will use path_has_protocol outside block.c, so just make it public.

Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
---
 block.c |    2 +-
 block.h |    1 +
 2 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/block.c b/block.c
index 24323c1..c13d803 100644
--- a/block.c
+++ b/block.c
@@ -196,7 +196,7 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
 }
 
 /* check if the path starts with "<protocol>:" */
-static int path_has_protocol(const char *path)
+int path_has_protocol(const char *path)
 {
     const char *p;
 
diff --git a/block.h b/block.h
index 650d872..54e61c9 100644
--- a/block.h
+++ b/block.h
@@ -307,6 +307,7 @@ char *bdrv_snapshot_dump(char *buf, int buf_size, QEMUSnapshotInfo *sn);
 
 char *get_human_readable_size(char *buf, int buf_size, int64_t size);
 int path_is_absolute(const char *path);
+int path_has_protocol(const char *path);
 void path_combine(char *dest, int dest_size,
                   const char *base_path,
                   const char *filename);
-- 
1.7.1
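
Illustration only, not part of the patch: the hunk shows just the comment and
the final "return *p == ':'" of path_has_protocol. A standalone sketch of a
check with that shape is given below. The strcspn-based scan is an assumption
about the elided body, Windows drive-letter handling is left out, and
path_has_protocol_sketch is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the path starts with "<protocol>:", 0 otherwise.
 * Assumption: a ':' only counts when it appears before the first '/'. */
static int path_has_protocol_sketch(const char *path)
{
    /* Stop at the first ':' or '/', or at the terminating NUL. */
    const char *p = path + strcspn(path, ":/");

    return *p == ':';
}
```

For example, "nbd:localhost:10809" is treated as having a protocol, while
"/var/lib/a:b.img" and "test.raw" are treated as plain file names.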


* [Qemu-devel] [PATCH V12 3/6] qed_read_string to bdrv_read_string
  2012-08-10 15:39 [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 1/6] docs: document for " Dong Xu Wang
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 2/6] make path_has_protocol non-static Dong Xu Wang
@ 2012-08-10 15:39 ` Dong Xu Wang
  2012-09-06 17:32   ` Michael Roth
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 4/6] rename qcow2-cache.c to block-cache.c Dong Xu Wang
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 25+ messages in thread
From: Dong Xu Wang @ 2012-08-10 15:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Dong Xu Wang

Turn qed_read_string into a common interface by moving it to block.c and
renaming it to bdrv_read_string.

Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
---
 block.c     |   27 +++++++++++++++++++++++++++
 block.h     |    2 ++
 block/qed.c |   29 +----------------------------
 3 files changed, 30 insertions(+), 28 deletions(-)

diff --git a/block.c b/block.c
index c13d803..d906b35 100644
--- a/block.c
+++ b/block.c
@@ -213,6 +213,33 @@ int path_has_protocol(const char *path)
     return *p == ':';
 }
 
+/**
+ * Read a string of known length from the image file
+ *
+ * @bs:         Image file
+ * @offset:     File offset to start of string, in bytes
+ * @n:          String length in bytes
+ * @buf:        Destination buffer
+ * @buflen:     Destination buffer length in bytes
+ * @ret:        0 on success, -errno on failure
+ *
+ * The string is NUL-terminated.
+ */
+int bdrv_read_string(BlockDriverState *bs, uint64_t offset, size_t n,
+                           char *buf, size_t buflen)
+{
+    int ret;
+    if (n >= buflen) {
+        return -EINVAL;
+    }
+    ret = bdrv_pread(bs, offset, buf, n);
+    if (ret < 0) {
+        return ret;
+    }
+    buf[n] = '\0';
+    return 0;
+}
+
 int path_is_absolute(const char *path)
 {
 #ifdef _WIN32
diff --git a/block.h b/block.h
index 54e61c9..e5dfcd7 100644
--- a/block.h
+++ b/block.h
@@ -154,6 +154,8 @@ int bdrv_pwrite_sync(BlockDriverState *bs, int64_t offset,
     const void *buf, int count);
 int coroutine_fn bdrv_co_readv(BlockDriverState *bs, int64_t sector_num,
     int nb_sectors, QEMUIOVector *qiov);
+int bdrv_read_string(BlockDriverState *bs, uint64_t offset, size_t n,
+    char *buf, size_t buflen);
 int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
     int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
 int coroutine_fn bdrv_co_writev(BlockDriverState *bs, int64_t sector_num,
diff --git a/block/qed.c b/block/qed.c
index 5f3eefa..311c589 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -217,33 +217,6 @@ static bool qed_is_image_size_valid(uint64_t image_size, uint32_t cluster_size,
 }
 
 /**
- * Read a string of known length from the image file
- *
- * @file:       Image file
- * @offset:     File offset to start of string, in bytes
- * @n:          String length in bytes
- * @buf:        Destination buffer
- * @buflen:     Destination buffer length in bytes
- * @ret:        0 on success, -errno on failure
- *
- * The string is NUL-terminated.
- */
-static int qed_read_string(BlockDriverState *file, uint64_t offset, size_t n,
-                           char *buf, size_t buflen)
-{
-    int ret;
-    if (n >= buflen) {
-        return -EINVAL;
-    }
-    ret = bdrv_pread(file, offset, buf, n);
-    if (ret < 0) {
-        return ret;
-    }
-    buf[n] = '\0';
-    return 0;
-}
-
-/**
  * Allocate new clusters
  *
  * @s:          QED state
@@ -437,7 +410,7 @@ static int bdrv_qed_open(BlockDriverState *bs, int flags)
             return -EINVAL;
         }
 
-        ret = qed_read_string(bs->file, s->header.backing_filename_offset,
+        ret = bdrv_read_string(bs->file, s->header.backing_filename_offset,
                               s->header.backing_filename_size, bs->backing_file,
                               sizeof(bs->backing_file));
         if (ret < 0) {
-- 
1.7.1
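
Illustration only, not part of the patch: the contract of bdrv_read_string
(reject strings that do not fit, then NUL-terminate) can be tried outside QEMU
with a memory-backed stand-in for bdrv_pread. MemImage, mem_pread and
read_string_sketch are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an image file held in memory. */
typedef struct MemImage {
    const uint8_t *data;
    size_t len;
} MemImage;

/* Stand-in for bdrv_pread: copies n bytes or returns -EIO when out of range. */
static int mem_pread(const MemImage *img, uint64_t offset, void *buf, size_t n)
{
    if (offset > img->len || n > img->len - offset) {
        return -EIO;
    }
    memcpy(buf, img->data + offset, n);
    return (int)n;
}

/* Same control flow as bdrv_read_string in the hunk above. */
static int read_string_sketch(const MemImage *img, uint64_t offset, size_t n,
                              char *buf, size_t buflen)
{
    int ret;

    if (n >= buflen) {          /* no room left for the terminator */
        return -EINVAL;
    }
    ret = mem_pread(img, offset, buf, n);
    if (ret < 0) {
        return ret;
    }
    buf[n] = '\0';
    return 0;
}
```

The "n >= buflen" test matters because the on-disk names are stored without a
terminator, so the destination needs one byte more than the string length.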


* [Qemu-devel] [PATCH V12 4/6] rename qcow2-cache.c to block-cache.c
  2012-08-10 15:39 [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
                   ` (2 preceding siblings ...)
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 3/6] qed_read_string to bdrv_read_string Dong Xu Wang
@ 2012-08-10 15:39 ` Dong Xu Wang
  2012-09-06 17:52   ` Michael Roth
  2012-09-11  8:41   ` Kevin Wolf
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 5/6] add-cow file format Dong Xu Wang
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 25+ messages in thread
From: Dong Xu Wang @ 2012-08-10 15:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Dong Xu Wang

The add-cow and qcow2 file formats will share the same cache code, so rename
qcow2-cache.c to block-cache.c. The related structures and qcow2 code are
changed accordingly.

Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
---
 block.h                |    3 +
 block/Makefile.objs    |    3 +-
 block/qcow2-cache.c    |  323 ------------------------------------------------
 block/qcow2-cluster.c  |   66 ++++++----
 block/qcow2-refcount.c |   66 ++++++-----
 block/qcow2.c          |   36 +++---
 block/qcow2.h          |   24 +---
 trace-events           |   13 +-
 8 files changed, 109 insertions(+), 425 deletions(-)
 delete mode 100644 block/qcow2-cache.c
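
Illustration only, not part of the patch: the replacement policy in the cache
code being moved (qcow2_cache_find_entry_to_replace in the deleted file) picks
the unreferenced entry with the fewest hits and halves the hit counters as it
scans. A standalone sketch follows. CacheEntrySketch and
find_entry_to_replace_sketch are hypothetical names, and the sketch returns -1
where the original calls abort().

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical reduced cache entry: only the fields the policy looks at. */
typedef struct CacheEntrySketch {
    int cache_hits;
    int ref;
} CacheEntrySketch;

/* Returns the index of the entry to evict, or -1 if all are referenced. */
static int find_entry_to_replace_sketch(CacheEntrySketch *entries, int size)
{
    int min_count = INT_MAX;
    int min_index = -1;
    int i;

    for (i = 0; i < size; i++) {
        if (entries[i].ref) {
            continue;           /* entries in use are never evicted */
        }
        if (entries[i].cache_hits < min_count) {
            min_index = i;
            min_count = entries[i].cache_hits;
        }
        /* Age the counter so that newer hits weigh more. */
        entries[i].cache_hits /= 2;
    }
    return min_index;
}
```

The halving step only touches unreferenced entries, so a table that is
currently in use keeps its hit count until it is released.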

diff --git a/block.h b/block.h
index e5dfcd7..c325661 100644
--- a/block.h
+++ b/block.h
@@ -401,6 +401,9 @@ typedef enum {
     BLKDBG_CLUSTER_ALLOC_BYTES,
     BLKDBG_CLUSTER_FREE,
 
+    BLKDBG_ADD_COW_UPDATE,
+    BLKDBG_ADD_COW_LOAD,
+
     BLKDBG_EVENT_MAX,
 } BlkDebugEvent;
 
diff --git a/block/Makefile.objs b/block/Makefile.objs
index b5754d3..23bdfc8 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -1,7 +1,8 @@
 block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
-block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
+block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
+block-obj-y += block-cache.o
 block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
 block-obj-y += stream.o
 block-obj-$(CONFIG_WIN32) += raw-win32.o
diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
deleted file mode 100644
index 2d4322a..0000000
--- a/block/qcow2-cache.c
+++ /dev/null
@@ -1,323 +0,0 @@
-/*
- * L2/refcount table cache for the QCOW2 format
- *
- * Copyright (c) 2010 Kevin Wolf <kwolf@redhat.com>
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to deal
- * in the Software without restriction, including without limitation the rights
- * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- * copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
- * THE SOFTWARE.
- */
-
-#include "block_int.h"
-#include "qemu-common.h"
-#include "qcow2.h"
-#include "trace.h"
-
-typedef struct Qcow2CachedTable {
-    void*   table;
-    int64_t offset;
-    bool    dirty;
-    int     cache_hits;
-    int     ref;
-} Qcow2CachedTable;
-
-struct Qcow2Cache {
-    Qcow2CachedTable*       entries;
-    struct Qcow2Cache*      depends;
-    int                     size;
-    bool                    depends_on_flush;
-};
-
-Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
-{
-    BDRVQcowState *s = bs->opaque;
-    Qcow2Cache *c;
-    int i;
-
-    c = g_malloc0(sizeof(*c));
-    c->size = num_tables;
-    c->entries = g_malloc0(sizeof(*c->entries) * num_tables);
-
-    for (i = 0; i < c->size; i++) {
-        c->entries[i].table = qemu_blockalign(bs, s->cluster_size);
-    }
-
-    return c;
-}
-
-int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c)
-{
-    int i;
-
-    for (i = 0; i < c->size; i++) {
-        assert(c->entries[i].ref == 0);
-        qemu_vfree(c->entries[i].table);
-    }
-
-    g_free(c->entries);
-    g_free(c);
-
-    return 0;
-}
-
-static int qcow2_cache_flush_dependency(BlockDriverState *bs, Qcow2Cache *c)
-{
-    int ret;
-
-    ret = qcow2_cache_flush(bs, c->depends);
-    if (ret < 0) {
-        return ret;
-    }
-
-    c->depends = NULL;
-    c->depends_on_flush = false;
-
-    return 0;
-}
-
-static int qcow2_cache_entry_flush(BlockDriverState *bs, Qcow2Cache *c, int i)
-{
-    BDRVQcowState *s = bs->opaque;
-    int ret = 0;
-
-    if (!c->entries[i].dirty || !c->entries[i].offset) {
-        return 0;
-    }
-
-    trace_qcow2_cache_entry_flush(qemu_coroutine_self(),
-                                  c == s->l2_table_cache, i);
-
-    if (c->depends) {
-        ret = qcow2_cache_flush_dependency(bs, c);
-    } else if (c->depends_on_flush) {
-        ret = bdrv_flush(bs->file);
-        if (ret >= 0) {
-            c->depends_on_flush = false;
-        }
-    }
-
-    if (ret < 0) {
-        return ret;
-    }
-
-    if (c == s->refcount_block_cache) {
-        BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_UPDATE_PART);
-    } else if (c == s->l2_table_cache) {
-        BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
-    }
-
-    ret = bdrv_pwrite(bs->file, c->entries[i].offset, c->entries[i].table,
-        s->cluster_size);
-    if (ret < 0) {
-        return ret;
-    }
-
-    c->entries[i].dirty = false;
-
-    return 0;
-}
-
-int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c)
-{
-    BDRVQcowState *s = bs->opaque;
-    int result = 0;
-    int ret;
-    int i;
-
-    trace_qcow2_cache_flush(qemu_coroutine_self(), c == s->l2_table_cache);
-
-    for (i = 0; i < c->size; i++) {
-        ret = qcow2_cache_entry_flush(bs, c, i);
-        if (ret < 0 && result != -ENOSPC) {
-            result = ret;
-        }
-    }
-
-    if (result == 0) {
-        ret = bdrv_flush(bs->file);
-        if (ret < 0) {
-            result = ret;
-        }
-    }
-
-    return result;
-}
-
-int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
-    Qcow2Cache *dependency)
-{
-    int ret;
-
-    if (dependency->depends) {
-        ret = qcow2_cache_flush_dependency(bs, dependency);
-        if (ret < 0) {
-            return ret;
-        }
-    }
-
-    if (c->depends && (c->depends != dependency)) {
-        ret = qcow2_cache_flush_dependency(bs, c);
-        if (ret < 0) {
-            return ret;
-        }
-    }
-
-    c->depends = dependency;
-    return 0;
-}
-
-void qcow2_cache_depends_on_flush(Qcow2Cache *c)
-{
-    c->depends_on_flush = true;
-}
-
-static int qcow2_cache_find_entry_to_replace(Qcow2Cache *c)
-{
-    int i;
-    int min_count = INT_MAX;
-    int min_index = -1;
-
-
-    for (i = 0; i < c->size; i++) {
-        if (c->entries[i].ref) {
-            continue;
-        }
-
-        if (c->entries[i].cache_hits < min_count) {
-            min_index = i;
-            min_count = c->entries[i].cache_hits;
-        }
-
-        /* Give newer hits priority */
-        /* TODO Check how to optimize the replacement strategy */
-        c->entries[i].cache_hits /= 2;
-    }
-
-    if (min_index == -1) {
-        /* This can't happen in current synchronous code, but leave the check
-         * here as a reminder for whoever starts using AIO with the cache */
-        abort();
-    }
-    return min_index;
-}
-
-static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
-    uint64_t offset, void **table, bool read_from_disk)
-{
-    BDRVQcowState *s = bs->opaque;
-    int i;
-    int ret;
-
-    trace_qcow2_cache_get(qemu_coroutine_self(), c == s->l2_table_cache,
-                          offset, read_from_disk);
-
-    /* Check if the table is already cached */
-    for (i = 0; i < c->size; i++) {
-        if (c->entries[i].offset == offset) {
-            goto found;
-        }
-    }
-
-    /* If not, write a table back and replace it */
-    i = qcow2_cache_find_entry_to_replace(c);
-    trace_qcow2_cache_get_replace_entry(qemu_coroutine_self(),
-                                        c == s->l2_table_cache, i);
-    if (i < 0) {
-        return i;
-    }
-
-    ret = qcow2_cache_entry_flush(bs, c, i);
-    if (ret < 0) {
-        return ret;
-    }
-
-    trace_qcow2_cache_get_read(qemu_coroutine_self(),
-                               c == s->l2_table_cache, i);
-    c->entries[i].offset = 0;
-    if (read_from_disk) {
-        if (c == s->l2_table_cache) {
-            BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
-        }
-
-        ret = bdrv_pread(bs->file, offset, c->entries[i].table, s->cluster_size);
-        if (ret < 0) {
-            return ret;
-        }
-    }
-
-    /* Give the table some hits for the start so that it won't be replaced
-     * immediately. The number 32 is completely arbitrary. */
-    c->entries[i].cache_hits = 32;
-    c->entries[i].offset = offset;
-
-    /* And return the right table */
-found:
-    c->entries[i].cache_hits++;
-    c->entries[i].ref++;
-    *table = c->entries[i].table;
-
-    trace_qcow2_cache_get_done(qemu_coroutine_self(),
-                               c == s->l2_table_cache, i);
-
-    return 0;
-}
-
-int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
-    void **table)
-{
-    return qcow2_cache_do_get(bs, c, offset, table, true);
-}
-
-int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
-    void **table)
-{
-    return qcow2_cache_do_get(bs, c, offset, table, false);
-}
-
-int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table)
-{
-    int i;
-
-    for (i = 0; i < c->size; i++) {
-        if (c->entries[i].table == *table) {
-            goto found;
-        }
-    }
-    return -ENOENT;
-
-found:
-    c->entries[i].ref--;
-    *table = NULL;
-
-    assert(c->entries[i].ref >= 0);
-    return 0;
-}
-
-void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table)
-{
-    int i;
-
-    for (i = 0; i < c->size; i++) {
-        if (c->entries[i].table == table) {
-            goto found;
-        }
-    }
-    abort();
-
-found:
-    c->entries[i].dirty = true;
-}
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index e179211..335dc7a 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -28,6 +28,7 @@
 #include "block_int.h"
 #include "block/qcow2.h"
 #include "trace.h"
+#include "block-cache.h"
 
 int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
 {
@@ -69,7 +70,8 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
         return new_l1_table_offset;
     }
 
-    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
+    ret = block_cache_flush(bs, s->refcount_block_cache,
+        BLOCK_TABLE_REF, s->cluster_size);
     if (ret < 0) {
         goto fail;
     }
@@ -119,7 +121,8 @@ static int l2_load(BlockDriverState *bs, uint64_t l2_offset,
     BDRVQcowState *s = bs->opaque;
     int ret;
 
-    ret = qcow2_cache_get(bs, s->l2_table_cache, l2_offset, (void**) l2_table);
+    ret = block_cache_get(bs, s->l2_table_cache, l2_offset,
+        (void **) l2_table, BLOCK_TABLE_L2, s->cluster_size);
 
     return ret;
 }
@@ -180,7 +183,8 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
         return l2_offset;
     }
 
-    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
+    ret = block_cache_flush(bs, s->refcount_block_cache,
+        BLOCK_TABLE_REF, s->cluster_size);
     if (ret < 0) {
         goto fail;
     }
@@ -188,7 +192,8 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
     /* allocate a new entry in the l2 cache */
 
     trace_qcow2_l2_allocate_get_empty(bs, l1_index);
-    ret = qcow2_cache_get_empty(bs, s->l2_table_cache, l2_offset, (void**) table);
+    ret = block_cache_get_empty(bs, s->l2_table_cache, l2_offset,
+        (void **) table, BLOCK_TABLE_L2, s->cluster_size);
     if (ret < 0) {
         return ret;
     }
@@ -203,16 +208,17 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
 
         /* if there was an old l2 table, read it from the disk */
         BLKDBG_EVENT(bs->file, BLKDBG_L2_ALLOC_COW_READ);
-        ret = qcow2_cache_get(bs, s->l2_table_cache,
+        ret = block_cache_get(bs, s->l2_table_cache,
             old_l2_offset & L1E_OFFSET_MASK,
-            (void**) &old_table);
+            (void **) &old_table, BLOCK_TABLE_L2, s->cluster_size);
         if (ret < 0) {
             goto fail;
         }
 
         memcpy(l2_table, old_table, s->cluster_size);
 
-        ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &old_table);
+        ret = block_cache_put(bs, s->l2_table_cache,
+            (void **) &old_table, BLOCK_TABLE_L2);
         if (ret < 0) {
             goto fail;
         }
@@ -222,8 +228,9 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
     BLKDBG_EVENT(bs->file, BLKDBG_L2_ALLOC_WRITE);
 
     trace_qcow2_l2_allocate_write_l2(bs, l1_index);
-    qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
-    ret = qcow2_cache_flush(bs, s->l2_table_cache);
+    block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+    ret = block_cache_flush(bs, s->l2_table_cache,
+        BLOCK_TABLE_L2, s->cluster_size);
     if (ret < 0) {
         goto fail;
     }
@@ -242,7 +249,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
 
 fail:
     trace_qcow2_l2_allocate_done(bs, l1_index, ret);
-    qcow2_cache_put(bs, s->l2_table_cache, (void**) table);
+    block_cache_put(bs, s->l2_table_cache, (void **) table, BLOCK_TABLE_L2);
     s->l1_table[l1_index] = old_l2_offset;
     return ret;
 }
@@ -475,7 +482,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
         abort();
     }
 
-    qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+    block_cache_put(bs, s->l2_table_cache, (void **) &l2_table, BLOCK_TABLE_L2);
 
     nb_available = (c * s->cluster_sectors);
 
@@ -584,13 +591,15 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
      * allocated. */
     cluster_offset = be64_to_cpu(l2_table[l2_index]);
     if (cluster_offset & L2E_OFFSET_MASK) {
-        qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+        block_cache_put(bs, s->l2_table_cache,
+            (void **) &l2_table, BLOCK_TABLE_L2);
         return 0;
     }
 
     cluster_offset = qcow2_alloc_bytes(bs, compressed_size);
     if (cluster_offset < 0) {
-        qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+        block_cache_put(bs, s->l2_table_cache,
+            (void **) &l2_table, BLOCK_TABLE_L2);
         return 0;
     }
 
@@ -605,9 +614,10 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
     /* compressed clusters never have the copied flag */
 
     BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
-    qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+    block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
     l2_table[l2_index] = cpu_to_be64(cluster_offset);
-    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+    ret = block_cache_put(bs, s->l2_table_cache,
+        (void **) &l2_table, BLOCK_TABLE_L2);
     if (ret < 0) {
         return 0;
     }
@@ -659,18 +669,16 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
      * handled.
      */
     if (cow) {
-        qcow2_cache_depends_on_flush(s->l2_table_cache);
+        block_cache_depends_on_flush(s->l2_table_cache);
     }
 
-    if (qcow2_need_accurate_refcounts(s)) {
-        qcow2_cache_set_dependency(bs, s->l2_table_cache,
-                                   s->refcount_block_cache);
-    }
+    block_cache_set_dependency(bs, s->l2_table_cache, BLOCK_TABLE_L2,
+        s->refcount_block_cache, s->cluster_size);
     ret = get_cluster_table(bs, m->offset, &l2_table, &l2_index);
     if (ret < 0) {
         goto err;
     }
-    qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+    block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
 
     for (i = 0; i < m->nb_clusters; i++) {
         /* if two concurrent writes happen to the same unallocated cluster
@@ -687,7 +695,8 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
      }
 
 
-    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+    ret = block_cache_put(bs, s->l2_table_cache,
+        (void **) &l2_table, BLOCK_TABLE_L2);
     if (ret < 0) {
         goto err;
     }
@@ -913,7 +922,8 @@ again:
      * request to complete. If we still had the reference, we could use up the
      * whole cache with sleeping requests.
      */
-    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+    ret = block_cache_put(bs, s->l2_table_cache,
+        (void **) &l2_table, BLOCK_TABLE_L2);
     if (ret < 0) {
         return ret;
     }
@@ -1077,14 +1087,15 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
         }
 
         /* First remove L2 entries */
-        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+        block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
         l2_table[l2_index + i] = cpu_to_be64(0);
 
         /* Then decrease the refcount */
         qcow2_free_any_clusters(bs, old_offset, 1);
     }
 
-    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+    ret = block_cache_put(bs, s->l2_table_cache,
+        (void **) &l2_table, BLOCK_TABLE_L2);
     if (ret < 0) {
         return ret;
     }
@@ -1154,7 +1165,7 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
         old_offset = be64_to_cpu(l2_table[l2_index + i]);
 
         /* Update L2 entries */
-        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+        block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
         if (old_offset & QCOW_OFLAG_COMPRESSED) {
             l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
             qcow2_free_any_clusters(bs, old_offset, 1);
@@ -1163,7 +1174,8 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
         }
     }
 
-    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+    ret = block_cache_put(bs, s->l2_table_cache,
+        (void **) &l2_table, BLOCK_TABLE_L2);
     if (ret < 0) {
         return ret;
     }
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 5e3f915..728bfc1 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -25,6 +25,7 @@
 #include "qemu-common.h"
 #include "block_int.h"
 #include "block/qcow2.h"
+#include "block-cache.h"
 
 static int64_t alloc_clusters_noref(BlockDriverState *bs, int64_t size);
 static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
@@ -71,8 +72,8 @@ static int load_refcount_block(BlockDriverState *bs,
     int ret;
 
     BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_LOAD);
-    ret = qcow2_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
-        refcount_block);
+    ret = block_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
+        refcount_block, BLOCK_TABLE_REF, s->cluster_size);
 
     return ret;
 }
@@ -98,8 +99,8 @@ static int get_refcount(BlockDriverState *bs, int64_t cluster_index)
     if (!refcount_block_offset)
         return 0;
 
-    ret = qcow2_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
-        (void**) &refcount_block);
+    ret = block_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
+        (void **) &refcount_block, BLOCK_TABLE_REF, s->cluster_size);
     if (ret < 0) {
         return ret;
     }
@@ -108,8 +109,8 @@ static int get_refcount(BlockDriverState *bs, int64_t cluster_index)
         ((1 << (s->cluster_bits - REFCOUNT_SHIFT)) - 1);
     refcount = be16_to_cpu(refcount_block[block_index]);
 
-    ret = qcow2_cache_put(bs, s->refcount_block_cache,
-        (void**) &refcount_block);
+    ret = block_cache_put(bs, s->refcount_block_cache,
+        (void **) &refcount_block, BLOCK_TABLE_REF);
     if (ret < 0) {
         return ret;
     }
@@ -201,7 +202,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
     *refcount_block = NULL;
 
     /* We write to the refcount table, so we might depend on L2 tables */
-    qcow2_cache_flush(bs, s->l2_table_cache);
+    block_cache_flush(bs, s->l2_table_cache,
+        BLOCK_TABLE_L2, s->cluster_size);
 
     /* Allocate the refcount block itself and mark it as used */
     int64_t new_block = alloc_clusters_noref(bs, s->cluster_size);
@@ -217,8 +219,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
 
     if (in_same_refcount_block(s, new_block, cluster_index << s->cluster_bits)) {
         /* Zero the new refcount block before updating it */
-        ret = qcow2_cache_get_empty(bs, s->refcount_block_cache, new_block,
-            (void**) refcount_block);
+        ret = block_cache_get_empty(bs, s->refcount_block_cache, new_block,
+            (void **) refcount_block, BLOCK_TABLE_REF, s->cluster_size);
         if (ret < 0) {
             goto fail_block;
         }
@@ -241,8 +243,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
 
         /* Initialize the new refcount block only after updating its refcount,
          * update_refcount uses the refcount cache itself */
-        ret = qcow2_cache_get_empty(bs, s->refcount_block_cache, new_block,
-            (void**) refcount_block);
+        ret = block_cache_get_empty(bs, s->refcount_block_cache, new_block,
+            (void **) refcount_block, BLOCK_TABLE_REF, s->cluster_size);
         if (ret < 0) {
             goto fail_block;
         }
@@ -252,8 +254,9 @@ static int alloc_refcount_block(BlockDriverState *bs,
 
     /* Now the new refcount block needs to be written to disk */
     BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_ALLOC_WRITE);
-    qcow2_cache_entry_mark_dirty(s->refcount_block_cache, *refcount_block);
-    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
+    block_cache_entry_mark_dirty(s->refcount_block_cache, *refcount_block);
+    ret = block_cache_flush(bs, s->refcount_block_cache,
+        BLOCK_TABLE_REF, s->cluster_size);
     if (ret < 0) {
         goto fail_block;
     }
@@ -273,7 +276,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
         return 0;
     }
 
-    ret = qcow2_cache_put(bs, s->refcount_block_cache, (void**) refcount_block);
+    ret = block_cache_put(bs, s->refcount_block_cache,
+        (void **) refcount_block, BLOCK_TABLE_REF);
     if (ret < 0) {
         goto fail_block;
     }
@@ -406,7 +410,8 @@ fail_table:
     g_free(new_table);
 fail_block:
     if (*refcount_block != NULL) {
-        qcow2_cache_put(bs, s->refcount_block_cache, (void**) refcount_block);
+        block_cache_put(bs, s->refcount_block_cache,
+            (void **) refcount_block, BLOCK_TABLE_REF);
     }
     return ret;
 }
@@ -432,8 +437,8 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
     }
 
     if (addend < 0) {
-        qcow2_cache_set_dependency(bs, s->refcount_block_cache,
-            s->l2_table_cache);
+        block_cache_set_dependency(bs, s->refcount_block_cache, BLOCK_TABLE_REF,
+            s->l2_table_cache, s->cluster_size);
     }
 
     start = offset & ~(s->cluster_size - 1);
@@ -449,8 +454,8 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
         /* Load the refcount block and allocate it if needed */
         if (table_index != old_table_index) {
             if (refcount_block) {
-                ret = qcow2_cache_put(bs, s->refcount_block_cache,
-                    (void**) &refcount_block);
+                ret = block_cache_put(bs, s->refcount_block_cache,
+                    (void **) &refcount_block, BLOCK_TABLE_REF);
                 if (ret < 0) {
                     goto fail;
                 }
@@ -463,7 +468,7 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
         }
         old_table_index = table_index;
 
-        qcow2_cache_entry_mark_dirty(s->refcount_block_cache, refcount_block);
+        block_cache_entry_mark_dirty(s->refcount_block_cache, refcount_block);
 
         /* we can update the count and save it */
         block_index = cluster_index &
@@ -486,8 +491,8 @@ fail:
     /* Write last changed block to disk */
     if (refcount_block) {
         int wret;
-        wret = qcow2_cache_put(bs, s->refcount_block_cache,
-            (void**) &refcount_block);
+        wret = block_cache_put(bs, s->refcount_block_cache,
+            (void **) &refcount_block, BLOCK_TABLE_REF);
         if (wret < 0) {
             return ret < 0 ? ret : wret;
         }
@@ -763,8 +768,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
             old_l2_offset = l2_offset;
             l2_offset &= L1E_OFFSET_MASK;
 
-            ret = qcow2_cache_get(bs, s->l2_table_cache, l2_offset,
-                (void**) &l2_table);
+            ret = block_cache_get(bs, s->l2_table_cache, l2_offset,
+                (void **) &l2_table, BLOCK_TABLE_L2, s->cluster_size);
             if (ret < 0) {
                 goto fail;
             }
@@ -811,16 +816,18 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
                     }
                     if (offset != old_offset) {
                         if (addend > 0) {
-                            qcow2_cache_set_dependency(bs, s->l2_table_cache,
-                                s->refcount_block_cache);
+                            block_cache_set_dependency(bs, s->l2_table_cache,
+                                BLOCK_TABLE_L2, s->refcount_block_cache,
+                                s->cluster_size);
                         }
                         l2_table[j] = cpu_to_be64(offset);
-                        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+                        block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
                     }
                 }
             }
 
-            ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+            ret = block_cache_put(bs, s->l2_table_cache,
+                (void **) &l2_table, BLOCK_TABLE_L2);
             if (ret < 0) {
                 goto fail;
             }
@@ -847,7 +854,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
     ret = 0;
 fail:
     if (l2_table) {
-        qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+        block_cache_put(bs, s->l2_table_cache,
+            (void **) &l2_table, BLOCK_TABLE_L2);
     }
 
     /* Update L1 only if it isn't deleted anyway (addend = -1) */
diff --git a/block/qcow2.c b/block/qcow2.c
index fd5e214..b89d312 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -30,6 +30,7 @@
 #include "qemu-error.h"
 #include "qerror.h"
 #include "trace.h"
+#include "block-cache.h"
 
 /*
   Differences with QCOW:
@@ -415,8 +416,9 @@ static int qcow2_open(BlockDriverState *bs, int flags)
     }
 
     /* alloc L2 table/refcount block cache */
-    s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE);
-    s->refcount_block_cache = qcow2_cache_create(bs, REFCOUNT_CACHE_SIZE);
+    s->l2_table_cache = block_cache_create(bs, L2_CACHE_SIZE, s->cluster_size);
+    s->refcount_block_cache =
+        block_cache_create(bs, REFCOUNT_CACHE_SIZE, s->cluster_size);
 
     s->cluster_cache = g_malloc(s->cluster_size);
     /* one more sector for decompressed data alignment */
@@ -500,7 +502,7 @@ static int qcow2_open(BlockDriverState *bs, int flags)
     qcow2_refcount_close(bs);
     g_free(s->l1_table);
     if (s->l2_table_cache) {
-        qcow2_cache_destroy(bs, s->l2_table_cache);
+        block_cache_destroy(bs, s->l2_table_cache, BLOCK_TABLE_L2);
     }
     g_free(s->cluster_cache);
     qemu_vfree(s->cluster_data);
@@ -860,13 +862,13 @@ static void qcow2_close(BlockDriverState *bs)
     BDRVQcowState *s = bs->opaque;
     g_free(s->l1_table);
 
-    qcow2_cache_flush(bs, s->l2_table_cache);
-    qcow2_cache_flush(bs, s->refcount_block_cache);
-
+    block_cache_flush(bs, s->l2_table_cache,
+        BLOCK_TABLE_L2, s->cluster_size);
+    block_cache_flush(bs, s->refcount_block_cache,
+        BLOCK_TABLE_REF, s->cluster_size);
     qcow2_mark_clean(bs);
-
-    qcow2_cache_destroy(bs, s->l2_table_cache);
-    qcow2_cache_destroy(bs, s->refcount_block_cache);
+    block_cache_destroy(bs, s->l2_table_cache, BLOCK_TABLE_L2);
+    block_cache_destroy(bs, s->refcount_block_cache, BLOCK_TABLE_REF);
 
     g_free(s->unknown_header_fields);
     cleanup_unknown_header_ext(bs);
@@ -1339,8 +1341,6 @@ static int qcow2_create(const char *filename, QEMUOptionParameter *options)
                     options->value.s);
                 return -EINVAL;
             }
-        } else if (!strcmp(options->name, BLOCK_OPT_LAZY_REFCOUNTS)) {
-            flags |= options->value.n ? BLOCK_FLAG_LAZY_REFCOUNTS : 0;
         }
         options++;
     }
@@ -1537,18 +1537,18 @@ static coroutine_fn int qcow2_co_flush_to_os(BlockDriverState *bs)
     int ret;
 
     qemu_co_mutex_lock(&s->lock);
-    ret = qcow2_cache_flush(bs, s->l2_table_cache);
+    ret = block_cache_flush(bs, s->l2_table_cache,
+        BLOCK_TABLE_L2, s->cluster_size);
     if (ret < 0) {
         qemu_co_mutex_unlock(&s->lock);
         return ret;
     }
 
-    if (qcow2_need_accurate_refcounts(s)) {
-        ret = qcow2_cache_flush(bs, s->refcount_block_cache);
-        if (ret < 0) {
-            qemu_co_mutex_unlock(&s->lock);
-            return ret;
-        }
+    ret = block_cache_flush(bs, s->refcount_block_cache,
+        BLOCK_TABLE_REF, s->cluster_size);
+    if (ret < 0) {
+        qemu_co_mutex_unlock(&s->lock);
+        return ret;
     }
     qemu_co_mutex_unlock(&s->lock);
 
diff --git a/block/qcow2.h b/block/qcow2.h
index b4eb654..cb6fd7a 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -27,6 +27,7 @@
 
 #include "aes.h"
 #include "qemu-coroutine.h"
+#include "block-cache.h"
 
 //#define DEBUG_ALLOC
 //#define DEBUG_ALLOC2
@@ -94,8 +95,6 @@ typedef struct QCowSnapshot {
     uint64_t vm_clock_nsec;
 } QCowSnapshot;
 
-struct Qcow2Cache;
-typedef struct Qcow2Cache Qcow2Cache;
 
 typedef struct Qcow2UnknownHeaderExtension {
     uint32_t magic;
@@ -146,8 +145,8 @@ typedef struct BDRVQcowState {
     uint64_t l1_table_offset;
     uint64_t *l1_table;
 
-    Qcow2Cache* l2_table_cache;
-    Qcow2Cache* refcount_block_cache;
+    BlockCache *l2_table_cache;
+    BlockCache *refcount_block_cache;
 
     uint8_t *cluster_cache;
     uint8_t *cluster_data;
@@ -316,21 +315,4 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs, const char *snapshot_name);
 
 void qcow2_free_snapshots(BlockDriverState *bs);
 int qcow2_read_snapshots(BlockDriverState *bs);
-
-/* qcow2-cache.c functions */
-Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables);
-int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c);
-
-void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table);
-int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c);
-int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
-    Qcow2Cache *dependency);
-void qcow2_cache_depends_on_flush(Qcow2Cache *c);
-
-int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
-    void **table);
-int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
-    void **table);
-int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
-
 #endif
diff --git a/trace-events b/trace-events
index 6b12f83..52b6438 100644
--- a/trace-events
+++ b/trace-events
@@ -439,12 +439,13 @@ qcow2_l2_allocate_write_l2(void *bs, int l1_index) "bs %p l1_index %d"
 qcow2_l2_allocate_write_l1(void *bs, int l1_index) "bs %p l1_index %d"
 qcow2_l2_allocate_done(void *bs, int l1_index, int ret) "bs %p l1_index %d ret %d"
 
-qcow2_cache_get(void *co, int c, uint64_t offset, bool read_from_disk) "co %p is_l2_cache %d offset %" PRIx64 " read_from_disk %d"
-qcow2_cache_get_replace_entry(void *co, int c, int i) "co %p is_l2_cache %d index %d"
-qcow2_cache_get_read(void *co, int c, int i) "co %p is_l2_cache %d index %d"
-qcow2_cache_get_done(void *co, int c, int i) "co %p is_l2_cache %d index %d"
-qcow2_cache_flush(void *co, int c) "co %p is_l2_cache %d"
-qcow2_cache_entry_flush(void *co, int c, int i) "co %p is_l2_cache %d index %d"
+# block/block-cache.c
+block_cache_get(void *co, int c, uint64_t offset, bool read_from_disk) "co %p is_l2_cache %d offset %" PRIx64 " read_from_disk %d"
+block_cache_get_replace_entry(void *co, int c, int i) "co %p is_l2_cache %d index %d"
+block_cache_get_read(void *co, int c, int i) "co %p is_l2_cache %d index %d"
+block_cache_get_done(void *co, int c, int i) "co %p is_l2_cache %d index %d"
+block_cache_flush(void *co, int c) "co %p is_l2_cache %d"
+block_cache_entry_flush(void *co, int c, int i) "co %p is_l2_cache %d index %d"
 
 # block/qed-l2-cache.c
 qed_alloc_l2_cache_entry(void *l2_cache, void *entry) "l2_cache %p entry %p"
-- 
1.7.1


* [Qemu-devel] [PATCH V12 5/6] add-cow file format
  2012-08-10 15:39 [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
                   ` (3 preceding siblings ...)
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 4/6] rename qcow2-cache.c to block-cache.c Dong Xu Wang
@ 2012-08-10 15:39 ` Dong Xu Wang
  2012-09-06 20:19   ` Michael Roth
  2012-09-11  9:40   ` Kevin Wolf
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 6/6] add-cow: add qemu-iotests support Dong Xu Wang
  2012-08-23  5:34 ` [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
  6 siblings, 2 replies; 25+ messages in thread
From: Dong Xu Wang @ 2012-08-10 15:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Dong Xu Wang

Add the core code of the add-cow file format. It uses block-cache.c for its
cache.

Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
---
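For orientation: the driver tracks allocation with a bitmap that holds one
bit per cluster, and is_allocated() maps a sector number to a byte and a bit
in that bitmap. The following is a minimal standalone sketch of that mapping,
not the driver code itself. It assumes 512-byte sectors and 64 KiB clusters;
the real constants are defined in block/add-cow.h. All ex_* names are
hypothetical, and the sketch indexes a flat in-memory bitmap, whereas the
driver reads bitmap pages through block_cache_get().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values for illustration only; the real constants live in
 * block/add-cow.h. */
#define EX_SECTOR_SIZE          512
#define EX_CLUSTER_SIZE         65536
#define EX_SECTORS_PER_CLUSTER  (EX_CLUSTER_SIZE / EX_SECTOR_SIZE)

/* Number of bitmap bytes needed to cover total_sectors, one bit per
 * cluster, rounded up to a whole byte. */
static uint64_t ex_bitmap_size(uint64_t total_sectors)
{
    uint64_t sectors_per_byte = EX_SECTORS_PER_CLUSTER * 8;

    return (total_sectors + sectors_per_byte - 1) / sectors_per_byte;
}

/* Return true if the cluster containing sector_num has its bit set. */
static bool ex_is_allocated(const uint8_t *bitmap, size_t bitmap_len,
                            uint64_t sector_num)
{
    uint64_t cluster_num = sector_num / EX_SECTORS_PER_CLUSTER;

    assert(cluster_num / 8 < bitmap_len);
    return (bitmap[cluster_num / 8] >> (cluster_num % 8)) & 1;
}

/* Set the bit of the cluster containing sector_num. */
static void ex_set_allocated(uint8_t *bitmap, size_t bitmap_len,
                             uint64_t sector_num)
{
    uint64_t cluster_num = sector_num / EX_SECTORS_PER_CLUSTER;

    assert(cluster_num / 8 < bitmap_len);
    bitmap[cluster_num / 8] |= (uint8_t)(1u << (cluster_num % 8));
}
```

With these assumed constants one bitmap byte covers 1024 sectors (512 KiB)
of the image, so ex_bitmap_size() for a 1 GiB image is 2048 bytes.
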
 block/Makefile.objs |    1 +
 block/add-cow.c     |  613 +++++++++++++++++++++++++++++++++++++++++++++++++++
 block/add-cow.h     |   85 +++++++
 block_int.h         |    2 +
 4 files changed, 701 insertions(+), 0 deletions(-)
 create mode 100644 block/add-cow.c
 create mode 100644 block/add-cow.h

diff --git a/block/Makefile.objs b/block/Makefile.objs
index 23bdfc8..7ed5051 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -2,6 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
 block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
 block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-obj-y += qed-check.o
+block-obj-y += add-cow.o
 block-obj-y += block-cache.o
 block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
 block-obj-y += stream.o
diff --git a/block/add-cow.c b/block/add-cow.c
new file mode 100644
index 0000000..d4711d5
--- /dev/null
+++ b/block/add-cow.c
@@ -0,0 +1,613 @@
+/*
+ * QEMU ADD-COW Disk Format
+ *
+ * Copyright IBM, Corp. 2012
+ *
+ * Authors:
+ *  Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include "block_int.h"
+#include "module.h"
+#include "add-cow.h"
+
+static void add_cow_header_le_to_cpu(const AddCowHeader *le, AddCowHeader *cpu)
+{
+    cpu->magic                      = le64_to_cpu(le->magic);
+    cpu->version                    = le32_to_cpu(le->version);
+
+    cpu->backing_filename_offset    = le32_to_cpu(le->backing_filename_offset);
+    cpu->backing_filename_size      = le32_to_cpu(le->backing_filename_size);
+
+    cpu->image_filename_offset      = le32_to_cpu(le->image_filename_offset);
+    cpu->image_filename_size        = le32_to_cpu(le->image_filename_size);
+
+    cpu->features                   = le64_to_cpu(le->features);
+    cpu->optional_features          = le64_to_cpu(le->optional_features);
+    cpu->header_pages_size          = le32_to_cpu(le->header_pages_size);
+}
+
+static void add_cow_header_cpu_to_le(const AddCowHeader *cpu, AddCowHeader *le)
+{
+    le->magic                       = cpu_to_le64(cpu->magic);
+    le->version                     = cpu_to_le32(cpu->version);
+
+    le->backing_filename_offset     = cpu_to_le32(cpu->backing_filename_offset);
+    le->backing_filename_size       = cpu_to_le32(cpu->backing_filename_size);
+
+    le->image_filename_offset       = cpu_to_le32(cpu->image_filename_offset);
+    le->image_filename_size         = cpu_to_le32(cpu->image_filename_size);
+
+    le->features                    = cpu_to_le64(cpu->features);
+    le->optional_features           = cpu_to_le64(cpu->optional_features);
+    le->header_pages_size           = cpu_to_le32(cpu->header_pages_size);
+}
+
+static int add_cow_probe(const uint8_t *buf, int buf_size, const char *filename)
+{
+    const AddCowHeader *header = (const AddCowHeader *)buf;
+
+    if (le64_to_cpu(header->magic) == ADD_COW_MAGIC &&
+        le32_to_cpu(header->version) == ADD_COW_VERSION) {
+        return 100;
+    } else {
+        return 0;
+    }
+}
+
+static int add_cow_create(const char *filename, QEMUOptionParameter *options)
+{
+    AddCowHeader header = {
+        .magic = ADD_COW_MAGIC,
+        .version = ADD_COW_VERSION,
+        .features = 0,
+        .optional_features = 0,
+        .header_pages_size = ADD_COW_DEFAULT_PAGE_SIZE,
+    };
+    AddCowHeader le_header;
+    int64_t image_len = 0;
+    const char *backing_filename = NULL;
+    const char *backing_fmt = NULL;
+    const char *image_filename = NULL;
+    const char *image_format = NULL;
+    BlockDriverState *bs, *image_bs = NULL, *backing_bs = NULL;
+    BlockDriver *drv = bdrv_find_format("add-cow");
+    BDRVAddCowState s;
+    int ret;
+
+    while (options && options->name) {
+        if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
+            image_len = options->value.n;
+        } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FILE)) {
+            backing_filename = options->value.s;
+        } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FMT)) {
+            backing_fmt = options->value.s;
+        } else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FILE)) {
+            image_filename = options->value.s;
+        } else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FORMAT)) {
+            image_format = options->value.s;
+        }
+        options++;
+    }
+
+    if (backing_filename) {
+        header.backing_filename_offset = sizeof(header)
+            + sizeof(s.backing_file_format) + sizeof(s.image_file_format);
+        header.backing_filename_size = strlen(backing_filename);
+
+        if (!backing_fmt) {
+            backing_bs = bdrv_new("image");
+            ret = bdrv_open(backing_bs, backing_filename, BDRV_O_RDWR
+                    | BDRV_O_CACHE_WB, NULL);
+            if (ret < 0) {
+                return ret;
+            }
+            backing_fmt = bdrv_get_format_name(backing_bs);
+            bdrv_delete(backing_bs);
+        }
+    } else {
+        header.features |= ADD_COW_F_All_ALLOCATED;
+    }
+
+    if (image_filename) {
+        header.image_filename_offset =
+            sizeof(header) + sizeof(s.backing_file_format)
+                + sizeof(s.image_file_format) + header.backing_filename_size;
+        header.image_filename_size = strlen(image_filename);
+    } else {
+        error_report("Error: image_file should be given.");
+        return -EINVAL;
+    }
+
+    if (backing_filename && !strcmp(backing_filename, image_filename)) {
+        error_report("Error: Trying to create an image with the "
+                     "same backing file name as the image file name");
+        return -EINVAL;
+    }
+
+    if (!strcmp(filename, image_filename)) {
+        error_report("Error: Trying to create an image with the "
+                     "same filename as the image file name");
+        return -EINVAL;
+    }
+
+    if (header.image_filename_offset + header.image_filename_size
+            > ADD_COW_PAGE_SIZE * ADD_COW_DEFAULT_PAGE_SIZE) {
+        error_report("image_file name or backing_file name too long.");
+        return -ENOSPC;
+    }
+
+    ret = bdrv_file_open(&image_bs, image_filename, BDRV_O_RDWR);
+    if (ret < 0) {
+        return ret;
+    }
+    bdrv_delete(image_bs);
+
+    ret = bdrv_create_file(filename, NULL);
+    if (ret < 0) {
+        return ret;
+    }
+
+    ret = bdrv_file_open(&bs, filename, BDRV_O_RDWR);
+    if (ret < 0) {
+        return ret;
+    }
+    add_cow_header_cpu_to_le(&header, &le_header);
+    ret = bdrv_pwrite(bs, 0, &le_header, sizeof(le_header));
+    if (ret < 0) {
+        bdrv_delete(bs);
+        return ret;
+    }
+
+    ret = bdrv_pwrite(bs, sizeof(le_header), backing_fmt ? backing_fmt : "",
+        backing_fmt ? strlen(backing_fmt) : 0);
+    if (ret < 0) {
+        bdrv_delete(bs);
+        return ret;
+    }
+
+    ret = bdrv_pwrite(bs, sizeof(le_header) + sizeof(s.backing_file_format),
+        image_format ? image_format : "raw",
+        image_format ? strlen(image_format) : sizeof("raw"));
+    if (ret < 0) {
+        bdrv_delete(bs);
+        return ret;
+    }
+
+    if (backing_filename) {
+        ret = bdrv_pwrite(bs, header.backing_filename_offset,
+            backing_filename, header.backing_filename_size);
+        if (ret < 0) {
+            bdrv_delete(bs);
+            return ret;
+        }
+    }
+
+    ret = bdrv_pwrite(bs, header.image_filename_offset,
+        image_filename, header.image_filename_size);
+    if (ret < 0) {
+        bdrv_delete(bs);
+        return ret;
+    }
+
+    ret = bdrv_open(bs, filename, BDRV_O_RDWR | BDRV_O_NO_FLUSH, drv);
+    if (ret < 0) {
+        bdrv_delete(bs);
+        return ret;
+    }
+
+    ret = bdrv_truncate(bs, image_len);
+    bdrv_delete(bs);
+    return ret;
+}
+
+static int add_cow_open(BlockDriverState *bs, int flags)
+{
+    char                image_filename[ADD_COW_FILE_LEN];
+    char                tmp_name[ADD_COW_FILE_LEN];
+    BlockDriver         *image_drv = NULL;
+    int                 ret;
+    int                 sector_per_byte;
+    BDRVAddCowState     *s = bs->opaque;
+    AddCowHeader        le_header;
+
+    ret = bdrv_pread(bs->file, 0, &le_header, sizeof(le_header));
+    if (ret != sizeof(s->header)) {
+        goto fail;
+    }
+
+    add_cow_header_le_to_cpu(&le_header, &s->header);
+
+    if (s->header.magic != ADD_COW_MAGIC) {
+        ret = -EINVAL;
+        goto fail;
+    }
+
+    if (s->header.version != ADD_COW_VERSION) {
+        char version[64];
+        snprintf(version, sizeof(version), "ADD-COW version %d",
+            s->header.version);
+        qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
+            bs->device_name, "add-cow", version);
+        ret = -ENOTSUP;
+        goto fail;
+    }
+
+    if (s->header.features & ~ADD_COW_FEATURE_MASK) {
+        char buf[64];
+        snprintf(buf, sizeof(buf), "%" PRIx64,
+            s->header.features & ~ADD_COW_FEATURE_MASK);
+        qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
+            bs->device_name, "add-cow", buf);
+        return -ENOTSUP;
+    }
+
+    if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
+        ret = bdrv_read_string(bs->file, sizeof(s->header),
+            sizeof(s->backing_file_format) - 1, s->backing_file_format,
+            sizeof(s->backing_file_format));
+        if (ret < 0) {
+            goto fail;
+        }
+    }
+
+    ret = bdrv_read_string(bs->file,
+            sizeof(s->header) + sizeof(s->backing_file_format),
+            sizeof(s->image_file_format) - 1, s->image_file_format,
+            sizeof(s->image_file_format));
+    if (ret < 0) {
+        goto fail;
+    }
+
+    if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
+        ret = bdrv_read_string(bs->file, s->header.backing_filename_offset,
+                          s->header.backing_filename_size, bs->backing_file,
+                          sizeof(bs->backing_file));
+        if (ret < 0) {
+            goto fail;
+        }
+    }
+
+    ret = bdrv_read_string(bs->file, s->header.image_filename_offset,
+                      s->header.image_filename_size, tmp_name,
+                      sizeof(tmp_name));
+    if (ret < 0) {
+        goto fail;
+    }
+
+    s->image_hd = bdrv_new("");
+    if (path_has_protocol(tmp_name)) {
+        pstrcpy(image_filename, sizeof(image_filename), tmp_name);
+    } else {
+        path_combine(image_filename, sizeof(image_filename),
+                     bs->filename, tmp_name);
+    }
+
+    ret = bdrv_open(s->image_hd, image_filename, flags, image_drv);
+    if (ret < 0) {
+        bdrv_delete(s->image_hd);
+        goto fail;
+    }
+
+    bs->total_sectors = bdrv_getlength(s->image_hd) >> 9;
+    s->cluster_size = ADD_COW_CLUSTER_SIZE;
+    sector_per_byte = SECTORS_PER_CLUSTER * 8;
+    s->bitmap_size =
+        (bs->total_sectors + sector_per_byte - 1) / sector_per_byte;
+    s->bitmap_cache =
+        block_cache_create(bs, ADD_COW_CACHE_SIZE, ADD_COW_CACHE_ENTRY_SIZE);
+
+    qemu_co_mutex_init(&s->lock);
+    return 0;
+fail:
+    if (s->bitmap_cache) {
+        block_cache_destroy(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP);
+    }
+    return ret;
+}
+
+static void add_cow_close(BlockDriverState *bs)
+{
+    BDRVAddCowState *s = bs->opaque;
+    block_cache_destroy(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP);
+    bdrv_delete(s->image_hd);
+}
+
+static bool is_allocated(BlockDriverState *bs, int64_t sector_num)
+{
+    BDRVAddCowState *s  = bs->opaque;
+    BlockCache *c = s->bitmap_cache;
+    int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
+    uint8_t *table      = NULL;
+    uint64_t offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
+        + (offset_in_bitmap(sector_num) & (~(c->entry_size - 1)));
+    int ret = block_cache_get(bs, s->bitmap_cache, offset,
+        (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
+
+    if (ret < 0) {
+        return ret;
+    }
+    return table[cluster_num / 8 % ADD_COW_CACHE_ENTRY_SIZE]
+        & (1 << (cluster_num % 8));
+}
+
+static coroutine_fn int add_cow_is_allocated(BlockDriverState *bs,
+        int64_t sector_num, int nb_sectors, int *num_same)
+{
+    BDRVAddCowState *s = bs->opaque;
+    int changed;
+
+    if (nb_sectors == 0) {
+        *num_same = 0;
+        return 0;
+    }
+
+    if (s->header.features & ADD_COW_F_All_ALLOCATED) {
+        *num_same = nb_sectors;
+        return 1;
+    }
+    changed = is_allocated(bs, sector_num);
+
+    for (*num_same = 1; *num_same < nb_sectors; (*num_same)++) {
+        if (is_allocated(bs, sector_num + *num_same) != changed) {
+            break;
+        }
+    }
+    return changed;
+}
+
+static int add_cow_backing_read(BlockDriverState *bs, QEMUIOVector *qiov,
+                  int64_t sector_num, int nb_sectors)
+{
+    int n1;
+    if ((sector_num + nb_sectors) <= bs->total_sectors) {
+        return nb_sectors;
+    }
+    if (sector_num >= bs->total_sectors) {
+        n1 = 0;
+    } else {
+        n1 = bs->total_sectors - sector_num;
+    }
+
+    qemu_iovec_memset(qiov, BDRV_SECTOR_SIZE * n1,
+        0, BDRV_SECTOR_SIZE * (nb_sectors - n1));
+
+    return n1;
+}
+
+static coroutine_fn int add_cow_co_readv(BlockDriverState *bs,
+    int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
+{
+    BDRVAddCowState *s  = bs->opaque;
+    int cur_nr_sectors;
+    uint64_t bytes_done = 0;
+    QEMUIOVector hd_qiov;
+    int n, n1, ret = 0;
+
+    qemu_iovec_init(&hd_qiov, qiov->niov);
+    qemu_co_mutex_lock(&s->lock);
+    while (remaining_sectors != 0) {
+        cur_nr_sectors = remaining_sectors;
+        if (add_cow_is_allocated(bs, sector_num, cur_nr_sectors, &n)) {
+            cur_nr_sectors = n;
+            qemu_iovec_reset(&hd_qiov);
+            qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
+                            cur_nr_sectors * BDRV_SECTOR_SIZE);
+            qemu_co_mutex_unlock(&s->lock);
+            ret = bdrv_co_readv(s->image_hd, sector_num, n, &hd_qiov);
+            qemu_co_mutex_lock(&s->lock);
+            if (ret < 0) {
+                goto fail;
+            }
+        } else {
+            cur_nr_sectors = n;
+            if (bs->backing_hd) {
+                qemu_iovec_reset(&hd_qiov);
+                qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
+                            cur_nr_sectors * BDRV_SECTOR_SIZE);
+                n1 = add_cow_backing_read(bs->backing_hd, &hd_qiov,
+                    sector_num, cur_nr_sectors);
+                if (n1 > 0) {
+                    qemu_co_mutex_unlock(&s->lock);
+                    ret = bdrv_co_readv(bs->backing_hd, sector_num,
+                                        n, &hd_qiov);
+                    qemu_co_mutex_lock(&s->lock);
+                    if (ret < 0) {
+                        goto fail;
+                    }
+                }
+            } else {
+                qemu_iovec_memset(&hd_qiov, 0, 0,
+                    BDRV_SECTOR_SIZE * cur_nr_sectors);
+            }
+        }
+        remaining_sectors -= cur_nr_sectors;
+        sector_num += cur_nr_sectors;
+        bytes_done += cur_nr_sectors * BDRV_SECTOR_SIZE;
+    }
+fail:
+    qemu_co_mutex_unlock(&s->lock);
+    qemu_iovec_destroy(&hd_qiov);
+    return ret;
+}
+
+static int coroutine_fn copy_sectors(BlockDriverState *bs,
+                                     int n_start, int n_end)
+{
+    BDRVAddCowState *s = bs->opaque;
+    QEMUIOVector qiov;
+    struct iovec iov;
+    int n, ret;
+
+    n = n_end - n_start;
+    if (n <= 0) {
+        return 0;
+    }
+
+    iov.iov_len = n * BDRV_SECTOR_SIZE;
+    iov.iov_base = qemu_blockalign(bs, iov.iov_len);
+
+    qemu_iovec_init_external(&qiov, &iov, 1);
+
+    ret = bdrv_co_readv(bs->backing_hd, n_start, n, &qiov);
+    if (ret < 0) {
+        goto out;
+    }
+    ret = bdrv_co_writev(s->image_hd, n_start, n, &qiov);
+    if (ret < 0) {
+        goto out;
+    }
+
+    ret = 0;
+out:
+    qemu_vfree(iov.iov_base);
+    return ret;
+}
+
+static coroutine_fn int add_cow_co_writev(BlockDriverState *bs,
+        int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
+{
+    BDRVAddCowState *s = bs->opaque;
+    BlockCache *c = s->bitmap_cache;
+    int ret = 0, i;
+    QEMUIOVector hd_qiov;
+    uint8_t *table;
+    uint64_t offset;
+
+    qemu_co_mutex_lock(&s->lock);
+    qemu_iovec_init(&hd_qiov, qiov->niov);
+    ret = bdrv_co_writev(s->image_hd,
+                     sector_num,
+                     remaining_sectors, qiov);
+
+    if (ret < 0) {
+        goto fail;
+    }
+    if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
+        /* Copy content of unmodified sectors */
+        if (!is_cluster_head(sector_num) && !is_allocated(bs, sector_num)) {
+            ret = copy_sectors(bs, sector_num & ~(SECTORS_PER_CLUSTER - 1),
+                sector_num);
+            if (ret < 0) {
+                goto fail;
+            }
+        }
+
+        if (!is_cluster_tail(sector_num + remaining_sectors - 1)
+            && !is_allocated(bs, sector_num + remaining_sectors - 1)) {
+            ret = copy_sectors(bs, sector_num + remaining_sectors,
+                ((sector_num + remaining_sectors) | (SECTORS_PER_CLUSTER - 1)) + 1);
+            if (ret < 0) {
+                goto fail;
+            }
+        }
+
+        for (i = sector_num / SECTORS_PER_CLUSTER;
+            i <= (sector_num + remaining_sectors - 1) / SECTORS_PER_CLUSTER;
+            i++) {
+            offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
+                + (offset_in_bitmap(i * SECTORS_PER_CLUSTER) & (~(c->entry_size - 1)));
+            ret = block_cache_get(bs, s->bitmap_cache, offset,
+                (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
+            if (ret < 0) {
+                goto fail;
+            }
+            if ((table[i / 8] & (1 << (i % 8))) == 0) {
+                table[i / 8] |= (1 << (i % 8));
+                block_cache_entry_mark_dirty(s->bitmap_cache, table);
+            }
+        }
+    }
+    ret = 0;
+fail:
+    qemu_co_mutex_unlock(&s->lock);
+    qemu_iovec_destroy(&hd_qiov);
+    return ret;
+}
+
+static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size)
+{
+    BDRVAddCowState *s = bs->opaque;
+    int sector_per_byte = SECTORS_PER_CLUSTER * 8;
+    int ret;
+    uint32_t bitmap_pos = s->header.header_pages_size * ADD_COW_PAGE_SIZE;
+    int64_t bitmap_size =
+        (size / BDRV_SECTOR_SIZE + sector_per_byte - 1) / sector_per_byte;
+    bitmap_size = (bitmap_size + ADD_COW_CACHE_ENTRY_SIZE - 1)
+        & (~(ADD_COW_CACHE_ENTRY_SIZE - 1));
+
+    ret = bdrv_truncate(bs->file, bitmap_pos + bitmap_size);
+    if (ret < 0) {
+        return ret;
+    }
+    return 0;
+}
+
+static coroutine_fn int add_cow_co_flush(BlockDriverState *bs)
+{
+    BDRVAddCowState *s = bs->opaque;
+    int ret;
+
+    qemu_co_mutex_lock(&s->lock);
+    ret = block_cache_flush(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP,
+        ADD_COW_CACHE_ENTRY_SIZE);
+    qemu_co_mutex_unlock(&s->lock);
+    return ret;
+}
+
+static QEMUOptionParameter add_cow_create_options[] = {
+    {
+        .name = BLOCK_OPT_SIZE,
+        .type = OPT_SIZE,
+        .help = "Virtual disk size"
+    },
+    {
+        .name = BLOCK_OPT_BACKING_FILE,
+        .type = OPT_STRING,
+        .help = "File name of a base image"
+    },
+    {
+        .name = BLOCK_OPT_BACKING_FMT,
+        .type = OPT_STRING,
+        .help = "Image format of the base image"
+    },
+    {
+        .name = BLOCK_OPT_IMAGE_FILE,
+        .type = OPT_STRING,
+        .help = "File name of an image file"
+    },
+    {
+        .name = BLOCK_OPT_IMAGE_FORMAT,
+        .type = OPT_STRING,
+        .help = "Image format of the image file"
+    },
+    { NULL }
+};
+
+static BlockDriver bdrv_add_cow = {
+    .format_name                = "add-cow",
+    .instance_size              = sizeof(BDRVAddCowState),
+    .bdrv_probe                 = add_cow_probe,
+    .bdrv_open                  = add_cow_open,
+    .bdrv_close                 = add_cow_close,
+    .bdrv_create                = add_cow_create,
+    .bdrv_co_readv              = add_cow_co_readv,
+    .bdrv_co_writev             = add_cow_co_writev,
+    .bdrv_truncate              = bdrv_add_cow_truncate,
+    .bdrv_co_is_allocated       = add_cow_is_allocated,
+
+    .create_options             = add_cow_create_options,
+    .bdrv_co_flush_to_os        = add_cow_co_flush,
+};
+
+static void bdrv_add_cow_init(void)
+{
+    bdrv_register(&bdrv_add_cow);
+}
+
+block_init(bdrv_add_cow_init);
diff --git a/block/add-cow.h b/block/add-cow.h
new file mode 100644
index 0000000..f058376
--- /dev/null
+++ b/block/add-cow.h
@@ -0,0 +1,85 @@
+/*
+ * QEMU ADD-COW Disk Format
+ *
+ * Copyright IBM, Corp. 2012
+ *
+ * Authors:
+ *  Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#ifndef BLOCK_ADD_COW_H
+#define BLOCK_ADD_COW_H
+#include "block-cache.h"
+
+enum {
+    ADD_COW_F_All_ALLOCATED     = 0X01,
+    ADD_COW_FEATURE_MASK        = ADD_COW_F_All_ALLOCATED,
+
+    ADD_COW_MAGIC = (((uint64_t)'A' << 56) | ((uint64_t)'D' << 48) | \
+                    ((uint64_t)'D' << 40) | ((uint64_t)'_' << 32) | \
+                    ((uint64_t)'C' << 24) | ((uint64_t)'O' << 16) | \
+                    ((uint64_t)'W' << 8) | 0xFF),
+    ADD_COW_VERSION             = 1,
+    ADD_COW_FILE_LEN            = 1024,
+    ADD_COW_CACHE_SIZE          = 16,
+    ADD_COW_CACHE_ENTRY_SIZE    = 65536,
+    ADD_COW_CLUSTER_SIZE        = 65536,
+    SECTORS_PER_CLUSTER         = (ADD_COW_CLUSTER_SIZE / BDRV_SECTOR_SIZE),
+    ADD_COW_PAGE_SIZE           = 4096,
+    ADD_COW_DEFAULT_PAGE_SIZE   = 1,
+};
+
+typedef struct AddCowHeader {
+    uint64_t        magic;
+    uint32_t        version;
+
+    uint32_t        backing_filename_offset;
+    uint32_t        backing_filename_size;
+
+    uint32_t        image_filename_offset;
+    uint32_t        image_filename_size;
+
+    uint64_t        features;
+    uint64_t        optional_features;
+    uint32_t        header_pages_size;
+} QEMU_PACKED AddCowHeader;
+
+typedef struct BDRVAddCowState {
+    BlockDriverState    *image_hd;
+    CoMutex             lock;
+    int                 cluster_size;
+    BlockCache         *bitmap_cache;
+    uint64_t            bitmap_size;
+    AddCowHeader        header;
+    char                backing_file_format[16];
+    char                image_file_format[16];
+} BDRVAddCowState;
+
+/* Convert sector_num to offset in bitmap */
+static inline int64_t offset_in_bitmap(int64_t sector_num)
+{
+    int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
+    return cluster_num / 8;
+}
+
+static inline bool is_cluster_head(int64_t sector_num)
+{
+    return sector_num % SECTORS_PER_CLUSTER == 0;
+}
+
+static inline bool is_cluster_tail(int64_t sector_num)
+{
+    return (sector_num + 1) % SECTORS_PER_CLUSTER == 0;
+}
+
+BlockCache *add_cow_cache_create(BlockDriverState *bs, int num_tables);
+int add_cow_cache_destroy(BlockDriverState *bs, BlockCache *c);
+void add_cow_cache_entry_mark_dirty(BlockCache *c, void *table);
+int add_cow_cache_get(BlockDriverState *bs, BlockCache *c, uint64_t offset,
+    void **table);
+int add_cow_cache_flush(BlockDriverState *bs, BlockCache *c);
+#endif
diff --git a/block_int.h b/block_int.h
index 6c1d9ca..67954ec 100644
--- a/block_int.h
+++ b/block_int.h
@@ -53,6 +53,8 @@
 #define BLOCK_OPT_SUBFMT            "subformat"
 #define BLOCK_OPT_COMPAT_LEVEL      "compat"
 #define BLOCK_OPT_LAZY_REFCOUNTS    "lazy_refcounts"
+#define BLOCK_OPT_IMAGE_FILE        "image_file"
+#define BLOCK_OPT_IMAGE_FORMAT      "image_format"
 
 typedef struct BdrvTrackedRequest BdrvTrackedRequest;
 
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread
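
Editorial sketch of the copy-on-write ranges used by add_cow_co_writev()
above. It is a minimal standalone illustration, not code from the patch: the
SectorRange type and both helper names are hypothetical, and the constants are
copied from add-cow.h. Unlike the patch, which guards the tail copy with
is_cluster_tail(), this version rounds the end up so that an already aligned
write yields an empty range.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from add-cow.h in the patch: 64 KiB clusters, 512-byte sectors. */
#define SECTOR_SIZE          512
#define CLUSTER_SIZE         65536
#define SECTORS_PER_CLUSTER  (CLUSTER_SIZE / SECTOR_SIZE)   /* 128 */

/* Hypothetical helper type: a half-open sector range [start, end). */
typedef struct SectorRange {
    int64_t start;
    int64_t end;
} SectorRange;

/* Sectors before the write that share its first cluster. */
static SectorRange cow_head_range(int64_t sector_num)
{
    SectorRange r;

    assert(sector_num >= 0);
    r.start = sector_num & ~(int64_t)(SECTORS_PER_CLUSTER - 1);
    r.end = sector_num;
    return r;
}

/* Sectors after the write that share its last cluster. */
static SectorRange cow_tail_range(int64_t sector_num, int nb_sectors)
{
    SectorRange r;
    int64_t end = sector_num + nb_sectors;

    assert(sector_num >= 0 && nb_sectors > 0);
    r.start = end;
    /* Round the end of the write up to the next cluster boundary. */
    r.end = (end + SECTORS_PER_CLUSTER - 1)
        & ~(int64_t)(SECTORS_PER_CLUSTER - 1);
    return r;
}
```

A range with start == end means nothing has to be copied from the backing
file on that side of the write.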

* [Qemu-devel] [PATCH V12 6/6] add-cow: add qemu-iotests support
  2012-08-10 15:39 [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
                   ` (4 preceding siblings ...)
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 5/6] add-cow file format Dong Xu Wang
@ 2012-08-10 15:39 ` Dong Xu Wang
  2012-09-11  9:55   ` Kevin Wolf
  2012-08-23  5:34 ` [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
  6 siblings, 1 reply; 25+ messages in thread
From: Dong Xu Wang @ 2012-08-10 15:39 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Dong Xu Wang

Add qemu-iotests support for add-cow.

Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
---
 tests/qemu-iotests/017       |    2 +-
 tests/qemu-iotests/020       |    2 +-
 tests/qemu-iotests/check     |    4 ++--
 tests/qemu-iotests/common    |    6 ++++++
 tests/qemu-iotests/common.rc |   19 +++++++++++++++++++
 5 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/tests/qemu-iotests/017 b/tests/qemu-iotests/017
index 66951eb..d31432f 100755
--- a/tests/qemu-iotests/017
+++ b/tests/qemu-iotests/017
@@ -40,7 +40,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.pattern
 
 # Any format supporting backing files
-_supported_fmt qcow qcow2 vmdk qed
+_supported_fmt qcow qcow2 vmdk qed add-cow
 _supported_proto generic
 _supported_os Linux
 
diff --git a/tests/qemu-iotests/020 b/tests/qemu-iotests/020
index 2fb0ff8..3dbb495 100755
--- a/tests/qemu-iotests/020
+++ b/tests/qemu-iotests/020
@@ -42,7 +42,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
 . ./common.pattern
 
 # Any format supporting backing files
-_supported_fmt qcow qcow2 vmdk qed
+_supported_fmt qcow qcow2 vmdk qed add-cow
 _supported_proto generic
 _supported_os Linux
 
diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index 432732c..122267b 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -243,7 +243,7 @@ do
 		echo " - no qualified output"
 		err=true
 	    else
-		if diff -w $seq.out $tmp.out >/dev/null 2>&1
+        if diff -w -I "^Formatting" $seq.out $tmp.out >/dev/null 2>&1
 		then
 		    echo ""
 		    if $err
@@ -255,7 +255,7 @@ do
 		else
 		    echo " - output mismatch (see $seq.out.bad)"
 		    mv $tmp.out $seq.out.bad
-		    $diff -w $seq.out $seq.out.bad
+            $diff -w -I "^Formatting" $seq.out $seq.out.bad
 		    err=true
 		fi
 	    fi
diff --git a/tests/qemu-iotests/common b/tests/qemu-iotests/common
index 1f6fdf5..1c81b09 100644
--- a/tests/qemu-iotests/common
+++ b/tests/qemu-iotests/common
@@ -128,6 +128,7 @@ common options
 check options
     -raw                test raw (default)
     -cow                test cow
+    -add-cow            test add-cow
     -qcow               test qcow
     -qcow2              test qcow2
     -qed                test qed
@@ -163,6 +164,11 @@ testlist options
 	    xpand=false
 	    ;;
 
+    -add-cow)
+        IMGFMT=add-cow
+        xpand=false
+        ;;
+
 	-qcow)
 	    IMGFMT=qcow
 	    xpand=false
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 7782808..ec5afd7 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -97,6 +97,18 @@ _make_test_img()
     fi
     if [ \( "$IMGFMT" = "qcow2" -o "$IMGFMT" = "qed" \) -a -n "$CLUSTER_SIZE" ]; then
         optstr=$(_optstr_add "$optstr" "cluster_size=$CLUSTER_SIZE")
+    elif [ "$IMGFMT" = "add-cow" ]; then
+        local BACKING="$TEST_IMG"".qcow2"
+        local IMG="$TEST_IMG"".raw"
+        if [ "$1" = "-b" ]; then
+            IMG="$IMG"".b"
+            $QEMU_IMG create -f raw $IMG $image_size>/dev/null
+            extra_img_options="-o image_file=$IMG $extra_img_options"
+        else
+            $QEMU_IMG create -f raw $IMG $image_size>/dev/null
+            $QEMU_IMG create -f qcow2 $BACKING $image_size>/dev/null
+            extra_img_options="-o backing_file=$BACKING,image_file=$IMG"
+        fi
     fi
 
     if [ -n "$optstr" ]; then
@@ -125,6 +137,13 @@ _cleanup_test_img()
             rm -f $TEST_DIR/t.$IMGFMT
             rm -f $TEST_DIR/t.$IMGFMT.orig
             rm -f $TEST_DIR/t.$IMGFMT.base
+            if [ "$IMGFMT" = "add-cow" ]; then
+                rm -f $TEST_DIR/t.$IMGFMT.qcow2
+                rm -f $TEST_DIR/t.$IMGFMT.raw
+                rm -f $TEST_DIR/t.$IMGFMT.raw.b
+                rm -f $TEST_DIR/t.$IMGFMT.ct.qcow2
+                rm -f $TEST_DIR/t.$IMGFMT.ct.raw
+            fi
             ;;
 
         rbd)
-- 
1.7.1

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH V12 0/6] add-cow file format
  2012-08-10 15:39 [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
                   ` (5 preceding siblings ...)
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 6/6] add-cow: add qemu-iotests support Dong Xu Wang
@ 2012-08-23  5:34 ` Dong Xu Wang
  6 siblings, 0 replies; 25+ messages in thread
From: Dong Xu Wang @ 2012-08-23  5:34 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, Dong Xu Wang

Could anyone give me some comments? I would be very grateful.

On Fri, Aug 10, 2012 at 11:39 PM, Dong Xu Wang
<wdongxu@linux.vnet.ibm.com> wrote:
> This will introduce a new file format: add-cow.
>
> add-cow can benefit from other available functions, such as path_has_protocol and
> qed_read_string, so we will make them public.
>
> Now add-cow is still using QEMUOptionParameter, not QemuOpts,  I will send a
> separate patch series to convert.
>
> snapshot_blkdev are not supported now for add-cow, after converting QEMUOptionParameter
> to QemuOpts, will add related code.
>
>
> v11->v12:
> 1) Removed un-used feature bit.
> 2) Share cache code with qcow2.c.
> 3) Remove snapshot_blkdev support, will add it in another patch.
> 5) COW Bitmap field in add-cow file will be multiple of 65536.
> 6) fix grammar and typo.
>
> Dong Xu Wang (6):
>   docs: document for add cow file format
>   make path_has_protocol non-static
>   qed_read_string to bdrv_read_string
>   rename qcow2-cache.c to block-cache.c
>   add-cow file format
>   qemu-iotests
>
>  block.c                      |   29 ++-
>  block.h                      |    6 +
>  block/Makefile.objs          |    4 +-
>  block/add-cow.c              |  613 ++++++++++++++++++++++++++++++++++++++++++
>  block/add-cow.h              |   85 ++++++
>  block/qcow2-cache.c          |  323 ----------------------
>  block/qcow2-cluster.c        |   66 +++--
>  block/qcow2-refcount.c       |   66 +++--
>  block/qcow2.c                |   36 ++--
>  block/qcow2.h                |   24 +--
>  block/qed.c                  |   29 +--
>  block_int.h                  |    2 +
>  docs/specs/add-cow.txt       |  123 +++++++++
>  tests/qemu-iotests/017       |    2 +-
>  tests/qemu-iotests/020       |    2 +-
>  tests/qemu-iotests/check     |    4 +-
>  tests/qemu-iotests/common    |    6 +
>  tests/qemu-iotests/common.rc |   19 ++
>  trace-events                 |   13 +-
>  19 files changed, 994 insertions(+), 458 deletions(-)
>  create mode 100644 block/add-cow.c
>  create mode 100644 block/add-cow.h
>  delete mode 100644 block/qcow2-cache.c
>  create mode 100644 docs/specs/add-cow.txt
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH V12 1/6] docs: document for add-cow file format
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 1/6] docs: document for " Dong Xu Wang
@ 2012-09-06 17:27   ` Michael Roth
  2012-09-10  1:48     ` Dong Xu Wang
  2012-09-10 15:23   ` Kevin Wolf
  1 sibling, 1 reply; 25+ messages in thread
From: Michael Roth @ 2012-09-06 17:27 UTC (permalink / raw)
  To: Dong Xu Wang; +Cc: kwolf, qemu-devel

On Fri, Aug 10, 2012 at 11:39:40PM +0800, Dong Xu Wang wrote:
> Document the add-cow format: its usage and its specification.
> 
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> ---
>  docs/specs/add-cow.txt |  123 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 123 insertions(+), 0 deletions(-)
>  create mode 100644 docs/specs/add-cow.txt
> 
> diff --git a/docs/specs/add-cow.txt b/docs/specs/add-cow.txt
> new file mode 100644
> index 0000000..d5a7a68
> --- /dev/null
> +++ b/docs/specs/add-cow.txt
> @@ -0,0 +1,123 @@
> +== General ==
> +
> +The raw file format does not support backing files or copy-on-write.
> +The add-cow image format makes it possible to use backing files with a raw
> +image by keeping a separate .add-cow metadata file. Once all sectors
> +have been written into the raw image, it is safe to discard the .add-cow
> +and backing files and use the raw image directly.
> +
> +An example usage of add-cow would look like:
> +(ubuntu.img is a disk image with an OS already installed.)
> +    1)  Create a raw image with the same size as ubuntu.img
> +            qemu-img create -f raw test.raw 8G
> +    2)  Create an add-cow image which will store dirty bitmap
> +            qemu-img create -f add-cow test.add-cow \
> +                -o backing_file=ubuntu.img,image_file=test.raw
> +    3)  Run qemu with add-cow image
> +            qemu -drive if=virtio,file=test.add-cow
> +
> +test.raw may be larger than ubuntu.img; in that case, the size of test.add-cow
> +will be calculated from the size of test.raw.
> +
> +=Specification=
> +
> +The file format looks like this:
> +
> + +---------------+-------------+-----------------+
> + |     Header    |   Reserved  |    COW bitmap   |
> + +---------------+-------------+-----------------+
> +
> +All numbers in add-cow are stored in Little Endian byte order.
> +
> +== Header ==
> +
> +The Header is included in the first bytes:
> +(#define HEADER_SIZE (4096 * header_pages_size))
> +    Byte    0 -  7:     magic
> +                        add-cow magic string ("ADD_COW\xff").
> +
> +            8 -  11:    version
> +                        Version number (only valid value is 1 now).
> +
> +            12 - 15:    backing file name offset
> +                        Offset in the add-cow file at which the backing file
> +                        name is stored (NB: The string is not nul-terminated).
> +                        If backing file name does NOT exist, this field will be
> +                        0. Must be between 80 and [HEADER_SIZE - 2](a file name
> +                        must be at least 1 byte).
> +
> +            16 - 19:    backing file name size
> +                        Length of the backing file name in bytes. It will be 0
> +                        if the backing file name offset is 0. If backing file
> +                        name offset is non-zero, then it must be non-zero. Must
> +                        be less than [HEADER_SIZE - 80] to fit in the reserved
> +                        part of the header.
> +
> +            20 - 23:    image file name offset
> +                        Offset in the add-cow file at which the image file name
> +                        is stored (NB: The string is not null terminated). It
> +                        must be between 80 and [HEADER_SIZE - 2].
> +
> +            24 - 27:    image file name size
> +                        Length of the image file name in bytes.
> +                        Must be less than [HEADER_SIZE - 80] to fit in the reserved
> +                        part of the header.
> +
> +            28 - 35:    features
> +                        Currently only 1 feature bit is used:
> +                        Feature bits:
> +                            * ADD_COW_F_All_ALLOCATED   = 0x01.
> +
> +            36 - 43:    optional features
> +                        Not used now. Reserved for future use. It must be set to 0.
> +
> +            44 - 47:    header pages size
> +                        The header field is variable-sized. This field indicates
> +                        how many pages(4k) will be used to store add-cow header.
> +                        In add-cow v1, it is fixed to 1, so the header size will
> +                        be 4k * 1 = 4096 bytes.
> +
> +            48 - 63:    backing file format
> +                        format of backing file. It will be filled with 0 if
> +                        backing file name offset is 0. If backing file name
> +                        offset is non-zero, it must be non-zero. It is coded
> +                        in free-form ASCII, and is not NUL-terminated.
> +
> +            64 - 79:    image file format
> +                        format of image file. It must be non-zero. It is coded
> +                        in free-form ASCII, and is not NUL-terminated.
> +
> +            80 - [HEADER_SIZE - 1]:
> +                        Padding that makes the COW bitmap field start at byte
> +                        HEADER_SIZE. The backing file name and image file name
> +                        are stored here. Bytes not used by the backing file
> +                        and image file names are set to 0.
> +
> +== COW bitmap ==
> +
> +The "COW bitmap" field starts at offset HEADER_SIZE, stores a bitmap related to
> +backing file and image file. The bitmap will track whether the sector in
> +backing file is dirty or not.
> +
> +Each bit in the bitmap indicates one cluster's status. One cluster includes 128
> +sectors, so each bit covers 512 * 128 = 64k bytes. The size of the bitmap is
> +calculated from the virtual size of the image file and must be a multiple
> +of 65536; unused bits are set to 0. Within each byte, the least
> +significant bit covers the first cluster. Bit orders in one byte look like:
> + +----+----+----+----+----+----+----+----+
> + | b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0 |
> + +----+----+----+----+----+----+----+----+
> +
> +If the bit is 0, the sector has not been allocated in the image file, and data
> +should be loaded from the backing file while reading; if the bit is 1, the
> +related sector is dirty and should be loaded from the image file while reading.
> +Writing to a sector causes the corresponding bit to be set to 1.
> +
> +If raw image is not an even multiple of cluster bytes, bits that correspond to
> +bytes beyond the raw file size in add-cow will be 0.
> +
> +Image file name and backing file name must NOT be the same, we prevent this
> +while creating add-cow files.
> +
> +Image file and backing file are interpreted relative to the qcow2 file, not

Relative to the add-cow file?

> +to the current working directory of the process that opened the qcow2 file.
> -- 
> 1.7.1
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread
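
Editorial sketch of the COW bitmap rules quoted above (one bit per 128-sector
cluster, least significant bit first, bitmap size padded to a multiple of
65536). It is a minimal in-memory illustration under those stated rules; the
function names and the BITMAP_ALIGN macro are hypothetical and do not come
from the patch, which goes through the block cache instead of a flat array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE          512
#define SECTORS_PER_CLUSTER  128      /* one bit covers 128 * 512 = 64 KiB */
#define BITMAP_ALIGN         65536    /* bitmap size is a multiple of this */

/* Bitmap size in bytes for an image of image_bytes, padded to BITMAP_ALIGN. */
static uint64_t bitmap_bytes(uint64_t image_bytes)
{
    uint64_t cluster_bytes = (uint64_t)SECTOR_SIZE * SECTORS_PER_CLUSTER;
    uint64_t clusters = (image_bytes + cluster_bytes - 1) / cluster_bytes;
    uint64_t bytes = (clusters + 7) / 8;

    return (bytes + BITMAP_ALIGN - 1) / BITMAP_ALIGN * BITMAP_ALIGN;
}

/* Test the bit of the cluster that contains sector_num. */
static bool sector_is_allocated(const uint8_t *bitmap, size_t bitmap_len,
                                uint64_t sector_num)
{
    uint64_t cluster = sector_num / SECTORS_PER_CLUSTER;

    assert(cluster / 8 < bitmap_len);
    return (bitmap[cluster / 8] >> (cluster % 8)) & 1;
}

/* Set the bit of the cluster that contains sector_num. */
static void mark_sector_allocated(uint8_t *bitmap, size_t bitmap_len,
                                  uint64_t sector_num)
{
    uint64_t cluster = sector_num / SECTORS_PER_CLUSTER;

    assert(cluster / 8 < bitmap_len);
    bitmap[cluster / 8] |= (uint8_t)(1u << (cluster % 8));
}
```

Under these rules an 8G image has 131072 clusters, which needs 16384 bitmap
bytes before padding and 65536 bytes after it.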

* Re: [Qemu-devel] [PATCH V12 2/6] make path_has_protocol non-static
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 2/6] make path_has_protocol non-static Dong Xu Wang
@ 2012-09-06 17:27   ` Michael Roth
  0 siblings, 0 replies; 25+ messages in thread
From: Michael Roth @ 2012-09-06 17:27 UTC (permalink / raw)
  To: Dong Xu Wang; +Cc: kwolf, qemu-devel

On Fri, Aug 10, 2012 at 11:39:41PM +0800, Dong Xu Wang wrote:
> We will use path_has_protocol outside block.c, so just make it public.
> 
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>

Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>

> ---
>  block.c |    2 +-
>  block.h |    1 +
>  2 files changed, 2 insertions(+), 1 deletions(-)
> 
> diff --git a/block.c b/block.c
> index 24323c1..c13d803 100644
> --- a/block.c
> +++ b/block.c
> @@ -196,7 +196,7 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
>  }
> 
>  /* check if the path starts with "<protocol>:" */
> -static int path_has_protocol(const char *path)
> +int path_has_protocol(const char *path)
>  {
>      const char *p;
> 
> diff --git a/block.h b/block.h
> index 650d872..54e61c9 100644
> --- a/block.h
> +++ b/block.h
> @@ -307,6 +307,7 @@ char *bdrv_snapshot_dump(char *buf, int buf_size, QEMUSnapshotInfo *sn);
> 
>  char *get_human_readable_size(char *buf, int buf_size, int64_t size);
>  int path_is_absolute(const char *path);
> +int path_has_protocol(const char *path);
>  void path_combine(char *dest, int dest_size,
>                    const char *base_path,
>                    const char *filename);
> -- 
> 1.7.1
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread
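
Editorial sketch of the "<protocol>:" check described in the comment above.
The quoted diff shows only the declaration of p and the final
"return *p == ':'", so the scan below is an assumption: it treats a path as
having a protocol when a colon appears before any path separator. The name
has_protocol_prefix is hypothetical, and this version does not special-case
Windows drive letters such as "c:\".

```c
#include <assert.h>
#include <string.h>

/* Return 1 if path starts with "<protocol>:", 0 otherwise. */
static int has_protocol_prefix(const char *path)
{
    const char *p;

    assert(path != NULL);
    /* Stop at the first ':' or path separator, whichever comes first. */
    p = path + strcspn(path, ":/\\");
    return *p == ':';
}
```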

* Re: [Qemu-devel] [PATCH V12 3/6] qed_read_string to bdrv_read_string
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 3/6] qed_read_string to bdrv_read_string Dong Xu Wang
@ 2012-09-06 17:32   ` Michael Roth
  2012-09-10  1:49     ` Dong Xu Wang
  0 siblings, 1 reply; 25+ messages in thread
From: Michael Roth @ 2012-09-06 17:32 UTC (permalink / raw)
  To: Dong Xu Wang; +Cc: kwolf, qemu-devel

On Fri, Aug 10, 2012 at 11:39:42PM +0800, Dong Xu Wang wrote:
> Make qed_read_string a common interface by moving it to block.c.
> 
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> ---
>  block.c     |   27 +++++++++++++++++++++++++++
>  block.h     |    2 ++
>  block/qed.c |   29 +----------------------------
>  3 files changed, 30 insertions(+), 28 deletions(-)
> 
> diff --git a/block.c b/block.c
> index c13d803..d906b35 100644
> --- a/block.c
> +++ b/block.c
> @@ -213,6 +213,33 @@ int path_has_protocol(const char *path)
>      return *p == ':';
>  }
> 
> +/**
> + * Read a string of known length from the image file
> + *
> + * @bs:         Image file
> + * @offset:     File offset to start of string, in bytes
> + * @n:          String length in bytes
> + * @buf:        Destination buffer
> + * @buflen:     Destination buffer length in bytes
> + * @ret:        0 on success, -errno on failure
> + *
> + * The string is NUL-terminated.
> + */
> +int bdrv_read_string(BlockDriverState *bs, uint64_t offset, size_t n,
> +                           char *buf, size_t buflen)

Small alignment issue   ^

> +{
> +    int ret;
> +    if (n >= buflen) {
> +        return -EINVAL;
> +    }
> +    ret = bdrv_pread(bs, offset, buf, n);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    buf[n] = '\0';
> +    return 0;
> +}
> +
>  int path_is_absolute(const char *path)
>  {
>  #ifdef _WIN32
> diff --git a/block.h b/block.h
> index 54e61c9..e5dfcd7 100644
> --- a/block.h
> +++ b/block.h
> @@ -154,6 +154,8 @@ int bdrv_pwrite_sync(BlockDriverState *bs, int64_t offset,
>      const void *buf, int count);
>  int coroutine_fn bdrv_co_readv(BlockDriverState *bs, int64_t sector_num,
>      int nb_sectors, QEMUIOVector *qiov);
> +int bdrv_read_string(BlockDriverState *bs, uint64_t offset, size_t n,
> +    char *buf, size_t buflen);

Another one here        ^

>  int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
>      int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
>  int coroutine_fn bdrv_co_writev(BlockDriverState *bs, int64_t sector_num,
> diff --git a/block/qed.c b/block/qed.c
> index 5f3eefa..311c589 100644
> --- a/block/qed.c
> +++ b/block/qed.c
> @@ -217,33 +217,6 @@ static bool qed_is_image_size_valid(uint64_t image_size, uint32_t cluster_size,
>  }
> 
>  /**
> - * Read a string of known length from the image file
> - *
> - * @file:       Image file
> - * @offset:     File offset to start of string, in bytes
> - * @n:          String length in bytes
> - * @buf:        Destination buffer
> - * @buflen:     Destination buffer length in bytes
> - * @ret:        0 on success, -errno on failure
> - *
> - * The string is NUL-terminated.
> - */
> -static int qed_read_string(BlockDriverState *file, uint64_t offset, size_t n,
> -                           char *buf, size_t buflen)
> -{
> -    int ret;
> -    if (n >= buflen) {
> -        return -EINVAL;
> -    }
> -    ret = bdrv_pread(file, offset, buf, n);
> -    if (ret < 0) {
> -        return ret;
> -    }
> -    buf[n] = '\0';
> -    return 0;
> -}
> -
> -/**
>   * Allocate new clusters
>   *
>   * @s:          QED state
> @@ -437,7 +410,7 @@ static int bdrv_qed_open(BlockDriverState *bs, int flags)
>              return -EINVAL;
>          }
> 
> -        ret = qed_read_string(bs->file, s->header.backing_filename_offset,
> +        ret = bdrv_read_string(bs->file, s->header.backing_filename_offset,
>                                s->header.backing_filename_size, bs->backing_file,
>                                sizeof(bs->backing_file));

Here too                          ^

Looks good otherwise.

>          if (ret < 0) {
> -- 
> 1.7.1
> 
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread
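
Editorial sketch of the contract documented for bdrv_read_string() above:
read n bytes at a given offset and NUL-terminate them, rejecting the request
when the destination cannot hold n bytes plus the terminator. To stay
self-contained it reads from a memory buffer instead of calling bdrv_pread();
the function name, the src/src_len parameters and the -EIO bounds check are
assumptions made for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in: src/src_len model the contents of the image file. */
static int read_string_from_buffer(const uint8_t *src, size_t src_len,
                                   uint64_t offset, size_t n,
                                   char *buf, size_t buflen)
{
    assert(src != NULL && buf != NULL);
    if (n >= buflen) {
        return -EINVAL;     /* no room for the terminating NUL */
    }
    if (offset > src_len || n > src_len - offset) {
        return -EIO;        /* request runs past the end of the "file" */
    }
    memcpy(buf, src + offset, n);
    buf[n] = '\0';
    return 0;
}
```

The n >= buflen test is the part shared with the quoted function: a string
stored on disk without a terminator always needs one extra byte in memory.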

* Re: [Qemu-devel] [PATCH V12 4/6] rename qcow2-cache.c to block-cache.c
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 4/6] rename qcow2-cache.c to block-cache.c Dong Xu Wang
@ 2012-09-06 17:52   ` Michael Roth
  2012-09-10  2:14     ` Dong Xu Wang
  2012-09-11  8:41   ` Kevin Wolf
  1 sibling, 1 reply; 25+ messages in thread
From: Michael Roth @ 2012-09-06 17:52 UTC (permalink / raw)
  To: Dong Xu Wang; +Cc: kwolf, qemu-devel

On Fri, Aug 10, 2012 at 11:39:43PM +0800, Dong Xu Wang wrote:
> add-cow and qcow2 file format will share the same cache code, so rename
> block-cache.c to block-cache.c. And related structure and qcow2 code also

"qcow2-cache.c to block-cache.c"

But I've scanned through the rest of your patches and can't seem to find
where block-cache.c gets introduced. Did you forget to git add it?

> are changed.
> 
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> ---
>  block.h                |    3 +
>  block/Makefile.objs    |    3 +-
>  block/qcow2-cache.c    |  323 ------------------------------------------------
>  block/qcow2-cluster.c  |   66 ++++++----
>  block/qcow2-refcount.c |   66 ++++++-----
>  block/qcow2.c          |   36 +++---
>  block/qcow2.h          |   24 +---
>  trace-events           |   13 +-
>  8 files changed, 109 insertions(+), 425 deletions(-)
>  delete mode 100644 block/qcow2-cache.c
> 
> diff --git a/block.h b/block.h
> index e5dfcd7..c325661 100644
> --- a/block.h
> +++ b/block.h
> @@ -401,6 +401,9 @@ typedef enum {
>      BLKDBG_CLUSTER_ALLOC_BYTES,
>      BLKDBG_CLUSTER_FREE,
> 
> +    BLKDBG_ADD_COW_UPDATE,
> +    BLKDBG_ADD_COW_LOAD,
> +
>      BLKDBG_EVENT_MAX,
>  } BlkDebugEvent;
> 
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index b5754d3..23bdfc8 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -1,7 +1,8 @@
>  block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
> -block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
> +block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
>  block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>  block-obj-y += qed-check.o
> +block-obj-y += block-cache.o
>  block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
>  block-obj-y += stream.o
>  block-obj-$(CONFIG_WIN32) += raw-win32.o
> diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
> deleted file mode 100644
> index 2d4322a..0000000
> --- a/block/qcow2-cache.c
> +++ /dev/null
> @@ -1,323 +0,0 @@
> -/*
> - * L2/refcount table cache for the QCOW2 format
> - *
> - * Copyright (c) 2010 Kevin Wolf <kwolf@redhat.com>
> - *
> - * Permission is hereby granted, free of charge, to any person obtaining a copy
> - * of this software and associated documentation files (the "Software"), to deal
> - * in the Software without restriction, including without limitation the rights
> - * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> - * copies of the Software, and to permit persons to whom the Software is
> - * furnished to do so, subject to the following conditions:
> - *
> - * The above copyright notice and this permission notice shall be included in
> - * all copies or substantial portions of the Software.
> - *
> - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> - * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> - * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> - * THE SOFTWARE.
> - */
> -
> -#include "block_int.h"
> -#include "qemu-common.h"
> -#include "qcow2.h"
> -#include "trace.h"
> -
> -typedef struct Qcow2CachedTable {
> -    void*   table;
> -    int64_t offset;
> -    bool    dirty;
> -    int     cache_hits;
> -    int     ref;
> -} Qcow2CachedTable;
> -
> -struct Qcow2Cache {
> -    Qcow2CachedTable*       entries;
> -    struct Qcow2Cache*      depends;
> -    int                     size;
> -    bool                    depends_on_flush;
> -};
> -
> -Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
> -{
> -    BDRVQcowState *s = bs->opaque;
> -    Qcow2Cache *c;
> -    int i;
> -
> -    c = g_malloc0(sizeof(*c));
> -    c->size = num_tables;
> -    c->entries = g_malloc0(sizeof(*c->entries) * num_tables);
> -
> -    for (i = 0; i < c->size; i++) {
> -        c->entries[i].table = qemu_blockalign(bs, s->cluster_size);
> -    }
> -
> -    return c;
> -}
> -
> -int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c)
> -{
> -    int i;
> -
> -    for (i = 0; i < c->size; i++) {
> -        assert(c->entries[i].ref == 0);
> -        qemu_vfree(c->entries[i].table);
> -    }
> -
> -    g_free(c->entries);
> -    g_free(c);
> -
> -    return 0;
> -}
> -
> -static int qcow2_cache_flush_dependency(BlockDriverState *bs, Qcow2Cache *c)
> -{
> -    int ret;
> -
> -    ret = qcow2_cache_flush(bs, c->depends);
> -    if (ret < 0) {
> -        return ret;
> -    }
> -
> -    c->depends = NULL;
> -    c->depends_on_flush = false;
> -
> -    return 0;
> -}
> -
> -static int qcow2_cache_entry_flush(BlockDriverState *bs, Qcow2Cache *c, int i)
> -{
> -    BDRVQcowState *s = bs->opaque;
> -    int ret = 0;
> -
> -    if (!c->entries[i].dirty || !c->entries[i].offset) {
> -        return 0;
> -    }
> -
> -    trace_qcow2_cache_entry_flush(qemu_coroutine_self(),
> -                                  c == s->l2_table_cache, i);
> -
> -    if (c->depends) {
> -        ret = qcow2_cache_flush_dependency(bs, c);
> -    } else if (c->depends_on_flush) {
> -        ret = bdrv_flush(bs->file);
> -        if (ret >= 0) {
> -            c->depends_on_flush = false;
> -        }
> -    }
> -
> -    if (ret < 0) {
> -        return ret;
> -    }
> -
> -    if (c == s->refcount_block_cache) {
> -        BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_UPDATE_PART);
> -    } else if (c == s->l2_table_cache) {
> -        BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
> -    }
> -
> -    ret = bdrv_pwrite(bs->file, c->entries[i].offset, c->entries[i].table,
> -        s->cluster_size);
> -    if (ret < 0) {
> -        return ret;
> -    }
> -
> -    c->entries[i].dirty = false;
> -
> -    return 0;
> -}
> -
> -int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c)
> -{
> -    BDRVQcowState *s = bs->opaque;
> -    int result = 0;
> -    int ret;
> -    int i;
> -
> -    trace_qcow2_cache_flush(qemu_coroutine_self(), c == s->l2_table_cache);
> -
> -    for (i = 0; i < c->size; i++) {
> -        ret = qcow2_cache_entry_flush(bs, c, i);
> -        if (ret < 0 && result != -ENOSPC) {
> -            result = ret;
> -        }
> -    }
> -
> -    if (result == 0) {
> -        ret = bdrv_flush(bs->file);
> -        if (ret < 0) {
> -            result = ret;
> -        }
> -    }
> -
> -    return result;
> -}
> -
> -int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
> -    Qcow2Cache *dependency)
> -{
> -    int ret;
> -
> -    if (dependency->depends) {
> -        ret = qcow2_cache_flush_dependency(bs, dependency);
> -        if (ret < 0) {
> -            return ret;
> -        }
> -    }
> -
> -    if (c->depends && (c->depends != dependency)) {
> -        ret = qcow2_cache_flush_dependency(bs, c);
> -        if (ret < 0) {
> -            return ret;
> -        }
> -    }
> -
> -    c->depends = dependency;
> -    return 0;
> -}
> -
> -void qcow2_cache_depends_on_flush(Qcow2Cache *c)
> -{
> -    c->depends_on_flush = true;
> -}
> -
> -static int qcow2_cache_find_entry_to_replace(Qcow2Cache *c)
> -{
> -    int i;
> -    int min_count = INT_MAX;
> -    int min_index = -1;
> -
> -
> -    for (i = 0; i < c->size; i++) {
> -        if (c->entries[i].ref) {
> -            continue;
> -        }
> -
> -        if (c->entries[i].cache_hits < min_count) {
> -            min_index = i;
> -            min_count = c->entries[i].cache_hits;
> -        }
> -
> -        /* Give newer hits priority */
> -        /* TODO Check how to optimize the replacement strategy */
> -        c->entries[i].cache_hits /= 2;
> -    }
> -
> -    if (min_index == -1) {
> -        /* This can't happen in current synchronous code, but leave the check
> -         * here as a reminder for whoever starts using AIO with the cache */
> -        abort();
> -    }
> -    return min_index;
> -}
> -
> -static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
> -    uint64_t offset, void **table, bool read_from_disk)
> -{
> -    BDRVQcowState *s = bs->opaque;
> -    int i;
> -    int ret;
> -
> -    trace_qcow2_cache_get(qemu_coroutine_self(), c == s->l2_table_cache,
> -                          offset, read_from_disk);
> -
> -    /* Check if the table is already cached */
> -    for (i = 0; i < c->size; i++) {
> -        if (c->entries[i].offset == offset) {
> -            goto found;
> -        }
> -    }
> -
> -    /* If not, write a table back and replace it */
> -    i = qcow2_cache_find_entry_to_replace(c);
> -    trace_qcow2_cache_get_replace_entry(qemu_coroutine_self(),
> -                                        c == s->l2_table_cache, i);
> -    if (i < 0) {
> -        return i;
> -    }
> -
> -    ret = qcow2_cache_entry_flush(bs, c, i);
> -    if (ret < 0) {
> -        return ret;
> -    }
> -
> -    trace_qcow2_cache_get_read(qemu_coroutine_self(),
> -                               c == s->l2_table_cache, i);
> -    c->entries[i].offset = 0;
> -    if (read_from_disk) {
> -        if (c == s->l2_table_cache) {
> -            BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
> -        }
> -
> -        ret = bdrv_pread(bs->file, offset, c->entries[i].table, s->cluster_size);
> -        if (ret < 0) {
> -            return ret;
> -        }
> -    }
> -
> -    /* Give the table some hits for the start so that it won't be replaced
> -     * immediately. The number 32 is completely arbitrary. */
> -    c->entries[i].cache_hits = 32;
> -    c->entries[i].offset = offset;
> -
> -    /* And return the right table */
> -found:
> -    c->entries[i].cache_hits++;
> -    c->entries[i].ref++;
> -    *table = c->entries[i].table;
> -
> -    trace_qcow2_cache_get_done(qemu_coroutine_self(),
> -                               c == s->l2_table_cache, i);
> -
> -    return 0;
> -}
> -
> -int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
> -    void **table)
> -{
> -    return qcow2_cache_do_get(bs, c, offset, table, true);
> -}
> -
> -int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
> -    void **table)
> -{
> -    return qcow2_cache_do_get(bs, c, offset, table, false);
> -}
> -
> -int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table)
> -{
> -    int i;
> -
> -    for (i = 0; i < c->size; i++) {
> -        if (c->entries[i].table == *table) {
> -            goto found;
> -        }
> -    }
> -    return -ENOENT;
> -
> -found:
> -    c->entries[i].ref--;
> -    *table = NULL;
> -
> -    assert(c->entries[i].ref >= 0);
> -    return 0;
> -}
> -
> -void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table)
> -{
> -    int i;
> -
> -    for (i = 0; i < c->size; i++) {
> -        if (c->entries[i].table == table) {
> -            goto found;
> -        }
> -    }
> -    abort();
> -
> -found:
> -    c->entries[i].dirty = true;
> -}
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index e179211..335dc7a 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -28,6 +28,7 @@
>  #include "block_int.h"
>  #include "block/qcow2.h"
>  #include "trace.h"
> +#include "block-cache.h"
> 
>  int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
>  {
> @@ -69,7 +70,8 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
>          return new_l1_table_offset;
>      }
> 
> -    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
> +    ret = block_cache_flush(bs, s->refcount_block_cache,
> +        BLOCK_TABLE_REF, s->cluster_size);
>      if (ret < 0) {
>          goto fail;
>      }
> @@ -119,7 +121,8 @@ static int l2_load(BlockDriverState *bs, uint64_t l2_offset,
>      BDRVQcowState *s = bs->opaque;
>      int ret;
> 
> -    ret = qcow2_cache_get(bs, s->l2_table_cache, l2_offset, (void**) l2_table);
> +    ret = block_cache_get(bs, s->l2_table_cache, l2_offset,
> +        (void **) l2_table, BLOCK_TABLE_L2, s->cluster_size);
> 
>      return ret;
>  }
> @@ -180,7 +183,8 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
>          return l2_offset;
>      }
> 
> -    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
> +    ret = block_cache_flush(bs, s->refcount_block_cache,
> +        BLOCK_TABLE_REF, s->cluster_size);
>      if (ret < 0) {
>          goto fail;
>      }
> @@ -188,7 +192,8 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
>      /* allocate a new entry in the l2 cache */
> 
>      trace_qcow2_l2_allocate_get_empty(bs, l1_index);
> -    ret = qcow2_cache_get_empty(bs, s->l2_table_cache, l2_offset, (void**) table);
> +    ret = block_cache_get_empty(bs, s->l2_table_cache, l2_offset,
> +        (void **) table, BLOCK_TABLE_L2, s->cluster_size);
>      if (ret < 0) {
>          return ret;
>      }
> @@ -203,16 +208,17 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
> 
>          /* if there was an old l2 table, read it from the disk */
>          BLKDBG_EVENT(bs->file, BLKDBG_L2_ALLOC_COW_READ);
> -        ret = qcow2_cache_get(bs, s->l2_table_cache,
> +        ret = block_cache_get(bs, s->l2_table_cache,
>              old_l2_offset & L1E_OFFSET_MASK,
> -            (void**) &old_table);
> +            (void **) &old_table, BLOCK_TABLE_L2, s->cluster_size);
>          if (ret < 0) {
>              goto fail;
>          }
> 
>          memcpy(l2_table, old_table, s->cluster_size);
> 
> -        ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &old_table);
> +        ret = block_cache_put(bs, s->l2_table_cache,
> +            (void **) &old_table, BLOCK_TABLE_L2);
>          if (ret < 0) {
>              goto fail;
>          }
> @@ -222,8 +228,9 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
>      BLKDBG_EVENT(bs->file, BLKDBG_L2_ALLOC_WRITE);
> 
>      trace_qcow2_l2_allocate_write_l2(bs, l1_index);
> -    qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
> -    ret = qcow2_cache_flush(bs, s->l2_table_cache);
> +    block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
> +    ret = block_cache_flush(bs, s->l2_table_cache,
> +        BLOCK_TABLE_L2, s->cluster_size);
>      if (ret < 0) {
>          goto fail;
>      }
> @@ -242,7 +249,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
> 
>  fail:
>      trace_qcow2_l2_allocate_done(bs, l1_index, ret);
> -    qcow2_cache_put(bs, s->l2_table_cache, (void**) table);
> +    block_cache_put(bs, s->l2_table_cache, (void **) table, BLOCK_TABLE_L2);
>      s->l1_table[l1_index] = old_l2_offset;
>      return ret;
>  }
> @@ -475,7 +482,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
>          abort();
>      }
> 
> -    qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> +    block_cache_put(bs, s->l2_table_cache, (void **) &l2_table, BLOCK_TABLE_L2);
> 
>      nb_available = (c * s->cluster_sectors);
> 
> @@ -584,13 +591,15 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
>       * allocated. */
>      cluster_offset = be64_to_cpu(l2_table[l2_index]);
>      if (cluster_offset & L2E_OFFSET_MASK) {
> -        qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> +        block_cache_put(bs, s->l2_table_cache,
> +            (void **) &l2_table, BLOCK_TABLE_L2);
>          return 0;
>      }
> 
>      cluster_offset = qcow2_alloc_bytes(bs, compressed_size);
>      if (cluster_offset < 0) {
> -        qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> +        block_cache_put(bs, s->l2_table_cache,
> +            (void **) &l2_table, BLOCK_TABLE_L2);
>          return 0;
>      }
> 
> @@ -605,9 +614,10 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
>      /* compressed clusters never have the copied flag */
> 
>      BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
> -    qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
> +    block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>      l2_table[l2_index] = cpu_to_be64(cluster_offset);
> -    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> +    ret = block_cache_put(bs, s->l2_table_cache,
> +        (void **) &l2_table, BLOCK_TABLE_L2);
>      if (ret < 0) {
>          return 0;
>      }
> @@ -659,18 +669,16 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
>       * handled.
>       */
>      if (cow) {
> -        qcow2_cache_depends_on_flush(s->l2_table_cache);
> +        block_cache_depends_on_flush(s->l2_table_cache);
>      }
> 
> -    if (qcow2_need_accurate_refcounts(s)) {
> -        qcow2_cache_set_dependency(bs, s->l2_table_cache,
> -                                   s->refcount_block_cache);
> -    }
> +    block_cache_set_dependency(bs, s->l2_table_cache, BLOCK_TABLE_L2,
> +        s->refcount_block_cache, s->cluster_size);
>      ret = get_cluster_table(bs, m->offset, &l2_table, &l2_index);
>      if (ret < 0) {
>          goto err;
>      }
> -    qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
> +    block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
> 
>      for (i = 0; i < m->nb_clusters; i++) {
>          /* if two concurrent writes happen to the same unallocated cluster
> @@ -687,7 +695,8 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
>       }
> 
> 
> -    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> +    ret = block_cache_put(bs, s->l2_table_cache,
> +        (void **) &l2_table, BLOCK_TABLE_L2);
>      if (ret < 0) {
>          goto err;
>      }
> @@ -913,7 +922,8 @@ again:
>       * request to complete. If we still had the reference, we could use up the
>       * whole cache with sleeping requests.
>       */
> -    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> +    ret = block_cache_put(bs, s->l2_table_cache,
> +        (void **) &l2_table, BLOCK_TABLE_L2);
>      if (ret < 0) {
>          return ret;
>      }
> @@ -1077,14 +1087,15 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
>          }
> 
>          /* First remove L2 entries */
> -        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
> +        block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>          l2_table[l2_index + i] = cpu_to_be64(0);
> 
>          /* Then decrease the refcount */
>          qcow2_free_any_clusters(bs, old_offset, 1);
>      }
> 
> -    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> +    ret = block_cache_put(bs, s->l2_table_cache,
> +        (void **) &l2_table, BLOCK_TABLE_L2);
>      if (ret < 0) {
>          return ret;
>      }
> @@ -1154,7 +1165,7 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
>          old_offset = be64_to_cpu(l2_table[l2_index + i]);
> 
>          /* Update L2 entries */
> -        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
> +        block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>          if (old_offset & QCOW_OFLAG_COMPRESSED) {
>              l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
>              qcow2_free_any_clusters(bs, old_offset, 1);
> @@ -1163,7 +1174,8 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
>          }
>      }
> 
> -    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> +    ret = block_cache_put(bs, s->l2_table_cache,
> +        (void **) &l2_table, BLOCK_TABLE_L2);
>      if (ret < 0) {
>          return ret;
>      }
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 5e3f915..728bfc1 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -25,6 +25,7 @@
>  #include "qemu-common.h"
>  #include "block_int.h"
>  #include "block/qcow2.h"
> +#include "block-cache.h"
> 
>  static int64_t alloc_clusters_noref(BlockDriverState *bs, int64_t size);
>  static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
> @@ -71,8 +72,8 @@ static int load_refcount_block(BlockDriverState *bs,
>      int ret;
> 
>      BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_LOAD);
> -    ret = qcow2_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
> -        refcount_block);
> +    ret = block_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
> +        refcount_block, BLOCK_TABLE_REF, s->cluster_size);
> 
>      return ret;
>  }
> @@ -98,8 +99,8 @@ static int get_refcount(BlockDriverState *bs, int64_t cluster_index)
>      if (!refcount_block_offset)
>          return 0;
> 
> -    ret = qcow2_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
> -        (void**) &refcount_block);
> +    ret = block_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
> +        (void **) &refcount_block, BLOCK_TABLE_REF, s->cluster_size);
>      if (ret < 0) {
>          return ret;
>      }
> @@ -108,8 +109,8 @@ static int get_refcount(BlockDriverState *bs, int64_t cluster_index)
>          ((1 << (s->cluster_bits - REFCOUNT_SHIFT)) - 1);
>      refcount = be16_to_cpu(refcount_block[block_index]);
> 
> -    ret = qcow2_cache_put(bs, s->refcount_block_cache,
> -        (void**) &refcount_block);
> +    ret = block_cache_put(bs, s->refcount_block_cache,
> +        (void **) &refcount_block, BLOCK_TABLE_REF);
>      if (ret < 0) {
>          return ret;
>      }
> @@ -201,7 +202,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
>      *refcount_block = NULL;
> 
>      /* We write to the refcount table, so we might depend on L2 tables */
> -    qcow2_cache_flush(bs, s->l2_table_cache);
> +    block_cache_flush(bs, s->l2_table_cache,
> +        BLOCK_TABLE_L2, s->cluster_size);
> 
>      /* Allocate the refcount block itself and mark it as used */
>      int64_t new_block = alloc_clusters_noref(bs, s->cluster_size);
> @@ -217,8 +219,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
> 
>      if (in_same_refcount_block(s, new_block, cluster_index << s->cluster_bits)) {
>          /* Zero the new refcount block before updating it */
> -        ret = qcow2_cache_get_empty(bs, s->refcount_block_cache, new_block,
> -            (void**) refcount_block);
> +        ret = block_cache_get_empty(bs, s->refcount_block_cache, new_block,
> +            (void **) refcount_block, BLOCK_TABLE_REF, s->cluster_size);
>          if (ret < 0) {
>              goto fail_block;
>          }
> @@ -241,8 +243,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
> 
>          /* Initialize the new refcount block only after updating its refcount,
>           * update_refcount uses the refcount cache itself */
> -        ret = qcow2_cache_get_empty(bs, s->refcount_block_cache, new_block,
> -            (void**) refcount_block);
> +        ret = block_cache_get_empty(bs, s->refcount_block_cache, new_block,
> +            (void **) refcount_block, BLOCK_TABLE_REF, s->cluster_size);
>          if (ret < 0) {
>              goto fail_block;
>          }
> @@ -252,8 +254,9 @@ static int alloc_refcount_block(BlockDriverState *bs,
> 
>      /* Now the new refcount block needs to be written to disk */
>      BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_ALLOC_WRITE);
> -    qcow2_cache_entry_mark_dirty(s->refcount_block_cache, *refcount_block);
> -    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
> +    block_cache_entry_mark_dirty(s->refcount_block_cache, *refcount_block);
> +    ret = block_cache_flush(bs, s->refcount_block_cache,
> +        BLOCK_TABLE_REF, s->cluster_size);
>      if (ret < 0) {
>          goto fail_block;
>      }
> @@ -273,7 +276,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
>          return 0;
>      }
> 
> -    ret = qcow2_cache_put(bs, s->refcount_block_cache, (void**) refcount_block);
> +    ret = block_cache_put(bs, s->refcount_block_cache,
> +        (void **) refcount_block, BLOCK_TABLE_REF);
>      if (ret < 0) {
>          goto fail_block;
>      }
> @@ -406,7 +410,8 @@ fail_table:
>      g_free(new_table);
>  fail_block:
>      if (*refcount_block != NULL) {
> -        qcow2_cache_put(bs, s->refcount_block_cache, (void**) refcount_block);
> +        block_cache_put(bs, s->refcount_block_cache,
> +            (void **) refcount_block, BLOCK_TABLE_REF);
>      }
>      return ret;
>  }
> @@ -432,8 +437,8 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
>      }
> 
>      if (addend < 0) {
> -        qcow2_cache_set_dependency(bs, s->refcount_block_cache,
> -            s->l2_table_cache);
> +        block_cache_set_dependency(bs, s->refcount_block_cache, BLOCK_TABLE_REF,
> +            s->l2_table_cache, s->cluster_size);
>      }
> 
>      start = offset & ~(s->cluster_size - 1);
> @@ -449,8 +454,8 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
>          /* Load the refcount block and allocate it if needed */
>          if (table_index != old_table_index) {
>              if (refcount_block) {
> -                ret = qcow2_cache_put(bs, s->refcount_block_cache,
> -                    (void**) &refcount_block);
> +                ret = block_cache_put(bs, s->refcount_block_cache,
> +                    (void **) &refcount_block, BLOCK_TABLE_REF);
>                  if (ret < 0) {
>                      goto fail;
>                  }
> @@ -463,7 +468,7 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
>          }
>          old_table_index = table_index;
> 
> -        qcow2_cache_entry_mark_dirty(s->refcount_block_cache, refcount_block);
> +        block_cache_entry_mark_dirty(s->refcount_block_cache, refcount_block);
> 
>          /* we can update the count and save it */
>          block_index = cluster_index &
> @@ -486,8 +491,8 @@ fail:
>      /* Write last changed block to disk */
>      if (refcount_block) {
>          int wret;
> -        wret = qcow2_cache_put(bs, s->refcount_block_cache,
> -            (void**) &refcount_block);
> +        wret = block_cache_put(bs, s->refcount_block_cache,
> +            (void **) &refcount_block, BLOCK_TABLE_REF);
>          if (wret < 0) {
>              return ret < 0 ? ret : wret;
>          }
> @@ -763,8 +768,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
>              old_l2_offset = l2_offset;
>              l2_offset &= L1E_OFFSET_MASK;
> 
> -            ret = qcow2_cache_get(bs, s->l2_table_cache, l2_offset,
> -                (void**) &l2_table);
> +            ret = block_cache_get(bs, s->l2_table_cache, l2_offset,
> +                (void **) &l2_table, BLOCK_TABLE_L2, s->cluster_size);
>              if (ret < 0) {
>                  goto fail;
>              }
> @@ -811,16 +816,18 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
>                      }
>                      if (offset != old_offset) {
>                          if (addend > 0) {
> -                            qcow2_cache_set_dependency(bs, s->l2_table_cache,
> -                                s->refcount_block_cache);
> +                            block_cache_set_dependency(bs, s->l2_table_cache,
> +                                BLOCK_TABLE_L2, s->refcount_block_cache,
> +                                s->cluster_size);
>                          }
>                          l2_table[j] = cpu_to_be64(offset);
> -                        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
> +                        block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>                      }
>                  }
>              }
> 
> -            ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> +            ret = block_cache_put(bs, s->l2_table_cache,
> +                (void **) &l2_table, BLOCK_TABLE_L2);
>              if (ret < 0) {
>                  goto fail;
>              }
> @@ -847,7 +854,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
>      ret = 0;
>  fail:
>      if (l2_table) {
> -        qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> +        block_cache_put(bs, s->l2_table_cache,
> +            (void **) &l2_table, BLOCK_TABLE_L2);
>      }
> 
>      /* Update L1 only if it isn't deleted anyway (addend = -1) */
> diff --git a/block/qcow2.c b/block/qcow2.c
> index fd5e214..b89d312 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -30,6 +30,7 @@
>  #include "qemu-error.h"
>  #include "qerror.h"
>  #include "trace.h"
> +#include "block-cache.h"
> 
>  /*
>    Differences with QCOW:
> @@ -415,8 +416,9 @@ static int qcow2_open(BlockDriverState *bs, int flags)
>      }
> 
>      /* alloc L2 table/refcount block cache */
> -    s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE);
> -    s->refcount_block_cache = qcow2_cache_create(bs, REFCOUNT_CACHE_SIZE);
> +    s->l2_table_cache = block_cache_create(bs, L2_CACHE_SIZE, s->cluster_size);
> +    s->refcount_block_cache =
> +        block_cache_create(bs, REFCOUNT_CACHE_SIZE, s->cluster_size);
> 
>      s->cluster_cache = g_malloc(s->cluster_size);
>      /* one more sector for decompressed data alignment */
> @@ -500,7 +502,7 @@ static int qcow2_open(BlockDriverState *bs, int flags)
>      qcow2_refcount_close(bs);
>      g_free(s->l1_table);
>      if (s->l2_table_cache) {
> -        qcow2_cache_destroy(bs, s->l2_table_cache);
> +        block_cache_destroy(bs, s->l2_table_cache, BLOCK_TABLE_L2);
>      }
>      g_free(s->cluster_cache);
>      qemu_vfree(s->cluster_data);
> @@ -860,13 +862,13 @@ static void qcow2_close(BlockDriverState *bs)
>      BDRVQcowState *s = bs->opaque;
>      g_free(s->l1_table);
> 
> -    qcow2_cache_flush(bs, s->l2_table_cache);
> -    qcow2_cache_flush(bs, s->refcount_block_cache);
> -
> +    block_cache_flush(bs, s->l2_table_cache,
> +        BLOCK_TABLE_L2, s->cluster_size);
> +    block_cache_flush(bs, s->refcount_block_cache,
> +        BLOCK_TABLE_REF, s->cluster_size);
>      qcow2_mark_clean(bs);
> -
> -    qcow2_cache_destroy(bs, s->l2_table_cache);
> -    qcow2_cache_destroy(bs, s->refcount_block_cache);
> +    block_cache_destroy(bs, s->l2_table_cache, BLOCK_TABLE_L2);
> +    block_cache_destroy(bs, s->refcount_block_cache, BLOCK_TABLE_REF);
> 
>      g_free(s->unknown_header_fields);
>      cleanup_unknown_header_ext(bs);
> @@ -1339,8 +1341,6 @@ static int qcow2_create(const char *filename, QEMUOptionParameter *options)
>                      options->value.s);
>                  return -EINVAL;
>              }
> -        } else if (!strcmp(options->name, BLOCK_OPT_LAZY_REFCOUNTS)) {
> -            flags |= options->value.n ? BLOCK_FLAG_LAZY_REFCOUNTS : 0;
>          }
>          options++;
>      }
> @@ -1537,18 +1537,18 @@ static coroutine_fn int qcow2_co_flush_to_os(BlockDriverState *bs)
>      int ret;
> 
>      qemu_co_mutex_lock(&s->lock);
> -    ret = qcow2_cache_flush(bs, s->l2_table_cache);
> +    ret = block_cache_flush(bs, s->l2_table_cache,
> +        BLOCK_TABLE_L2, s->cluster_size);
>      if (ret < 0) {
>          qemu_co_mutex_unlock(&s->lock);
>          return ret;
>      }
> 
> -    if (qcow2_need_accurate_refcounts(s)) {
> -        ret = qcow2_cache_flush(bs, s->refcount_block_cache);
> -        if (ret < 0) {
> -            qemu_co_mutex_unlock(&s->lock);
> -            return ret;
> -        }
> +    ret = block_cache_flush(bs, s->refcount_block_cache,
> +        BLOCK_TABLE_REF, s->cluster_size);
> +    if (ret < 0) {
> +        qemu_co_mutex_unlock(&s->lock);
> +        return ret;
>      }
>      qemu_co_mutex_unlock(&s->lock);
> 
> diff --git a/block/qcow2.h b/block/qcow2.h
> index b4eb654..cb6fd7a 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -27,6 +27,7 @@
> 
>  #include "aes.h"
>  #include "qemu-coroutine.h"
> +#include "block-cache.h"
> 
>  //#define DEBUG_ALLOC
>  //#define DEBUG_ALLOC2
> @@ -94,8 +95,6 @@ typedef struct QCowSnapshot {
>      uint64_t vm_clock_nsec;
>  } QCowSnapshot;
> 
> -struct Qcow2Cache;
> -typedef struct Qcow2Cache Qcow2Cache;
> 
>  typedef struct Qcow2UnknownHeaderExtension {
>      uint32_t magic;
> @@ -146,8 +145,8 @@ typedef struct BDRVQcowState {
>      uint64_t l1_table_offset;
>      uint64_t *l1_table;
> 
> -    Qcow2Cache* l2_table_cache;
> -    Qcow2Cache* refcount_block_cache;
> +    BlockCache *l2_table_cache;
> +    BlockCache *refcount_block_cache;
> 
>      uint8_t *cluster_cache;
>      uint8_t *cluster_data;
> @@ -316,21 +315,4 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs, const char *snapshot_name);
> 
>  void qcow2_free_snapshots(BlockDriverState *bs);
>  int qcow2_read_snapshots(BlockDriverState *bs);
> -
> -/* qcow2-cache.c functions */
> -Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables);
> -int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c);
> -
> -void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table);
> -int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c);
> -int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
> -    Qcow2Cache *dependency);
> -void qcow2_cache_depends_on_flush(Qcow2Cache *c);
> -
> -int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
> -    void **table);
> -int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
> -    void **table);
> -int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
> -
>  #endif
> diff --git a/trace-events b/trace-events
> index 6b12f83..52b6438 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -439,12 +439,13 @@ qcow2_l2_allocate_write_l2(void *bs, int l1_index) "bs %p l1_index %d"
>  qcow2_l2_allocate_write_l1(void *bs, int l1_index) "bs %p l1_index %d"
>  qcow2_l2_allocate_done(void *bs, int l1_index, int ret) "bs %p l1_index %d ret %d"
> 
> -qcow2_cache_get(void *co, int c, uint64_t offset, bool read_from_disk) "co %p is_l2_cache %d offset %" PRIx64 " read_from_disk %d"
> -qcow2_cache_get_replace_entry(void *co, int c, int i) "co %p is_l2_cache %d index %d"
> -qcow2_cache_get_read(void *co, int c, int i) "co %p is_l2_cache %d index %d"
> -qcow2_cache_get_done(void *co, int c, int i) "co %p is_l2_cache %d index %d"
> -qcow2_cache_flush(void *co, int c) "co %p is_l2_cache %d"
> -qcow2_cache_entry_flush(void *co, int c, int i) "co %p is_l2_cache %d index %d"
> +# block/block-cache.c
> +block_cache_get(void *co, int c, uint64_t offset, bool read_from_disk) "co %p is_l2_cache %d offset %" PRIx64 " read_from_disk %d"
> +block_cache_get_replace_entry(void *co, int c, int i) "co %p is_l2_cache %d index %d"
> +block_cache_get_read(void *co, int c, int i) "co %p is_l2_cache %d index %d"
> +block_cache_get_done(void *co, int c, int i) "co %p is_l2_cache %d index %d"
> +block_cache_flush(void *co, int c) "co %p is_l2_cache %d"
> +block_cache_entry_flush(void *co, int c, int i) "co %p is_l2_cache %d index %d"
> 
>  # block/qed-l2-cache.c
>  qed_alloc_l2_cache_entry(void *l2_cache, void *entry) "l2_cache %p entry %p"
> -- 
> 1.7.1
> 
> 

* Re: [Qemu-devel] [PATCH V12 5/6] add-cow file format
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 5/6] add-cow file format Dong Xu Wang
@ 2012-09-06 20:19   ` Michael Roth
  2012-09-10  2:25     ` Dong Xu Wang
  2012-09-11  9:40   ` Kevin Wolf
  1 sibling, 1 reply; 25+ messages in thread
From: Michael Roth @ 2012-09-06 20:19 UTC (permalink / raw)
  To: Dong Xu Wang; +Cc: kwolf, qemu-devel

On Fri, Aug 10, 2012 at 11:39:44PM +0800, Dong Xu Wang wrote:
> add-cow file format core code. It use block-cache.c as cache code.
> 
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> ---
>  block/Makefile.objs |    1 +
>  block/add-cow.c     |  613 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  block/add-cow.h     |   85 +++++++
>  block_int.h         |    2 +
>  4 files changed, 701 insertions(+), 0 deletions(-)
>  create mode 100644 block/add-cow.c
>  create mode 100644 block/add-cow.h
> 
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index 23bdfc8..7ed5051 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -2,6 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
>  block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
>  block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>  block-obj-y += qed-check.o
> +block-obj-y += add-cow.o
>  block-obj-y += block-cache.o
>  block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
>  block-obj-y += stream.o
> diff --git a/block/add-cow.c b/block/add-cow.c
> new file mode 100644
> index 0000000..d4711d5
> --- /dev/null
> +++ b/block/add-cow.c
> @@ -0,0 +1,613 @@
> +/*
> + * QEMU ADD-COW Disk Format
> + *
> + * Copyright IBM, Corp. 2012
> + *
> + * Authors:
> + *  Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> + * See the COPYING.LIB file in the top-level directory.
> + *
> + */
> +
> +#include "qemu-common.h"
> +#include "block_int.h"
> +#include "module.h"
> +#include "add-cow.h"
> +
> +static void add_cow_header_le_to_cpu(const AddCowHeader *le, AddCowHeader *cpu)
> +{
> +    cpu->magic                      = le64_to_cpu(le->magic);
> +    cpu->version                    = le32_to_cpu(le->version);
> +
> +    cpu->backing_filename_offset    = le32_to_cpu(le->backing_filename_offset);
> +    cpu->backing_filename_size      = le32_to_cpu(le->backing_filename_size);
> +
> +    cpu->image_filename_offset      = le32_to_cpu(le->image_filename_offset);
> +    cpu->image_filename_size        = le32_to_cpu(le->image_filename_size);
> +
> +    cpu->features                   = le64_to_cpu(le->features);
> +    cpu->optional_features          = le64_to_cpu(le->optional_features);
> +    cpu->header_pages_size          = le32_to_cpu(le->header_pages_size);
> +}
> +
> +static void add_cow_header_cpu_to_le(const AddCowHeader *cpu, AddCowHeader *le)
> +{
> +    le->magic                       = cpu_to_le64(cpu->magic);
> +    le->version                     = cpu_to_le32(cpu->version);
> +
> +    le->backing_filename_offset     = cpu_to_le32(cpu->backing_filename_offset);
> +    le->backing_filename_size       = cpu_to_le32(cpu->backing_filename_size);
> +
> +    le->image_filename_offset       = cpu_to_le32(cpu->image_filename_offset);
> +    le->image_filename_size         = cpu_to_le32(cpu->image_filename_size);
> +
> +    le->features                    = cpu_to_le64(cpu->features);
> +    le->optional_features           = cpu_to_le64(cpu->optional_features);
> +    le->header_pages_size           = cpu_to_le32(cpu->header_pages_size);
> +}
> +
> +static int add_cow_probe(const uint8_t *buf, int buf_size, const char *filename)
> +{
> +    const AddCowHeader *header = (const AddCowHeader *)buf;
> +
> +    if (le64_to_cpu(header->magic) == ADD_COW_MAGIC &&
> +        le32_to_cpu(header->version) == ADD_COW_VERSION) {
> +        return 100;
> +    } else {
> +        return 0;
> +    }
> +}
> +
> +static int add_cow_create(const char *filename, QEMUOptionParameter *options)
> +{
> +    AddCowHeader header = {
> +        .magic = ADD_COW_MAGIC,
> +        .version = ADD_COW_VERSION,
> +        .features = 0,
> +        .optional_features = 0,
> +        .header_pages_size = ADD_COW_DEFAULT_PAGE_SIZE,
> +    };
> +    AddCowHeader le_header;
> +    int64_t image_len = 0;
> +    const char *backing_filename = NULL;
> +    const char *backing_fmt = NULL;
> +    const char *image_filename = NULL;
> +    const char *image_format = NULL;
> +    BlockDriverState *bs, *image_bs = NULL, *backing_bs = NULL;
> +    BlockDriver *drv = bdrv_find_format("add-cow");
> +    BDRVAddCowState s;
> +    int ret;
> +
> +    while (options && options->name) {
> +        if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
> +            image_len = options->value.n;
> +        } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FILE)) {
> +            backing_filename = options->value.s;
> +        } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FMT)) {
> +            backing_fmt = options->value.s;
> +        } else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FILE)) {
> +            image_filename = options->value.s;
> +        } else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FORMAT)) {
> +            image_format = options->value.s;
> +        }
> +        options++;
> +    }
> +
> +    if (backing_filename) {
> +        header.backing_filename_offset = sizeof(header)
> +            + sizeof(s.backing_file_format) + sizeof(s.image_file_format);
> +        header.backing_filename_size = strlen(backing_filename);
> +
> +        if (!backing_fmt) {
> +            backing_bs = bdrv_new("image");
> +            ret = bdrv_open(backing_bs, backing_filename, BDRV_O_RDWR
> +                    | BDRV_O_CACHE_WB, NULL);
> +            if (ret < 0) {
> +                return ret;
> +            }
> +            backing_fmt = bdrv_get_format_name(backing_bs);
> +            bdrv_delete(backing_bs);
> +        }
> +    } else {
> +        header.features |= ADD_COW_F_All_ALLOCATED;
> +    }
> +
> +    if (image_filename) {
> +        header.image_filename_offset =
> +            sizeof(header) + sizeof(s.backing_file_format)
> +                + sizeof(s.image_file_format) + header.backing_filename_size;
> +        header.image_filename_size = strlen(image_filename);
> +    } else {
> +        error_report("Error: image_file should be given.");
> +        return -EINVAL;
> +    }
> +
> +    if (backing_filename && !strcmp(backing_filename, image_filename)) {
> +        error_report("Error: Trying to create an image with the "
> +                     "same backing file name as the image file name");
> +        return -EINVAL;
> +    }
> +
> +    if (!strcmp(filename, image_filename)) {
> +        error_report("Error: Trying to create an image with the "
> +                     "same filename as the image file name");
> +        return -EINVAL;
> +    }
> +
> +    if (header.image_filename_offset + header.image_filename_size
> +            > ADD_COW_PAGE_SIZE * ADD_COW_DEFAULT_PAGE_SIZE) {
> +        error_report("image_file name or backing_file name too long.");
> +        return -ENOSPC;
> +    }
> +
> +    ret = bdrv_file_open(&image_bs, image_filename, BDRV_O_RDWR);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    bdrv_delete(image_bs);
> +
> +    ret = bdrv_create_file(filename, NULL);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    ret = bdrv_file_open(&bs, filename, BDRV_O_RDWR);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    add_cow_header_cpu_to_le(&header, &le_header);
> +    ret = bdrv_pwrite(bs, 0, &le_header, sizeof(le_header));
> +    if (ret < 0) {
> +        bdrv_delete(bs);
> +        return ret;
> +    }
> +
> +    ret = bdrv_pwrite(bs, sizeof(le_header), backing_fmt ? backing_fmt : "",
> +        backing_fmt ? strlen(backing_fmt) : 0);
> +    if (ret < 0) {
> +        bdrv_delete(bs);
> +        return ret;
> +    }
> +
> +    ret = bdrv_pwrite(bs, sizeof(le_header) + sizeof(s.backing_file_format),
> +        image_format ? image_format : "raw",
> +        image_format ? strlen(image_format) : sizeof("raw"));
> +    if (ret < 0) {
> +        bdrv_delete(bs);
> +        return ret;
> +    }
> +
> +    if (backing_filename) {
> +        ret = bdrv_pwrite(bs, header.backing_filename_offset,
> +            backing_filename, header.backing_filename_size);
> +        if (ret < 0) {
> +            bdrv_delete(bs);
> +            return ret;
> +        }
> +    }
> +
> +    ret = bdrv_pwrite(bs, header.image_filename_offset,
> +        image_filename, header.image_filename_size);
> +    if (ret < 0) {
> +        bdrv_delete(bs);
> +        return ret;
> +    }
> +
> +    ret = bdrv_open(bs, filename, BDRV_O_RDWR | BDRV_O_NO_FLUSH, drv);
> +    if (ret < 0) {
> +        bdrv_delete(bs);
> +        return ret;
> +    }
> +
> +    ret = bdrv_truncate(bs, image_len);
> +    bdrv_delete(bs);
> +    return ret;
> +}
> +
> +static int add_cow_open(BlockDriverState *bs, int flags)
> +{
> +    char                image_filename[ADD_COW_FILE_LEN];
> +    char                tmp_name[ADD_COW_FILE_LEN];
> +    BlockDriver         *image_drv = NULL;
> +    int                 ret;
> +    int                 sector_per_byte;
> +    BDRVAddCowState     *s = bs->opaque;
> +    AddCowHeader        le_header;
> +
> +    ret = bdrv_pread(bs->file, 0, &le_header, sizeof(le_header));
> +    if (ret != sizeof(s->header)) {
> +        goto fail;
> +    }
> +
> +    add_cow_header_le_to_cpu(&le_header, &s->header);
> +
> +    if (le64_to_cpu(s->header.magic) != ADD_COW_MAGIC) {
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +
> +    if (s->header.version != ADD_COW_VERSION) {
> +        char version[64];
> +        snprintf(version, sizeof(version), "ADD-COW version %d",
> +            s->header.version);
> +        qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
> +            bs->device_name, "add-cow", version);
> +        ret = -ENOTSUP;
> +        goto fail;
> +    }
> +
> +    if (s->header.features & ~ADD_COW_FEATURE_MASK) {
> +        char buf[64];
> +        snprintf(buf, sizeof(buf), "%" PRIx64,
> +            s->header.features & ~ADD_COW_FEATURE_MASK);
> +        qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
> +            bs->device_name, "add-cow", buf);
> +        return -ENOTSUP;
> +    }
> +
> +    if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
> +        ret = bdrv_read_string(bs->file, sizeof(s->header),
> +            sizeof(s->backing_file_format) - 1, s->backing_file_format,
> +            sizeof(s->backing_file_format));
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +    }
> +
> +    ret = bdrv_read_string(bs->file,
> +            sizeof(s->header) + sizeof(s->image_file_format),
> +            sizeof(s->image_file_format) - 1, s->image_file_format,
> +            sizeof(s->image_file_format));
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
> +        ret = bdrv_read_string(bs->file, s->header.backing_filename_offset,
> +                          s->header.backing_filename_size, bs->backing_file,
> +                          sizeof(bs->backing_file));
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +    }
> +
> +    ret = bdrv_read_string(bs->file, s->header.image_filename_offset,
> +                      s->header.image_filename_size, tmp_name,
> +                      sizeof(tmp_name));
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    s->image_hd = bdrv_new("");
> +    if (path_has_protocol(image_filename)) {
> +        pstrcpy(image_filename, sizeof(image_filename), tmp_name);
> +    } else {
> +        path_combine(image_filename, sizeof(image_filename),
> +                     bs->filename, tmp_name);
> +    }
> +
> +    ret = bdrv_open(s->image_hd, image_filename, flags, image_drv);
> +    if (ret < 0) {
> +        bdrv_delete(s->image_hd);
> +        goto fail;
> +    }
> +
> +    bs->total_sectors = bdrv_getlength(s->image_hd) >> 9;
> +    s->cluster_size = ADD_COW_CLUSTER_SIZE;
> +    sector_per_byte = SECTORS_PER_CLUSTER * 8;
> +    s->bitmap_size =
> +        (bs->total_sectors + sector_per_byte - 1) / sector_per_byte;
> +    s->bitmap_cache =
> +        block_cache_create(bs, ADD_COW_CACHE_SIZE, ADD_COW_CACHE_ENTRY_SIZE);
> +
> +    qemu_co_mutex_init(&s->lock);
> +    return 0;
> +fail:
> +    if (s->bitmap_cache) {
> +        block_cache_destroy(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP);
> +    }
> +    return ret;
> +}
> +
> +static void add_cow_close(BlockDriverState *bs)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    block_cache_destroy(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP);
> +    bdrv_delete(s->image_hd);
> +}
> +
> +static bool is_allocated(BlockDriverState *bs, int64_t sector_num)
> +{
> +    BDRVAddCowState *s  = bs->opaque;
> +    BlockCache *c = s->bitmap_cache;
> +    int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
> +    uint8_t *table      = NULL;
> +    uint64_t offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
> +        + (offset_in_bitmap(sector_num) & (~(c->entry_size - 1)));
> +    int ret = block_cache_get(bs, s->bitmap_cache, offset,
> +        (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
> +
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    return table[cluster_num / 8 % ADD_COW_CACHE_ENTRY_SIZE]
> +        & (1 << (cluster_num % 8));
> +}
> +
> +static coroutine_fn int add_cow_is_allocated(BlockDriverState *bs,
> +        int64_t sector_num, int nb_sectors, int *num_same)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    int changed;
> +
> +    if (nb_sectors == 0) {
> +        *num_same = 0;
> +        return 0;
> +    }
> +
> +    if (s->header.features & ADD_COW_F_All_ALLOCATED) {
> +        *num_same = nb_sectors - 1;
> +        return 1;
> +    }
> +    changed = is_allocated(bs, sector_num);
> +
> +    for (*num_same = 1; *num_same < nb_sectors; (*num_same)++) {
> +        if (is_allocated(bs, sector_num + *num_same) != changed) {
> +            break;
> +        }
> +    }
> +    return changed;
> +}
> +
> +static int add_cow_backing_read(BlockDriverState *bs, QEMUIOVector *qiov,
> +                  int64_t sector_num, int nb_sectors)
> +{
> +    int n1;
> +    if ((sector_num + nb_sectors) <= bs->total_sectors) {
> +        return nb_sectors;
> +    }
> +    if (sector_num >= bs->total_sectors) {
> +        n1 = 0;
> +    } else {
> +        n1 = bs->total_sectors - sector_num;
> +    }
> +
> +    qemu_iovec_memset(qiov, BDRV_SECTOR_SIZE * n1,
> +        0, BDRV_SECTOR_SIZE * (nb_sectors - n1));
> +
> +    return n1;
> +}
> +
> +static coroutine_fn int add_cow_co_readv(BlockDriverState *bs,
> +    int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
> +{
> +    BDRVAddCowState *s  = bs->opaque;
> +    int cur_nr_sectors;
> +    uint64_t bytes_done = 0;
> +    QEMUIOVector hd_qiov;
> +    int n, n1, ret = 0;
> +
> +    qemu_iovec_init(&hd_qiov, qiov->niov);
> +    qemu_co_mutex_lock(&s->lock);
> +    while (remaining_sectors != 0) {
> +        cur_nr_sectors = remaining_sectors;
> +        if (add_cow_is_allocated(bs, sector_num, cur_nr_sectors, &n)) {
> +            cur_nr_sectors = n;
> +            qemu_iovec_reset(&hd_qiov);
> +            qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
> +                            cur_nr_sectors * BDRV_SECTOR_SIZE);
> +            qemu_co_mutex_unlock(&s->lock);
> +            ret = bdrv_co_readv(s->image_hd, sector_num, n, &hd_qiov);
> +            qemu_co_mutex_lock(&s->lock);
> +            if (ret < 0) {
> +                goto fail;
> +            }
> +        } else {
> +            cur_nr_sectors = n;
> +            if (bs->backing_hd) {
> +                qemu_iovec_reset(&hd_qiov);
> +                qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
> +                            cur_nr_sectors * BDRV_SECTOR_SIZE);
> +                n1 = add_cow_backing_read(bs->backing_hd, &hd_qiov,
> +                    sector_num, cur_nr_sectors);
> +                if (n1 > 0) {
> +                    qemu_co_mutex_unlock(&s->lock);
> +                    ret = bdrv_co_readv(bs->backing_hd, sector_num,
> +                                        n, &hd_qiov);
> +                    qemu_co_mutex_lock(&s->lock);
> +                    if (ret < 0) {
> +                        goto fail;
> +                    }
> +                }
> +            } else {
> +                qemu_iovec_memset(&hd_qiov, 0, 0,
> +                    BDRV_SECTOR_SIZE * cur_nr_sectors);
> +            }
> +        }
> +        remaining_sectors -= cur_nr_sectors;
> +        sector_num += cur_nr_sectors;
> +        bytes_done += cur_nr_sectors * BDRV_SECTOR_SIZE;
> +    }
> +fail:
> +    qemu_co_mutex_unlock(&s->lock);
> +    qemu_iovec_destroy(&hd_qiov);
> +    return ret;
> +}
> +
> +static int coroutine_fn copy_sectors(BlockDriverState *bs,
> +                                     int n_start, int n_end)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    QEMUIOVector qiov;
> +    struct iovec iov;
> +    int n, ret;
> +
> +    n = n_end - n_start;
> +    if (n <= 0) {
> +        return 0;
> +    }
> +
> +    iov.iov_len = n * BDRV_SECTOR_SIZE;
> +    iov.iov_base = qemu_blockalign(bs, iov.iov_len);
> +
> +    qemu_iovec_init_external(&qiov, &iov, 1);
> +
> +    ret = bdrv_co_readv(bs->backing_hd, n_start, n, &qiov);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +    ret = bdrv_co_writev(s->image_hd, n_start, n, &qiov);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    ret = 0;
> +out:
> +    qemu_vfree(iov.iov_base);
> +    return ret;
> +}
> +
> +static coroutine_fn int add_cow_co_writev(BlockDriverState *bs,
> +        int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    BlockCache *c = s->bitmap_cache;
> +    int ret = 0, i;
> +    QEMUIOVector hd_qiov;
> +    uint8_t *table;
> +    uint64_t offset;
> +
> +    qemu_co_mutex_lock(&s->lock);
> +    qemu_iovec_init(&hd_qiov, qiov->niov);
> +    ret = bdrv_co_writev(s->image_hd,
> +                     sector_num,
> +                     remaining_sectors, qiov);

alignment                   ^

or even at ^ if you prefer, as you have done in some places; it just needs to
be consistent for better readability.

> +
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +    if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
> +        /* Copy content of unmodified sectors */
> +        if (!is_cluster_head(sector_num) && !is_allocated(bs, sector_num)) {

Why do we avoid a COW when writing to the first sector of a cluster?
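
For reference, a rough sketch of the head/tail ranges I'd expect a
partial-cluster write to copy. CowRange, cow_head_range() and
cow_tail_range() are made-up names for illustration only, and 128 sectors
per cluster assumes ADD_COW_CLUSTER_SIZE / BDRV_SECTOR_SIZE = 65536 / 512.
If the write starts on a cluster boundary the head range comes out empty,
and copy_sectors() already returns 0 for an empty range:

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 65536-byte clusters and 512-byte sectors, as in add-cow.h */
enum { SKETCH_SECTORS_PER_CLUSTER = 128 };

/* Half-open sector range [start, end); empty when start == end */
typedef struct CowRange {
    int64_t start;
    int64_t end;
} CowRange;

/* Sectors between the start of the cluster and the start of the write */
static CowRange cow_head_range(int64_t sector_num)
{
    CowRange r;
    r.start = sector_num & ~(int64_t)(SKETCH_SECTORS_PER_CLUSTER - 1);
    r.end = sector_num;
    return r;
}

/* Sectors between the end of the write and the end of its last cluster */
static CowRange cow_tail_range(int64_t sector_num, int nb_sectors)
{
    CowRange r;
    int64_t end = sector_num + nb_sectors;
    r.start = end;
    /* Round up to the next cluster boundary; a no-op if already aligned */
    r.end = (end + SKETCH_SECTORS_PER_CLUSTER - 1)
            & ~(int64_t)(SKETCH_SECTORS_PER_CLUSTER - 1);
    return r;
}
```

With this shape a cluster-aligned write needs no special case: both
ranges are empty and nothing is copied.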

> +            ret = copy_sectors(bs, sector_num & ~(SECTORS_PER_CLUSTER - 1),
> +                sector_num);
> +            if (ret < 0) {
> +                goto fail;
> +            }
> +        }
> +
> +        if (!is_cluster_tail(sector_num + remaining_sectors - 1)
> +            && !is_allocated(bs, sector_num + remaining_sectors - 1)) {
> +            ret = copy_sectors(bs, sector_num + remaining_sectors,
> +                ((sector_num + remaining_sectors) | (SECTORS_PER_CLUSTER - 1)) + 1);
> +            if (ret < 0) {
> +                goto fail;
> +            }
> +        }
> +
> +        for (i = sector_num / SECTORS_PER_CLUSTER;
> +            i <= (sector_num + remaining_sectors - 1) / SECTORS_PER_CLUSTER;
> +            i++) {
> +            offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
> +                + (offset_in_bitmap(i * SECTORS_PER_CLUSTER) & (~(c->entry_size - 1)));
> +            ret = block_cache_get(bs, s->bitmap_cache, offset,
> +                (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
> +            if (ret < 0) {
> +                goto fail;
> +            }
> +            if ((table[i / 8] & (1 << (i % 8))) == 0) {
> +                table[i / 8] |= (1 << (i % 8));
> +                block_cache_entry_mark_dirty(s->bitmap_cache, table);
> +            }
> +        }
> +    }
> +    ret = 0;
> +fail:
> +    qemu_co_mutex_unlock(&s->lock);
> +    qemu_iovec_destroy(&hd_qiov);
> +    return ret;
> +}
> +
> +static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    int sector_per_byte = SECTORS_PER_CLUSTER * 8;
> +    int ret;
> +    uint32_t bitmap_pos = s->header.header_pages_size * ADD_COW_PAGE_SIZE;
> +    int64_t bitmap_size =
> +        (size / BDRV_SECTOR_SIZE + sector_per_byte - 1) / sector_per_byte;
> +    bitmap_size = (bitmap_size + ADD_COW_CACHE_ENTRY_SIZE - 1)
> +        & (~(ADD_COW_CACHE_ENTRY_SIZE - 1));
> +
> +    ret = bdrv_truncate(bs->file, bitmap_pos + bitmap_size);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    return 0;
> +}
> +
> +static coroutine_fn int add_cow_co_flush(BlockDriverState *bs)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    int ret;
> +
> +    qemu_co_mutex_lock(&s->lock);
> +    ret = block_cache_flush(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP,
> +        ADD_COW_CACHE_ENTRY_SIZE);
> +    qemu_co_mutex_unlock(&s->lock);
> +    return ret;
> +}
> +
> +static QEMUOptionParameter add_cow_create_options[] = {
> +    {
> +        .name = BLOCK_OPT_SIZE,
> +        .type = OPT_SIZE,
> +        .help = "Virtual disk size"
> +    },
> +    {
> +        .name = BLOCK_OPT_BACKING_FILE,
> +        .type = OPT_STRING,
> +        .help = "File name of a base image"
> +    },
> +    {
> +        .name = BLOCK_OPT_BACKING_FMT,
> +        .type = OPT_STRING,
> +        .help = "Image format of the base image"
> +    },
> +    {
> +        .name = BLOCK_OPT_IMAGE_FILE,
> +        .type = OPT_STRING,
> +        .help = "File name of a image file"
> +    },
> +    {
> +        .name = BLOCK_OPT_IMAGE_FORMAT,
> +        .type = OPT_STRING,
> +        .help = "Image format of the image file"
> +    },
> +    { NULL }
> +};
> +
> +static BlockDriver bdrv_add_cow = {
> +    .format_name                = "add-cow",
> +    .instance_size              = sizeof(BDRVAddCowState),
> +    .bdrv_probe                 = add_cow_probe,
> +    .bdrv_open                  = add_cow_open,
> +    .bdrv_close                 = add_cow_close,
> +    .bdrv_create                = add_cow_create,
> +    .bdrv_co_readv              = add_cow_co_readv,
> +    .bdrv_co_writev             = add_cow_co_writev,
> +    .bdrv_truncate              = bdrv_add_cow_truncate,
> +    .bdrv_co_is_allocated       = add_cow_is_allocated,
> +
> +    .create_options             = add_cow_create_options,
> +    .bdrv_co_flush_to_os        = add_cow_co_flush,
> +};
> +
> +static void bdrv_add_cow_init(void)
> +{
> +    bdrv_register(&bdrv_add_cow);
> +}
> +
> +block_init(bdrv_add_cow_init);
> diff --git a/block/add-cow.h b/block/add-cow.h
> new file mode 100644
> index 0000000..f058376
> --- /dev/null
> +++ b/block/add-cow.h
> @@ -0,0 +1,85 @@
> +/*
> + * QEMU ADD-COW Disk Format
> + *
> + * Copyright IBM, Corp. 2012
> + *
> + * Authors:
> + *  Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> + * See the COPYING.LIB file in the top-level directory.
> + *
> + */
> +
> +#ifndef BLOCK_ADD_COW_H
> +#define BLOCK_ADD_COW_H
> +#include "block-cache.h"
> +
> +enum {
> +    ADD_COW_F_All_ALLOCATED     = 0X01,

Please use "ADD_COW_F_ALL_ALLOCATED" (all caps)

I was searching your patch for how this was used and was scratching my
head when I didn't see any matches :)

> +    ADD_COW_FEATURE_MASK        = ADD_COW_F_All_ALLOCATED,
> +
> +    ADD_COW_MAGIC = (((uint64_t)'A' << 56) | ((uint64_t)'D' << 48) | \
> +                    ((uint64_t)'D' << 40) | ((uint64_t)'_' << 32) | \
> +                    ((uint64_t)'C' << 24) | ((uint64_t)'O' << 16) | \
> +                    ((uint64_t)'W' << 8) | 0xFF),
> +    ADD_COW_VERSION             = 1,
> +    ADD_COW_FILE_LEN            = 1024,
> +    ADD_COW_CACHE_SIZE          = 16,
> +    ADD_COW_CACHE_ENTRY_SIZE    = 65536,
> +    ADD_COW_CLUSTER_SIZE        = 65536,
> +    SECTORS_PER_CLUSTER         = (ADD_COW_CLUSTER_SIZE / BDRV_SECTOR_SIZE),
> +    ADD_COW_PAGE_SIZE           = 4096,
> +    ADD_COW_DEFAULT_PAGE_SIZE   = 1,
> +};
> +
> +typedef struct AddCowHeader {
> +    uint64_t        magic;
> +    uint32_t        version;
> +
> +    uint32_t        backing_filename_offset;
> +    uint32_t        backing_filename_size;
> +
> +    uint32_t        image_filename_offset;
> +    uint32_t        image_filename_size;
> +
> +    uint64_t        features;
> +    uint64_t        optional_features;
> +    uint32_t        header_pages_size;
> +} QEMU_PACKED AddCowHeader;

You should avoid using packed structures for image format headers.
Instead, I would either:

a) re-order the fields so that 32/64-bit fields, respectively, fall on
32/64-bit boundaries (in your case, for instance, moving header_pages_size
above features) like qed/qcow2 do, or

b) read/write the fields individually rather than reading/writing directly
into/from the header struct.
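
For a), a minimal sketch of what the reordered layout could look like.
AddCowHeaderAligned is a hypothetical name, and note this moves features
and optional_features to different offsets than docs/specs/add-cow.txt
currently describes, so the spec would need updating to match. The
compile-time checks (C11) are only there to show that no padding is
needed once the 64-bit fields sit on 8-byte boundaries:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reordered header: every field naturally aligned, no packing */
typedef struct AddCowHeaderAligned {
    uint64_t        magic;                      /* offset  0 */
    uint32_t        version;                    /* offset  8 */

    uint32_t        backing_filename_offset;    /* offset 12 */
    uint32_t        backing_filename_size;      /* offset 16 */

    uint32_t        image_filename_offset;      /* offset 20 */
    uint32_t        image_filename_size;        /* offset 24 */

    uint32_t        header_pages_size;          /* offset 28, moved up */
    uint64_t        features;                   /* offset 32 */
    uint64_t        optional_features;          /* offset 40 */
} AddCowHeaderAligned;

/* Fail the build if the compiler inserts padding anywhere */
_Static_assert(offsetof(AddCowHeaderAligned, header_pages_size) == 28,
               "header_pages_size must follow image_filename_size");
_Static_assert(offsetof(AddCowHeaderAligned, features) == 32,
               "features must be 8-byte aligned");
_Static_assert(offsetof(AddCowHeaderAligned, optional_features) == 40,
               "optional_features must be 8-byte aligned");
_Static_assert(sizeof(AddCowHeaderAligned) == 48,
               "header must stay 48 bytes");
```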

The safest route is b). It adds a few lines of code, but you won't have to
rework things (or worry about introducing bugs) if you later add, say, a
32-bit value followed by a 64-bit value.
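
A rough sketch of b), just to show the shape. The helper names
(put_le32, put_le64, get_le32, get_le64, add_cow_header_to_buf,
add_cow_header_from_buf) and the AddCowHeaderSketch type are made up for
this example and are not existing QEMU functions; the byte offsets are
the ones listed in docs/specs/add-cow.txt from patch 1/6:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: little-endian store/load, host-endian independent */
static void put_le32(uint8_t *p, uint32_t v)
{
    int i;
    for (i = 0; i < 4; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

static void put_le64(uint8_t *p, uint64_t v)
{
    put_le32(p, (uint32_t)v);
    put_le32(p + 4, (uint32_t)(v >> 32));
}

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t get_le64(const uint8_t *p)
{
    return (uint64_t)get_le32(p) | ((uint64_t)get_le32(p + 4) << 32);
}

/* In-memory copy only; it is never written to disk directly, so the
 * compiler is free to pad it and no packed attribute is needed. */
typedef struct AddCowHeaderSketch {
    uint64_t magic;
    uint32_t version;
    uint32_t backing_filename_offset;
    uint32_t backing_filename_size;
    uint32_t image_filename_offset;
    uint32_t image_filename_size;
    uint64_t features;
    uint64_t optional_features;
    uint32_t header_pages_size;
} AddCowHeaderSketch;

enum { ADD_COW_HEADER_DISK_SIZE = 48 };

/* Write each field at its on-disk offset (bytes 0..47 per the spec) */
static void add_cow_header_to_buf(const AddCowHeaderSketch *h, uint8_t *buf)
{
    put_le64(buf + 0,  h->magic);
    put_le32(buf + 8,  h->version);
    put_le32(buf + 12, h->backing_filename_offset);
    put_le32(buf + 16, h->backing_filename_size);
    put_le32(buf + 20, h->image_filename_offset);
    put_le32(buf + 24, h->image_filename_size);
    put_le64(buf + 28, h->features);
    put_le64(buf + 36, h->optional_features);
    put_le32(buf + 44, h->header_pages_size);
}

/* Read each field back from its on-disk offset */
static void add_cow_header_from_buf(AddCowHeaderSketch *h, const uint8_t *buf)
{
    h->magic                   = get_le64(buf + 0);
    h->version                 = get_le32(buf + 8);
    h->backing_filename_offset = get_le32(buf + 12);
    h->backing_filename_size   = get_le32(buf + 16);
    h->image_filename_offset   = get_le32(buf + 20);
    h->image_filename_size     = get_le32(buf + 24);
    h->features                = get_le64(buf + 28);
    h->optional_features       = get_le64(buf + 36);
    h->header_pages_size       = get_le32(buf + 44);
}
```

The on-disk layout then lives in exactly one place (the offsets above),
and adding a field later only means adding one line to each function.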

> +
> +typedef struct BDRVAddCowState {
> +    BlockDriverState    *image_hd;
> +    CoMutex             lock;
> +    int                 cluster_size;
> +    BlockCache         *bitmap_cache;
> +    uint64_t            bitmap_size;
> +    AddCowHeader        header;
> +    char                backing_file_format[16];
> +    char                image_file_format[16];
> +} BDRVAddCowState;
> +
> +/* Convert sector_num to offset in bitmap */
> +static inline int64_t offset_in_bitmap(int64_t sector_num)
> +{
> +    int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
> +    return cluster_num / 8;
> +}
> +
> +static inline bool is_cluster_head(int64_t sector_num)
> +{
> +    return sector_num % SECTORS_PER_CLUSTER == 0;
> +}
> +
> +static inline bool is_cluster_tail(int64_t sector_num)
> +{
> +    return (sector_num + 1) % SECTORS_PER_CLUSTER == 0;
> +}
> +
> +BlockCache *add_cow_cache_create(BlockDriverState *bs, int num_tables);
> +int add_cow_cache_destroy(BlockDriverState *bs, BlockCache *c);
> +void add_cow_cache_entry_mark_dirty(BlockCache *c, void *table);
> +int add_cow_cache_get(BlockDriverState *bs, BlockCache *c, uint64_t offset,
> +    void **table);
> +int add_cow_cache_flush(BlockDriverState *bs, BlockCache *c);
> +#endif
> diff --git a/block_int.h b/block_int.h
> index 6c1d9ca..67954ec 100644
> --- a/block_int.h
> +++ b/block_int.h
> @@ -53,6 +53,8 @@
>  #define BLOCK_OPT_SUBFMT            "subformat"
>  #define BLOCK_OPT_COMPAT_LEVEL      "compat"
>  #define BLOCK_OPT_LAZY_REFCOUNTS    "lazy_refcounts"
> +#define BLOCK_OPT_IMAGE_FILE        "image_file"
> +#define BLOCK_OPT_IMAGE_FORMAT      "image_format"
> 
>  typedef struct BdrvTrackedRequest BdrvTrackedRequest;
> 
> -- 
> 1.7.1
> 
> 

* Re: [Qemu-devel] [PATCH V12 1/6] docs: document for add-cow file format
  2012-09-06 17:27   ` Michael Roth
@ 2012-09-10  1:48     ` Dong Xu Wang
  0 siblings, 0 replies; 25+ messages in thread
From: Dong Xu Wang @ 2012-09-10  1:48 UTC (permalink / raw)
  To: Michael Roth; +Cc: kwolf, qemu-devel

On Fri, Sep 7, 2012 at 1:27 AM, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> On Fri, Aug 10, 2012 at 11:39:40PM +0800, Dong Xu Wang wrote:
>> Document for add-cow format, the usage and spec of add-cow are introduced.
>>
>> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> ---
>>  docs/specs/add-cow.txt |  123 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 123 insertions(+), 0 deletions(-)
>>  create mode 100644 docs/specs/add-cow.txt
>>
>> diff --git a/docs/specs/add-cow.txt b/docs/specs/add-cow.txt
>> new file mode 100644
>> index 0000000..d5a7a68
>> --- /dev/null
>> +++ b/docs/specs/add-cow.txt
>> @@ -0,0 +1,123 @@
>> +== General ==
>> +
>> +The raw file format does not support backing files or copy on write feature.
>> +The add-cow image format makes it possible to use backing files with raw
>> +image by keeping a separate .add-cow metadata file. Once all sectors
>> +have been written into the raw image it is safe to discard the .add-cow
>> +and backing files, then we can use the raw image directly.
>> +
>> +An example usage of add-cow would look like::
>> +(ubuntu.img is a disk image which has been installed OS.)
>> +    1)  Create a raw image with the same size of ubuntu.img
>> +            qemu-img create -f raw test.raw 8G
>> +    2)  Create an add-cow image which will store dirty bitmap
>> +            qemu-img create -f add-cow test.add-cow \
>> +                -o backing_file=ubuntu.img,image_file=test.raw
>> +    3)  Run qemu with add-cow image
>> +            qemu -drive if=virtio,file=test.add-cow
>> +
>> +test.raw may be larger than ubuntu.img, in that case, the size of test.add-cow
>> +will be calculated from the size of test.raw.
>> +
>> +=Specification=
>> +
>> +The file format looks like this:
>> +
>> + +---------------+-------------+-----------------+
>> + |     Header    |   Reserved  |    COW bitmap   |
>> + +---------------+-------------+-----------------+
>> +
>> +All numbers in add-cow are stored in Little Endian byte order.
>> +
>> +== Header ==
>> +
>> +The Header is included in the first bytes:
>> +(#define HEADER_SIZE (4096 * header_pages_size))
>> +    Byte    0 -  7:     magic
>> +                        add-cow magic string ("ADD_COW\xff").
>> +
>> +            8 -  11:    version
>> +                        Version number (only valid value is 1 now).
>> +
>> +            12 - 15:    backing file name offset
>> +                        Offset in the add-cow file at which the backing file
>> +                        name is stored (NB: The string is not NUL-terminated).
>> +                        If the backing file name does NOT exist, this field
>> +                        will be 0. Otherwise it must be between 80 and
>> +                        [HEADER_SIZE - 2] (a file name must be at least 1 byte).
>> +
>> +            16 - 19:    backing file name size
>> +                        Length of the backing file name in bytes. It will be 0
>> +                        if the backing file name offset is 0. If backing file
>> +                        name offset is non-zero, then it must be non-zero. Must
>> +                        be less than [HEADER_SIZE - 80] to fit in the reserved
>> +                        part of the header.
>> +
>> +            20 - 23:    image file name offset
>> +                        Offset in the add-cow file at which the image file name
>> +                        is stored (NB: The string is not NUL-terminated). It
>> +                        must be between 80 and [HEADER_SIZE - 2].
>> +
>> +            24 - 27:    image file name size
>> +                        Length of the image file name in bytes.
>> +                        Must be less than [HEADER_SIZE - 80] to fit in the reserved
>> +                        part of the header.
>> +
>> +            28 - 35:    features
>> +                        Currently only 1 feature bit is used:
>> +                            * ADD_COW_F_ALL_ALLOCATED   = 0x01.
>> +
>> +            36 - 43:    optional features
>> +                        Not used now. Reserved for future use. It must be set to 0.
>> +
>> +            44 - 47:    header pages size
>> +                        The header is variable-sized. This field indicates how
>> +                        many pages (4k) are used to store the add-cow header.
>> +                        In add-cow v1 it is fixed to 1, so the header size is
>> +                        4k * 1 = 4096 bytes.
>> +
>> +            48 - 63:    backing file format
>> +                        Format of the backing file. It will be filled with 0
>> +                        if the backing file name offset is 0. If the backing
>> +                        file name offset is non-zero, it must be non-zero. It
>> +                        is coded in free-form ASCII, and is not NUL-terminated.
>> +
>> +            64 - 79:    image file format
>> +                        Format of the image file. It must be non-zero. It is
>> +                        coded in free-form ASCII, and is not NUL-terminated.
>> +
>> +            80 - [HEADER_SIZE - 1]:
>> +                        This area ensures that the COW bitmap field starts at
>> +                        byte HEADER_SIZE. The backing file name and image file
>> +                        name are stored here. Bytes that are not part of the
>> +                        backing file or image file names will be set to 0.
>> +
>> +== COW bitmap ==
>> +
>> +The "COW bitmap" field starts at offset HEADER_SIZE and stores a bitmap related
>> +to the backing file and image file. The bitmap tracks whether each sector in
>> +the backing file is dirty or not.
>> +
>> +Each bit in the bitmap indicates one cluster's status. One cluster includes 128
>> +sectors, so each bit covers 512 * 128 = 64k bytes. The size of the bitmap is
>> +calculated from the virtual size of the image file, and it must also be a
>> +multiple of 65536; unused bits will be set to 0. Within each byte, the least
>> +significant bit covers the first cluster. The bit order in one byte looks like:
>> + +----+----+----+----+----+----+----+----+
>> + | b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0 |
>> + +----+----+----+----+----+----+----+----+
>> +
>> +If a bit is 0, the sector has not been allocated in the image file, and data
>> +should be loaded from the backing file when reading; if the bit is 1, the
>> +related sector is dirty, and data should be loaded from the image file when
>> +reading. Writing to a sector causes the corresponding bit to be set to 1.
>> +
>> +If the raw image size is not a multiple of the cluster size, bits that
>> +correspond to bytes beyond the raw file size in add-cow will be 0.
>> +
>> +Image file name and backing file name must NOT be the same; this is prevented
>> +when creating add-cow files.
>> +
>> +Image file and backing file are interpreted relative to the qcow2 file, not
>
> Relative to the add-cow file?
Ah, yes..
>
>> +to the current working directory of the process that opened the qcow2 file.
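
For reference, the fixed 80-byte header layout in the quoted spec could be decoded roughly as follows. This is only a sketch under my reading of the spec, not code from the patch: the struct name AddCowHeaderSketch and the helpers le32(), le64(), name_range_ok() and add_cow_parse_header() are hypothetical, and requiring a non-empty image file name is an assumption taken from the "at least 1 byte" remark.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ADD_COW_MAGIC      "ADD_COW\xff"   /* 8 bytes, per the spec */
#define ADD_COW_FIXED_SIZE 80u              /* fixed fields end at byte 79 */
#define ADD_COW_PAGE_SIZE  4096u

/* Hypothetical in-memory view of the fixed header fields. */
typedef struct {
    uint32_t version;
    uint32_t backing_name_offset;
    uint32_t backing_name_size;
    uint32_t image_name_offset;
    uint32_t image_name_size;
    uint64_t features;
    uint32_t header_pages;
} AddCowHeaderSketch;

/* All on-disk numbers are little endian. */
static inline uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static inline uint64_t le64(const uint8_t *p)
{
    return (uint64_t)le32(p) | (uint64_t)le32(p + 4) << 32;
}

/* A name must live in [80, HEADER_SIZE - 2] and be 1..HEADER_SIZE-81 bytes. */
static inline int name_range_ok(uint32_t offset, uint32_t size,
                                uint32_t header_size)
{
    return offset >= ADD_COW_FIXED_SIZE && offset <= header_size - 2 &&
           size > 0 && size < header_size - ADD_COW_FIXED_SIZE;
}

/* Returns 0 if the first 80 bytes look like a valid v1 header, -1 if not. */
static inline int add_cow_parse_header(const uint8_t *buf,
                                       AddCowHeaderSketch *h)
{
    uint32_t header_size;

    assert(buf != NULL && h != NULL);
    if (memcmp(buf, ADD_COW_MAGIC, 8) != 0) {
        return -1;
    }
    h->version             = le32(buf + 8);
    h->backing_name_offset = le32(buf + 12);
    h->backing_name_size   = le32(buf + 16);
    h->image_name_offset   = le32(buf + 20);
    h->image_name_size     = le32(buf + 24);
    h->features            = le64(buf + 28);
    h->header_pages        = le32(buf + 44);

    if (h->version != 1 || h->header_pages != 1) {
        return -1;
    }
    header_size = ADD_COW_PAGE_SIZE * h->header_pages;

    if (!name_range_ok(h->image_name_offset, h->image_name_size,
                       header_size)) {
        return -1;
    }
    if (h->backing_name_offset == 0) {
        /* No backing file: the size field must be 0 as well. */
        return h->backing_name_size == 0 ? 0 : -1;
    }
    return name_range_ok(h->backing_name_offset, h->backing_name_size,
                         header_size) ? 0 : -1;
}
```

The field offsets (8, 12, 16, 20, 24, 28, 44) come straight from the byte ranges in the spec; the format strings at bytes 48 - 79 are left out to keep the sketch short.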

>> --
>> 1.7.1
>>
>>
>
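
To make the bitmap rules in the quoted spec concrete, here is a small sketch of the lookup arithmetic. The function names are hypothetical and not taken from the patch, and I am assuming that "multiple of 65536" refers to the bitmap size in bytes.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE         512u
#define SECTORS_PER_CLUSTER 128u
#define CLUSTER_SIZE        (SECTOR_SIZE * SECTORS_PER_CLUSTER) /* 65536 */
#define BITMAP_ALIGN        65536u  /* assumed to be in bytes */

/* Return 1 if the cluster containing guest byte `offset` is allocated
 * in the image file, 0 if it must be read from the backing file. */
static inline int cluster_is_allocated(const uint8_t *bitmap, uint64_t offset)
{
    uint64_t cluster = offset / CLUSTER_SIZE;

    /* The least significant bit of each byte covers the first cluster. */
    return (bitmap[cluster / 8] >> (cluster % 8)) & 1;
}

/* Set the bit for the cluster containing guest byte `offset`. */
static inline void cluster_mark_allocated(uint8_t *bitmap, uint64_t offset)
{
    uint64_t cluster = offset / CLUSTER_SIZE;

    bitmap[cluster / 8] |= (uint8_t)(1u << (cluster % 8));
}

/* Bitmap size in bytes for a given virtual size: one bit per cluster,
 * rounded up to whole bytes, then rounded up to BITMAP_ALIGN. */
static inline uint64_t bitmap_size_bytes(uint64_t virtual_size)
{
    uint64_t clusters = (virtual_size + CLUSTER_SIZE - 1) / CLUSTER_SIZE;
    uint64_t bytes = (clusters + 7) / 8;

    return (bytes + BITMAP_ALIGN - 1) / BITMAP_ALIGN * BITMAP_ALIGN;
}
```

With these assumptions, the 8G image from the usage example has 131072 clusters, which need 16384 bytes of bitmap and are padded to 65536 bytes.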

* Re: [Qemu-devel] [PATCH V12 3/6] qed_read_string to bdrv_read_string
  2012-09-06 17:32   ` Michael Roth
@ 2012-09-10  1:49     ` Dong Xu Wang
  0 siblings, 0 replies; 25+ messages in thread
From: Dong Xu Wang @ 2012-09-10  1:49 UTC (permalink / raw)
  To: Michael Roth; +Cc: kwolf, qemu-devel

On Fri, Sep 7, 2012 at 1:32 AM, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> On Fri, Aug 10, 2012 at 11:39:42PM +0800, Dong Xu Wang wrote:
>> Make the qed_read_string function a common interface by moving it to block.c.
>>
>> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> ---
>>  block.c     |   27 +++++++++++++++++++++++++++
>>  block.h     |    2 ++
>>  block/qed.c |   29 +----------------------------
>>  3 files changed, 30 insertions(+), 28 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index c13d803..d906b35 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -213,6 +213,33 @@ int path_has_protocol(const char *path)
>>      return *p == ':';
>>  }
>>
>> +/**
>> + * Read a string of known length from the image file
>> + *
>> + * @bs:         Image file
>> + * @offset:     File offset to start of string, in bytes
>> + * @n:          String length in bytes
>> + * @buf:        Destination buffer
>> + * @buflen:     Destination buffer length in bytes
>> + * @ret:        0 on success, -errno on failure
>> + *
>> + * The string is NUL-terminated.
>> + */
>> +int bdrv_read_string(BlockDriverState *bs, uint64_t offset, size_t n,
>> +                           char *buf, size_t buflen)
>
> Small alignment issue   ^
>
>> +{
>> +    int ret;
>> +    if (n >= buflen) {
>> +        return -EINVAL;
>> +    }
>> +    ret = bdrv_pread(bs, offset, buf, n);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    buf[n] = '\0';
>> +    return 0;
>> +}
>> +
>>  int path_is_absolute(const char *path)
>>  {
>>  #ifdef _WIN32
>> diff --git a/block.h b/block.h
>> index 54e61c9..e5dfcd7 100644
>> --- a/block.h
>> +++ b/block.h
>> @@ -154,6 +154,8 @@ int bdrv_pwrite_sync(BlockDriverState *bs, int64_t offset,
>>      const void *buf, int count);
>>  int coroutine_fn bdrv_co_readv(BlockDriverState *bs, int64_t sector_num,
>>      int nb_sectors, QEMUIOVector *qiov);
>> +int bdrv_read_string(BlockDriverState *bs, uint64_t offset, size_t n,
>> +    char *buf, size_t buflen);
>
> Another one here        ^
>
>>  int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
>>      int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
>>  int coroutine_fn bdrv_co_writev(BlockDriverState *bs, int64_t sector_num,
>> diff --git a/block/qed.c b/block/qed.c
>> index 5f3eefa..311c589 100644
>> --- a/block/qed.c
>> +++ b/block/qed.c
>> @@ -217,33 +217,6 @@ static bool qed_is_image_size_valid(uint64_t image_size, uint32_t cluster_size,
>>  }
>>
>>  /**
>> - * Read a string of known length from the image file
>> - *
>> - * @file:       Image file
>> - * @offset:     File offset to start of string, in bytes
>> - * @n:          String length in bytes
>> - * @buf:        Destination buffer
>> - * @buflen:     Destination buffer length in bytes
>> - * @ret:        0 on success, -errno on failure
>> - *
>> - * The string is NUL-terminated.
>> - */
>> -static int qed_read_string(BlockDriverState *file, uint64_t offset, size_t n,
>> -                           char *buf, size_t buflen)
>> -{
>> -    int ret;
>> -    if (n >= buflen) {
>> -        return -EINVAL;
>> -    }
>> -    ret = bdrv_pread(file, offset, buf, n);
>> -    if (ret < 0) {
>> -        return ret;
>> -    }
>> -    buf[n] = '\0';
>> -    return 0;
>> -}
>> -
>> -/**
>>   * Allocate new clusters
>>   *
>>   * @s:          QED state
>> @@ -437,7 +410,7 @@ static int bdrv_qed_open(BlockDriverState *bs, int flags)
>>              return -EINVAL;
>>          }
>>
>> -        ret = qed_read_string(bs->file, s->header.backing_filename_offset,
>> +        ret = bdrv_read_string(bs->file, s->header.backing_filename_offset,
>>                                s->header.backing_filename_size, bs->backing_file,
>>                                sizeof(bs->backing_file));
>
> Here too                          ^
>
> Looks good otherwise.
>
>>          if (ret < 0) {
>> --
>> 1.7.1
>>
>>
>
Thank you, Michael.
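
For readers following the thread, the contract of the moved helper (reject when n >= buflen, read n bytes, then NUL-terminate) can be tried out with a standalone sketch. This is not the QEMU code: mem_pread() is a hypothetical stand-in for bdrv_pread() that reads from an in-memory buffer, and read_string() mirrors the quoted logic on top of it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for bdrv_pread(): copy n bytes starting at
 * `offset` out of an in-memory image, or fail with -EIO if out of range. */
static inline int mem_pread(const uint8_t *image, size_t image_len,
                            uint64_t offset, void *buf, size_t n)
{
    if (offset > image_len || n > image_len - offset) {
        return -EIO;
    }
    memcpy(buf, image + offset, n);
    return (int)n;
}

/* Read a string of known length n and NUL-terminate it.
 * Returns 0 on success, -errno on failure, like the quoted helper. */
static inline int read_string(const uint8_t *image, size_t image_len,
                              uint64_t offset, size_t n,
                              char *buf, size_t buflen)
{
    int ret;

    /* Need room for n bytes plus the terminating NUL. */
    if (n >= buflen) {
        return -EINVAL;
    }
    ret = mem_pread(image, image_len, offset, buf, n);
    if (ret < 0) {
        return ret;
    }
    buf[n] = '\0';
    return 0;
}
```

The n >= buflen check is what lets on-disk strings stay unterminated, as the add-cow spec requires, while callers still get a valid C string.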

* Re: [Qemu-devel] [PATCH V12 4/6] rename qcow2-cache.c to block-cache.c
  2012-09-06 17:52   ` Michael Roth
@ 2012-09-10  2:14     ` Dong Xu Wang
  0 siblings, 0 replies; 25+ messages in thread
From: Dong Xu Wang @ 2012-09-10  2:14 UTC (permalink / raw)
  To: Michael Roth; +Cc: kwolf, qemu-devel

On Fri, Sep 7, 2012 at 1:52 AM, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> On Fri, Aug 10, 2012 at 11:39:43PM +0800, Dong Xu Wang wrote:
>> add-cow and qcow2 file format will share the same cache code, so rename
>> block-cache.c to block-cache.c. And related structure and qcow2 code also
>
> "qcow2-cache.c to block-cache.c"
>
> But I've scanned through the rest of your patches and can't seem to find
> where block-cache.c gets introduced. Did you forget to git add it?

Really sorry about that. I forgot to git add block-cache.c; I will add it in v13.
>
>> are changed.
>>
>> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> ---
>>  block.h                |    3 +
>>  block/Makefile.objs    |    3 +-
>>  block/qcow2-cache.c    |  323 ------------------------------------------------
>>  block/qcow2-cluster.c  |   66 ++++++----
>>  block/qcow2-refcount.c |   66 ++++++-----
>>  block/qcow2.c          |   36 +++---
>>  block/qcow2.h          |   24 +---
>>  trace-events           |   13 +-
>>  8 files changed, 109 insertions(+), 425 deletions(-)
>>  delete mode 100644 block/qcow2-cache.c
>>
>> diff --git a/block.h b/block.h
>> index e5dfcd7..c325661 100644
>> --- a/block.h
>> +++ b/block.h
>> @@ -401,6 +401,9 @@ typedef enum {
>>      BLKDBG_CLUSTER_ALLOC_BYTES,
>>      BLKDBG_CLUSTER_FREE,
>>
>> +    BLKDBG_ADD_COW_UPDATE,
>> +    BLKDBG_ADD_COW_LOAD,
>> +
>>      BLKDBG_EVENT_MAX,
>>  } BlkDebugEvent;
>>
>> diff --git a/block/Makefile.objs b/block/Makefile.objs
>> index b5754d3..23bdfc8 100644
>> --- a/block/Makefile.objs
>> +++ b/block/Makefile.objs
>> @@ -1,7 +1,8 @@
>>  block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
>> -block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
>> +block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
>>  block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>>  block-obj-y += qed-check.o
>> +block-obj-y += block-cache.o
>>  block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
>>  block-obj-y += stream.o
>>  block-obj-$(CONFIG_WIN32) += raw-win32.o
>> diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
>> deleted file mode 100644
>> index 2d4322a..0000000
>> --- a/block/qcow2-cache.c
>> +++ /dev/null
>> @@ -1,323 +0,0 @@
>> -/*
>> - * L2/refcount table cache for the QCOW2 format
>> - *
>> - * Copyright (c) 2010 Kevin Wolf <kwolf@redhat.com>
>> - *
>> - * Permission is hereby granted, free of charge, to any person obtaining a copy
>> - * of this software and associated documentation files (the "Software"), to deal
>> - * in the Software without restriction, including without limitation the rights
>> - * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
>> - * copies of the Software, and to permit persons to whom the Software is
>> - * furnished to do so, subject to the following conditions:
>> - *
>> - * The above copyright notice and this permission notice shall be included in
>> - * all copies or substantial portions of the Software.
>> - *
>> - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> - * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
>> - * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>> - * THE SOFTWARE.
>> - */
>> -
>> -#include "block_int.h"
>> -#include "qemu-common.h"
>> -#include "qcow2.h"
>> -#include "trace.h"
>> -
>> -typedef struct Qcow2CachedTable {
>> -    void*   table;
>> -    int64_t offset;
>> -    bool    dirty;
>> -    int     cache_hits;
>> -    int     ref;
>> -} Qcow2CachedTable;
>> -
>> -struct Qcow2Cache {
>> -    Qcow2CachedTable*       entries;
>> -    struct Qcow2Cache*      depends;
>> -    int                     size;
>> -    bool                    depends_on_flush;
>> -};
>> -
>> -Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
>> -{
>> -    BDRVQcowState *s = bs->opaque;
>> -    Qcow2Cache *c;
>> -    int i;
>> -
>> -    c = g_malloc0(sizeof(*c));
>> -    c->size = num_tables;
>> -    c->entries = g_malloc0(sizeof(*c->entries) * num_tables);
>> -
>> -    for (i = 0; i < c->size; i++) {
>> -        c->entries[i].table = qemu_blockalign(bs, s->cluster_size);
>> -    }
>> -
>> -    return c;
>> -}
>> -
>> -int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c)
>> -{
>> -    int i;
>> -
>> -    for (i = 0; i < c->size; i++) {
>> -        assert(c->entries[i].ref == 0);
>> -        qemu_vfree(c->entries[i].table);
>> -    }
>> -
>> -    g_free(c->entries);
>> -    g_free(c);
>> -
>> -    return 0;
>> -}
>> -
>> -static int qcow2_cache_flush_dependency(BlockDriverState *bs, Qcow2Cache *c)
>> -{
>> -    int ret;
>> -
>> -    ret = qcow2_cache_flush(bs, c->depends);
>> -    if (ret < 0) {
>> -        return ret;
>> -    }
>> -
>> -    c->depends = NULL;
>> -    c->depends_on_flush = false;
>> -
>> -    return 0;
>> -}
>> -
>> -static int qcow2_cache_entry_flush(BlockDriverState *bs, Qcow2Cache *c, int i)
>> -{
>> -    BDRVQcowState *s = bs->opaque;
>> -    int ret = 0;
>> -
>> -    if (!c->entries[i].dirty || !c->entries[i].offset) {
>> -        return 0;
>> -    }
>> -
>> -    trace_qcow2_cache_entry_flush(qemu_coroutine_self(),
>> -                                  c == s->l2_table_cache, i);
>> -
>> -    if (c->depends) {
>> -        ret = qcow2_cache_flush_dependency(bs, c);
>> -    } else if (c->depends_on_flush) {
>> -        ret = bdrv_flush(bs->file);
>> -        if (ret >= 0) {
>> -            c->depends_on_flush = false;
>> -        }
>> -    }
>> -
>> -    if (ret < 0) {
>> -        return ret;
>> -    }
>> -
>> -    if (c == s->refcount_block_cache) {
>> -        BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_UPDATE_PART);
>> -    } else if (c == s->l2_table_cache) {
>> -        BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
>> -    }
>> -
>> -    ret = bdrv_pwrite(bs->file, c->entries[i].offset, c->entries[i].table,
>> -        s->cluster_size);
>> -    if (ret < 0) {
>> -        return ret;
>> -    }
>> -
>> -    c->entries[i].dirty = false;
>> -
>> -    return 0;
>> -}
>> -
>> -int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c)
>> -{
>> -    BDRVQcowState *s = bs->opaque;
>> -    int result = 0;
>> -    int ret;
>> -    int i;
>> -
>> -    trace_qcow2_cache_flush(qemu_coroutine_self(), c == s->l2_table_cache);
>> -
>> -    for (i = 0; i < c->size; i++) {
>> -        ret = qcow2_cache_entry_flush(bs, c, i);
>> -        if (ret < 0 && result != -ENOSPC) {
>> -            result = ret;
>> -        }
>> -    }
>> -
>> -    if (result == 0) {
>> -        ret = bdrv_flush(bs->file);
>> -        if (ret < 0) {
>> -            result = ret;
>> -        }
>> -    }
>> -
>> -    return result;
>> -}
>> -
>> -int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
>> -    Qcow2Cache *dependency)
>> -{
>> -    int ret;
>> -
>> -    if (dependency->depends) {
>> -        ret = qcow2_cache_flush_dependency(bs, dependency);
>> -        if (ret < 0) {
>> -            return ret;
>> -        }
>> -    }
>> -
>> -    if (c->depends && (c->depends != dependency)) {
>> -        ret = qcow2_cache_flush_dependency(bs, c);
>> -        if (ret < 0) {
>> -            return ret;
>> -        }
>> -    }
>> -
>> -    c->depends = dependency;
>> -    return 0;
>> -}
>> -
>> -void qcow2_cache_depends_on_flush(Qcow2Cache *c)
>> -{
>> -    c->depends_on_flush = true;
>> -}
>> -
>> -static int qcow2_cache_find_entry_to_replace(Qcow2Cache *c)
>> -{
>> -    int i;
>> -    int min_count = INT_MAX;
>> -    int min_index = -1;
>> -
>> -
>> -    for (i = 0; i < c->size; i++) {
>> -        if (c->entries[i].ref) {
>> -            continue;
>> -        }
>> -
>> -        if (c->entries[i].cache_hits < min_count) {
>> -            min_index = i;
>> -            min_count = c->entries[i].cache_hits;
>> -        }
>> -
>> -        /* Give newer hits priority */
>> -        /* TODO Check how to optimize the replacement strategy */
>> -        c->entries[i].cache_hits /= 2;
>> -    }
>> -
>> -    if (min_index == -1) {
>> -        /* This can't happen in current synchronous code, but leave the check
>> -         * here as a reminder for whoever starts using AIO with the cache */
>> -        abort();
>> -    }
>> -    return min_index;
>> -}
>> -
>> -static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
>> -    uint64_t offset, void **table, bool read_from_disk)
>> -{
>> -    BDRVQcowState *s = bs->opaque;
>> -    int i;
>> -    int ret;
>> -
>> -    trace_qcow2_cache_get(qemu_coroutine_self(), c == s->l2_table_cache,
>> -                          offset, read_from_disk);
>> -
>> -    /* Check if the table is already cached */
>> -    for (i = 0; i < c->size; i++) {
>> -        if (c->entries[i].offset == offset) {
>> -            goto found;
>> -        }
>> -    }
>> -
>> -    /* If not, write a table back and replace it */
>> -    i = qcow2_cache_find_entry_to_replace(c);
>> -    trace_qcow2_cache_get_replace_entry(qemu_coroutine_self(),
>> -                                        c == s->l2_table_cache, i);
>> -    if (i < 0) {
>> -        return i;
>> -    }
>> -
>> -    ret = qcow2_cache_entry_flush(bs, c, i);
>> -    if (ret < 0) {
>> -        return ret;
>> -    }
>> -
>> -    trace_qcow2_cache_get_read(qemu_coroutine_self(),
>> -                               c == s->l2_table_cache, i);
>> -    c->entries[i].offset = 0;
>> -    if (read_from_disk) {
>> -        if (c == s->l2_table_cache) {
>> -            BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
>> -        }
>> -
>> -        ret = bdrv_pread(bs->file, offset, c->entries[i].table, s->cluster_size);
>> -        if (ret < 0) {
>> -            return ret;
>> -        }
>> -    }
>> -
>> -    /* Give the table some hits for the start so that it won't be replaced
>> -     * immediately. The number 32 is completely arbitrary. */
>> -    c->entries[i].cache_hits = 32;
>> -    c->entries[i].offset = offset;
>> -
>> -    /* And return the right table */
>> -found:
>> -    c->entries[i].cache_hits++;
>> -    c->entries[i].ref++;
>> -    *table = c->entries[i].table;
>> -
>> -    trace_qcow2_cache_get_done(qemu_coroutine_self(),
>> -                               c == s->l2_table_cache, i);
>> -
>> -    return 0;
>> -}
>> -
>> -int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
>> -    void **table)
>> -{
>> -    return qcow2_cache_do_get(bs, c, offset, table, true);
>> -}
>> -
>> -int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
>> -    void **table)
>> -{
>> -    return qcow2_cache_do_get(bs, c, offset, table, false);
>> -}
>> -
>> -int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table)
>> -{
>> -    int i;
>> -
>> -    for (i = 0; i < c->size; i++) {
>> -        if (c->entries[i].table == *table) {
>> -            goto found;
>> -        }
>> -    }
>> -    return -ENOENT;
>> -
>> -found:
>> -    c->entries[i].ref--;
>> -    *table = NULL;
>> -
>> -    assert(c->entries[i].ref >= 0);
>> -    return 0;
>> -}
>> -
>> -void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table)
>> -{
>> -    int i;
>> -
>> -    for (i = 0; i < c->size; i++) {
>> -        if (c->entries[i].table == table) {
>> -            goto found;
>> -        }
>> -    }
>> -    abort();
>> -
>> -found:
>> -    c->entries[i].dirty = true;
>> -}
>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>> index e179211..335dc7a 100644
>> --- a/block/qcow2-cluster.c
>> +++ b/block/qcow2-cluster.c
>> @@ -28,6 +28,7 @@
>>  #include "block_int.h"
>>  #include "block/qcow2.h"
>>  #include "trace.h"
>> +#include "block-cache.h"
>>
>>  int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
>>  {
>> @@ -69,7 +70,8 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
>>          return new_l1_table_offset;
>>      }
>>
>> -    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
>> +    ret = block_cache_flush(bs, s->refcount_block_cache,
>> +        BLOCK_TABLE_REF, s->cluster_size);
>>      if (ret < 0) {
>>          goto fail;
>>      }
>> @@ -119,7 +121,8 @@ static int l2_load(BlockDriverState *bs, uint64_t l2_offset,
>>      BDRVQcowState *s = bs->opaque;
>>      int ret;
>>
>> -    ret = qcow2_cache_get(bs, s->l2_table_cache, l2_offset, (void**) l2_table);
>> +    ret = block_cache_get(bs, s->l2_table_cache, l2_offset,
>> +        (void **) l2_table, BLOCK_TABLE_L2, s->cluster_size);
>>
>>      return ret;
>>  }
>> @@ -180,7 +183,8 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
>>          return l2_offset;
>>      }
>>
>> -    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
>> +    ret = block_cache_flush(bs, s->refcount_block_cache,
>> +        BLOCK_TABLE_REF, s->cluster_size);
>>      if (ret < 0) {
>>          goto fail;
>>      }
>> @@ -188,7 +192,8 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
>>      /* allocate a new entry in the l2 cache */
>>
>>      trace_qcow2_l2_allocate_get_empty(bs, l1_index);
>> -    ret = qcow2_cache_get_empty(bs, s->l2_table_cache, l2_offset, (void**) table);
>> +    ret = block_cache_get_empty(bs, s->l2_table_cache, l2_offset,
>> +        (void **) table, BLOCK_TABLE_L2, s->cluster_size);
>>      if (ret < 0) {
>>          return ret;
>>      }
>> @@ -203,16 +208,17 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
>>
>>          /* if there was an old l2 table, read it from the disk */
>>          BLKDBG_EVENT(bs->file, BLKDBG_L2_ALLOC_COW_READ);
>> -        ret = qcow2_cache_get(bs, s->l2_table_cache,
>> +        ret = block_cache_get(bs, s->l2_table_cache,
>>              old_l2_offset & L1E_OFFSET_MASK,
>> -            (void**) &old_table);
>> +            (void **) &old_table, BLOCK_TABLE_L2, s->cluster_size);
>>          if (ret < 0) {
>>              goto fail;
>>          }
>>
>>          memcpy(l2_table, old_table, s->cluster_size);
>>
>> -        ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &old_table);
>> +        ret = block_cache_put(bs, s->l2_table_cache,
>> +            (void **) &old_table, BLOCK_TABLE_L2);
>>          if (ret < 0) {
>>              goto fail;
>>          }
>> @@ -222,8 +228,9 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
>>      BLKDBG_EVENT(bs->file, BLKDBG_L2_ALLOC_WRITE);
>>
>>      trace_qcow2_l2_allocate_write_l2(bs, l1_index);
>> -    qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>> -    ret = qcow2_cache_flush(bs, s->l2_table_cache);
>> +    block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>> +    ret = block_cache_flush(bs, s->l2_table_cache,
>> +        BLOCK_TABLE_L2, s->cluster_size);
>>      if (ret < 0) {
>>          goto fail;
>>      }
>> @@ -242,7 +249,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
>>
>>  fail:
>>      trace_qcow2_l2_allocate_done(bs, l1_index, ret);
>> -    qcow2_cache_put(bs, s->l2_table_cache, (void**) table);
>> +    block_cache_put(bs, s->l2_table_cache, (void **) table, BLOCK_TABLE_L2);
>>      s->l1_table[l1_index] = old_l2_offset;
>>      return ret;
>>  }
>> @@ -475,7 +482,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
>>          abort();
>>      }
>>
>> -    qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>> +    block_cache_put(bs, s->l2_table_cache, (void **) &l2_table, BLOCK_TABLE_L2);
>>
>>      nb_available = (c * s->cluster_sectors);
>>
>> @@ -584,13 +591,15 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
>>       * allocated. */
>>      cluster_offset = be64_to_cpu(l2_table[l2_index]);
>>      if (cluster_offset & L2E_OFFSET_MASK) {
>> -        qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>> +        block_cache_put(bs, s->l2_table_cache,
>> +            (void **) &l2_table, BLOCK_TABLE_L2);
>>          return 0;
>>      }
>>
>>      cluster_offset = qcow2_alloc_bytes(bs, compressed_size);
>>      if (cluster_offset < 0) {
>> -        qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>> +        block_cache_put(bs, s->l2_table_cache,
>> +            (void **) &l2_table, BLOCK_TABLE_L2);
>>          return 0;
>>      }
>>
>> @@ -605,9 +614,10 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
>>      /* compressed clusters never have the copied flag */
>>
>>      BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
>> -    qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>> +    block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>>      l2_table[l2_index] = cpu_to_be64(cluster_offset);
>> -    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>> +    ret = block_cache_put(bs, s->l2_table_cache,
>> +        (void **) &l2_table, BLOCK_TABLE_L2);
>>      if (ret < 0) {
>>          return 0;
>>      }
>> @@ -659,18 +669,16 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
>>       * handled.
>>       */
>>      if (cow) {
>> -        qcow2_cache_depends_on_flush(s->l2_table_cache);
>> +        block_cache_depends_on_flush(s->l2_table_cache);
>>      }
>>
>> -    if (qcow2_need_accurate_refcounts(s)) {
>> -        qcow2_cache_set_dependency(bs, s->l2_table_cache,
>> -                                   s->refcount_block_cache);
>> -    }
>> +    block_cache_set_dependency(bs, s->l2_table_cache, BLOCK_TABLE_L2,
>> +        s->refcount_block_cache, s->cluster_size);
>>      ret = get_cluster_table(bs, m->offset, &l2_table, &l2_index);
>>      if (ret < 0) {
>>          goto err;
>>      }
>> -    qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>> +    block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>>
>>      for (i = 0; i < m->nb_clusters; i++) {
>>          /* if two concurrent writes happen to the same unallocated cluster
>> @@ -687,7 +695,8 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
>>       }
>>
>>
>> -    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>> +    ret = block_cache_put(bs, s->l2_table_cache,
>> +        (void **) &l2_table, BLOCK_TABLE_L2);
>>      if (ret < 0) {
>>          goto err;
>>      }
>> @@ -913,7 +922,8 @@ again:
>>       * request to complete. If we still had the reference, we could use up the
>>       * whole cache with sleeping requests.
>>       */
>> -    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>> +    ret = block_cache_put(bs, s->l2_table_cache,
>> +        (void **) &l2_table, BLOCK_TABLE_L2);
>>      if (ret < 0) {
>>          return ret;
>>      }
>> @@ -1077,14 +1087,15 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
>>          }
>>
>>          /* First remove L2 entries */
>> -        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>> +        block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>>          l2_table[l2_index + i] = cpu_to_be64(0);
>>
>>          /* Then decrease the refcount */
>>          qcow2_free_any_clusters(bs, old_offset, 1);
>>      }
>>
>> -    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>> +    ret = block_cache_put(bs, s->l2_table_cache,
>> +        (void **) &l2_table, BLOCK_TABLE_L2);
>>      if (ret < 0) {
>>          return ret;
>>      }
>> @@ -1154,7 +1165,7 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
>>          old_offset = be64_to_cpu(l2_table[l2_index + i]);
>>
>>          /* Update L2 entries */
>> -        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>> +        block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>>          if (old_offset & QCOW_OFLAG_COMPRESSED) {
>>              l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
>>              qcow2_free_any_clusters(bs, old_offset, 1);
>> @@ -1163,7 +1174,8 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
>>          }
>>      }
>>
>> -    ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>> +    ret = block_cache_put(bs, s->l2_table_cache,
>> +        (void **) &l2_table, BLOCK_TABLE_L2);
>>      if (ret < 0) {
>>          return ret;
>>      }
>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>> index 5e3f915..728bfc1 100644
>> --- a/block/qcow2-refcount.c
>> +++ b/block/qcow2-refcount.c
>> @@ -25,6 +25,7 @@
>>  #include "qemu-common.h"
>>  #include "block_int.h"
>>  #include "block/qcow2.h"
>> +#include "block-cache.h"
>>
>>  static int64_t alloc_clusters_noref(BlockDriverState *bs, int64_t size);
>>  static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
>> @@ -71,8 +72,8 @@ static int load_refcount_block(BlockDriverState *bs,
>>      int ret;
>>
>>      BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_LOAD);
>> -    ret = qcow2_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
>> -        refcount_block);
>> +    ret = block_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
>> +        refcount_block, BLOCK_TABLE_REF, s->cluster_size);
>>
>>      return ret;
>>  }
>> @@ -98,8 +99,8 @@ static int get_refcount(BlockDriverState *bs, int64_t cluster_index)
>>      if (!refcount_block_offset)
>>          return 0;
>>
>> -    ret = qcow2_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
>> -        (void**) &refcount_block);
>> +    ret = block_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
>> +        (void **) &refcount_block, BLOCK_TABLE_REF, s->cluster_size);
>>      if (ret < 0) {
>>          return ret;
>>      }
>> @@ -108,8 +109,8 @@ static int get_refcount(BlockDriverState *bs, int64_t cluster_index)
>>          ((1 << (s->cluster_bits - REFCOUNT_SHIFT)) - 1);
>>      refcount = be16_to_cpu(refcount_block[block_index]);
>>
>> -    ret = qcow2_cache_put(bs, s->refcount_block_cache,
>> -        (void**) &refcount_block);
>> +    ret = block_cache_put(bs, s->refcount_block_cache,
>> +        (void **) &refcount_block, BLOCK_TABLE_REF);
>>      if (ret < 0) {
>>          return ret;
>>      }
>> @@ -201,7 +202,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
>>      *refcount_block = NULL;
>>
>>      /* We write to the refcount table, so we might depend on L2 tables */
>> -    qcow2_cache_flush(bs, s->l2_table_cache);
>> +    block_cache_flush(bs, s->l2_table_cache,
>> +        BLOCK_TABLE_L2, s->cluster_size);
>>
>>      /* Allocate the refcount block itself and mark it as used */
>>      int64_t new_block = alloc_clusters_noref(bs, s->cluster_size);
>> @@ -217,8 +219,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
>>
>>      if (in_same_refcount_block(s, new_block, cluster_index << s->cluster_bits)) {
>>          /* Zero the new refcount block before updating it */
>> -        ret = qcow2_cache_get_empty(bs, s->refcount_block_cache, new_block,
>> -            (void**) refcount_block);
>> +        ret = block_cache_get_empty(bs, s->refcount_block_cache, new_block,
>> +            (void **) refcount_block, BLOCK_TABLE_REF, s->cluster_size);
>>          if (ret < 0) {
>>              goto fail_block;
>>          }
>> @@ -241,8 +243,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
>>
>>          /* Initialize the new refcount block only after updating its refcount,
>>           * update_refcount uses the refcount cache itself */
>> -        ret = qcow2_cache_get_empty(bs, s->refcount_block_cache, new_block,
>> -            (void**) refcount_block);
>> +        ret = block_cache_get_empty(bs, s->refcount_block_cache, new_block,
>> +            (void **) refcount_block, BLOCK_TABLE_REF, s->cluster_size);
>>          if (ret < 0) {
>>              goto fail_block;
>>          }
>> @@ -252,8 +254,9 @@ static int alloc_refcount_block(BlockDriverState *bs,
>>
>>      /* Now the new refcount block needs to be written to disk */
>>      BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_ALLOC_WRITE);
>> -    qcow2_cache_entry_mark_dirty(s->refcount_block_cache, *refcount_block);
>> -    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
>> +    block_cache_entry_mark_dirty(s->refcount_block_cache, *refcount_block);
>> +    ret = block_cache_flush(bs, s->refcount_block_cache,
>> +        BLOCK_TABLE_REF, s->cluster_size);
>>      if (ret < 0) {
>>          goto fail_block;
>>      }
>> @@ -273,7 +276,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
>>          return 0;
>>      }
>>
>> -    ret = qcow2_cache_put(bs, s->refcount_block_cache, (void**) refcount_block);
>> +    ret = block_cache_put(bs, s->refcount_block_cache,
>> +        (void **) refcount_block, BLOCK_TABLE_REF);
>>      if (ret < 0) {
>>          goto fail_block;
>>      }
>> @@ -406,7 +410,8 @@ fail_table:
>>      g_free(new_table);
>>  fail_block:
>>      if (*refcount_block != NULL) {
>> -        qcow2_cache_put(bs, s->refcount_block_cache, (void**) refcount_block);
>> +        block_cache_put(bs, s->refcount_block_cache,
>> +            (void **) refcount_block, BLOCK_TABLE_REF);
>>      }
>>      return ret;
>>  }
>> @@ -432,8 +437,8 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
>>      }
>>
>>      if (addend < 0) {
>> -        qcow2_cache_set_dependency(bs, s->refcount_block_cache,
>> -            s->l2_table_cache);
>> +        block_cache_set_dependency(bs, s->refcount_block_cache, BLOCK_TABLE_REF,
>> +            s->l2_table_cache, s->cluster_size);
>>      }
>>
>>      start = offset & ~(s->cluster_size - 1);
>> @@ -449,8 +454,8 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
>>          /* Load the refcount block and allocate it if needed */
>>          if (table_index != old_table_index) {
>>              if (refcount_block) {
>> -                ret = qcow2_cache_put(bs, s->refcount_block_cache,
>> -                    (void**) &refcount_block);
>> +                ret = block_cache_put(bs, s->refcount_block_cache,
>> +                    (void **) &refcount_block, BLOCK_TABLE_REF);
>>                  if (ret < 0) {
>>                      goto fail;
>>                  }
>> @@ -463,7 +468,7 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
>>          }
>>          old_table_index = table_index;
>>
>> -        qcow2_cache_entry_mark_dirty(s->refcount_block_cache, refcount_block);
>> +        block_cache_entry_mark_dirty(s->refcount_block_cache, refcount_block);
>>
>>          /* we can update the count and save it */
>>          block_index = cluster_index &
>> @@ -486,8 +491,8 @@ fail:
>>      /* Write last changed block to disk */
>>      if (refcount_block) {
>>          int wret;
>> -        wret = qcow2_cache_put(bs, s->refcount_block_cache,
>> -            (void**) &refcount_block);
>> +        wret = block_cache_put(bs, s->refcount_block_cache,
>> +            (void **) &refcount_block, BLOCK_TABLE_REF);
>>          if (wret < 0) {
>>              return ret < 0 ? ret : wret;
>>          }
>> @@ -763,8 +768,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
>>              old_l2_offset = l2_offset;
>>              l2_offset &= L1E_OFFSET_MASK;
>>
>> -            ret = qcow2_cache_get(bs, s->l2_table_cache, l2_offset,
>> -                (void**) &l2_table);
>> +            ret = block_cache_get(bs, s->l2_table_cache, l2_offset,
>> +                (void **) &l2_table, BLOCK_TABLE_L2, s->cluster_size);
>>              if (ret < 0) {
>>                  goto fail;
>>              }
>> @@ -811,16 +816,18 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
>>                      }
>>                      if (offset != old_offset) {
>>                          if (addend > 0) {
>> -                            qcow2_cache_set_dependency(bs, s->l2_table_cache,
>> -                                s->refcount_block_cache);
>> +                            block_cache_set_dependency(bs, s->l2_table_cache,
>> +                                BLOCK_TABLE_L2, s->refcount_block_cache,
>> +                                s->cluster_size);
>>                          }
>>                          l2_table[j] = cpu_to_be64(offset);
>> -                        qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>> +                        block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>>                      }
>>                  }
>>              }
>>
>> -            ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>> +            ret = block_cache_put(bs, s->l2_table_cache,
>> +                (void **) &l2_table, BLOCK_TABLE_L2);
>>              if (ret < 0) {
>>                  goto fail;
>>              }
>> @@ -847,7 +854,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
>>      ret = 0;
>>  fail:
>>      if (l2_table) {
>> -        qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>> +        block_cache_put(bs, s->l2_table_cache,
>> +            (void **) &l2_table, BLOCK_TABLE_L2);
>>      }
>>
>>      /* Update L1 only if it isn't deleted anyway (addend = -1) */
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index fd5e214..b89d312 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -30,6 +30,7 @@
>>  #include "qemu-error.h"
>>  #include "qerror.h"
>>  #include "trace.h"
>> +#include "block-cache.h"
>>
>>  /*
>>    Differences with QCOW:
>> @@ -415,8 +416,9 @@ static int qcow2_open(BlockDriverState *bs, int flags)
>>      }
>>
>>      /* alloc L2 table/refcount block cache */
>> -    s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE);
>> -    s->refcount_block_cache = qcow2_cache_create(bs, REFCOUNT_CACHE_SIZE);
>> +    s->l2_table_cache = block_cache_create(bs, L2_CACHE_SIZE, s->cluster_size);
>> +    s->refcount_block_cache =
>> +        block_cache_create(bs, REFCOUNT_CACHE_SIZE, s->cluster_size);
>>
>>      s->cluster_cache = g_malloc(s->cluster_size);
>>      /* one more sector for decompressed data alignment */
>> @@ -500,7 +502,7 @@ static int qcow2_open(BlockDriverState *bs, int flags)
>>      qcow2_refcount_close(bs);
>>      g_free(s->l1_table);
>>      if (s->l2_table_cache) {
>> -        qcow2_cache_destroy(bs, s->l2_table_cache);
>> +        block_cache_destroy(bs, s->l2_table_cache, BLOCK_TABLE_L2);
>>      }
>>      g_free(s->cluster_cache);
>>      qemu_vfree(s->cluster_data);
>> @@ -860,13 +862,13 @@ static void qcow2_close(BlockDriverState *bs)
>>      BDRVQcowState *s = bs->opaque;
>>      g_free(s->l1_table);
>>
>> -    qcow2_cache_flush(bs, s->l2_table_cache);
>> -    qcow2_cache_flush(bs, s->refcount_block_cache);
>> -
>> +    block_cache_flush(bs, s->l2_table_cache,
>> +        BLOCK_TABLE_L2, s->cluster_size);
>> +    block_cache_flush(bs, s->refcount_block_cache,
>> +        BLOCK_TABLE_REF, s->cluster_size);
>>      qcow2_mark_clean(bs);
>> -
>> -    qcow2_cache_destroy(bs, s->l2_table_cache);
>> -    qcow2_cache_destroy(bs, s->refcount_block_cache);
>> +    block_cache_destroy(bs, s->l2_table_cache, BLOCK_TABLE_L2);
>> +    block_cache_destroy(bs, s->refcount_block_cache, BLOCK_TABLE_REF);
>>
>>      g_free(s->unknown_header_fields);
>>      cleanup_unknown_header_ext(bs);
>> @@ -1339,8 +1341,6 @@ static int qcow2_create(const char *filename, QEMUOptionParameter *options)
>>                      options->value.s);
>>                  return -EINVAL;
>>              }
>> -        } else if (!strcmp(options->name, BLOCK_OPT_LAZY_REFCOUNTS)) {
>> -            flags |= options->value.n ? BLOCK_FLAG_LAZY_REFCOUNTS : 0;
>>          }
>>          options++;
>>      }
>> @@ -1537,18 +1537,18 @@ static coroutine_fn int qcow2_co_flush_to_os(BlockDriverState *bs)
>>      int ret;
>>
>>      qemu_co_mutex_lock(&s->lock);
>> -    ret = qcow2_cache_flush(bs, s->l2_table_cache);
>> +    ret = block_cache_flush(bs, s->l2_table_cache,
>> +        BLOCK_TABLE_L2, s->cluster_size);
>>      if (ret < 0) {
>>          qemu_co_mutex_unlock(&s->lock);
>>          return ret;
>>      }
>>
>> -    if (qcow2_need_accurate_refcounts(s)) {
>> -        ret = qcow2_cache_flush(bs, s->refcount_block_cache);
>> -        if (ret < 0) {
>> -            qemu_co_mutex_unlock(&s->lock);
>> -            return ret;
>> -        }
>> +    ret = block_cache_flush(bs, s->refcount_block_cache,
>> +        BLOCK_TABLE_REF, s->cluster_size);
>> +    if (ret < 0) {
>> +        qemu_co_mutex_unlock(&s->lock);
>> +        return ret;
>>      }
>>      qemu_co_mutex_unlock(&s->lock);
>>
>> diff --git a/block/qcow2.h b/block/qcow2.h
>> index b4eb654..cb6fd7a 100644
>> --- a/block/qcow2.h
>> +++ b/block/qcow2.h
>> @@ -27,6 +27,7 @@
>>
>>  #include "aes.h"
>>  #include "qemu-coroutine.h"
>> +#include "block-cache.h"
>>
>>  //#define DEBUG_ALLOC
>>  //#define DEBUG_ALLOC2
>> @@ -94,8 +95,6 @@ typedef struct QCowSnapshot {
>>      uint64_t vm_clock_nsec;
>>  } QCowSnapshot;
>>
>> -struct Qcow2Cache;
>> -typedef struct Qcow2Cache Qcow2Cache;
>>
>>  typedef struct Qcow2UnknownHeaderExtension {
>>      uint32_t magic;
>> @@ -146,8 +145,8 @@ typedef struct BDRVQcowState {
>>      uint64_t l1_table_offset;
>>      uint64_t *l1_table;
>>
>> -    Qcow2Cache* l2_table_cache;
>> -    Qcow2Cache* refcount_block_cache;
>> +    BlockCache *l2_table_cache;
>> +    BlockCache *refcount_block_cache;
>>
>>      uint8_t *cluster_cache;
>>      uint8_t *cluster_data;
>> @@ -316,21 +315,4 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs, const char *snapshot_name);
>>
>>  void qcow2_free_snapshots(BlockDriverState *bs);
>>  int qcow2_read_snapshots(BlockDriverState *bs);
>> -
>> -/* qcow2-cache.c functions */
>> -Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables);
>> -int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c);
>> -
>> -void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table);
>> -int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c);
>> -int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
>> -    Qcow2Cache *dependency);
>> -void qcow2_cache_depends_on_flush(Qcow2Cache *c);
>> -
>> -int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
>> -    void **table);
>> -int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
>> -    void **table);
>> -int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
>> -
>>  #endif
>> diff --git a/trace-events b/trace-events
>> index 6b12f83..52b6438 100644
>> --- a/trace-events
>> +++ b/trace-events
>> @@ -439,12 +439,13 @@ qcow2_l2_allocate_write_l2(void *bs, int l1_index) "bs %p l1_index %d"
>>  qcow2_l2_allocate_write_l1(void *bs, int l1_index) "bs %p l1_index %d"
>>  qcow2_l2_allocate_done(void *bs, int l1_index, int ret) "bs %p l1_index %d ret %d"
>>
>> -qcow2_cache_get(void *co, int c, uint64_t offset, bool read_from_disk) "co %p is_l2_cache %d offset %" PRIx64 " read_from_disk %d"
>> -qcow2_cache_get_replace_entry(void *co, int c, int i) "co %p is_l2_cache %d index %d"
>> -qcow2_cache_get_read(void *co, int c, int i) "co %p is_l2_cache %d index %d"
>> -qcow2_cache_get_done(void *co, int c, int i) "co %p is_l2_cache %d index %d"
>> -qcow2_cache_flush(void *co, int c) "co %p is_l2_cache %d"
>> -qcow2_cache_entry_flush(void *co, int c, int i) "co %p is_l2_cache %d index %d"
>> +# block/block-cache.c
>> +block_cache_get(void *co, int c, uint64_t offset, bool read_from_disk) "co %p is_l2_cache %d offset %" PRIx64 " read_from_disk %d"
>> +block_cache_get_replace_entry(void *co, int c, int i) "co %p is_l2_cache %d index %d"
>> +block_cache_get_read(void *co, int c, int i) "co %p is_l2_cache %d index %d"
>> +block_cache_get_done(void *co, int c, int i) "co %p is_l2_cache %d index %d"
>> +block_cache_flush(void *co, int c) "co %p is_l2_cache %d"
>> +block_cache_entry_flush(void *co, int c, int i) "co %p is_l2_cache %d index %d"
>>
>>  # block/qed-l2-cache.c
>>  qed_alloc_l2_cache_entry(void *l2_cache, void *entry) "l2_cache %p entry %p"
>> --
>> 1.7.1
>>
>>
>

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [Qemu-devel] [PATCH V12 5/6] add-cow file format
  2012-09-06 20:19   ` Michael Roth
@ 2012-09-10  2:25     ` Dong Xu Wang
  2012-09-11  9:44       ` Kevin Wolf
  0 siblings, 1 reply; 25+ messages in thread
From: Dong Xu Wang @ 2012-09-10  2:25 UTC (permalink / raw)
  To: Michael Roth; +Cc: kwolf, qemu-devel

On Fri, Sep 7, 2012 at 4:19 AM, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> On Fri, Aug 10, 2012 at 11:39:44PM +0800, Dong Xu Wang wrote:
>> add-cow file format core code. It use block-cache.c as cache code.
>>
>> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> ---
>>  block/Makefile.objs |    1 +
>>  block/add-cow.c     |  613 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>  block/add-cow.h     |   85 +++++++
>>  block_int.h         |    2 +
>>  4 files changed, 701 insertions(+), 0 deletions(-)
>>  create mode 100644 block/add-cow.c
>>  create mode 100644 block/add-cow.h
>>
>> diff --git a/block/Makefile.objs b/block/Makefile.objs
>> index 23bdfc8..7ed5051 100644
>> --- a/block/Makefile.objs
>> +++ b/block/Makefile.objs
>> @@ -2,6 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
>>  block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
>>  block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>>  block-obj-y += qed-check.o
>> +block-obj-y += add-cow.o
>>  block-obj-y += block-cache.o
>>  block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
>>  block-obj-y += stream.o
>> diff --git a/block/add-cow.c b/block/add-cow.c
>> new file mode 100644
>> index 0000000..d4711d5
>> --- /dev/null
>> +++ b/block/add-cow.c
>> @@ -0,0 +1,613 @@
>> +/*
>> + * QEMU ADD-COW Disk Format
>> + *
>> + * Copyright IBM, Corp. 2012
>> + *
>> + * Authors:
>> + *  Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> + *
>> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
>> + * See the COPYING.LIB file in the top-level directory.
>> + *
>> + */
>> +
>> +#include "qemu-common.h"
>> +#include "block_int.h"
>> +#include "module.h"
>> +#include "add-cow.h"
>> +
>> +static void add_cow_header_le_to_cpu(const AddCowHeader *le, AddCowHeader *cpu)
>> +{
>> +    cpu->magic                      = le64_to_cpu(le->magic);
>> +    cpu->version                    = le32_to_cpu(le->version);
>> +
>> +    cpu->backing_filename_offset    = le32_to_cpu(le->backing_filename_offset);
>> +    cpu->backing_filename_size      = le32_to_cpu(le->backing_filename_size);
>> +
>> +    cpu->image_filename_offset      = le32_to_cpu(le->image_filename_offset);
>> +    cpu->image_filename_size        = le32_to_cpu(le->image_filename_size);
>> +
>> +    cpu->features                   = le64_to_cpu(le->features);
>> +    cpu->optional_features          = le64_to_cpu(le->optional_features);
>> +    cpu->header_pages_size          = le32_to_cpu(le->header_pages_size);
>> +}
>> +
>> +static void add_cow_header_cpu_to_le(const AddCowHeader *cpu, AddCowHeader *le)
>> +{
>> +    le->magic                       = cpu_to_le64(cpu->magic);
>> +    le->version                     = cpu_to_le32(cpu->version);
>> +
>> +    le->backing_filename_offset     = cpu_to_le32(cpu->backing_filename_offset);
>> +    le->backing_filename_size       = cpu_to_le32(cpu->backing_filename_size);
>> +
>> +    le->image_filename_offset       = cpu_to_le32(cpu->image_filename_offset);
>> +    le->image_filename_size         = cpu_to_le32(cpu->image_filename_size);
>> +
>> +    le->features                    = cpu_to_le64(cpu->features);
>> +    le->optional_features           = cpu_to_le64(cpu->optional_features);
>> +    le->header_pages_size           = cpu_to_le32(cpu->header_pages_size);
>> +}
>> +
>> +static int add_cow_probe(const uint8_t *buf, int buf_size, const char *filename)
>> +{
>> +    const AddCowHeader *header = (const AddCowHeader *)buf;
>> +
>> +    if (le64_to_cpu(header->magic) == ADD_COW_MAGIC &&
>> +        le32_to_cpu(header->version) == ADD_COW_VERSION) {
>> +        return 100;
>> +    } else {
>> +        return 0;
>> +    }
>> +}
>> +
>> +static int add_cow_create(const char *filename, QEMUOptionParameter *options)
>> +{
>> +    AddCowHeader header = {
>> +        .magic = ADD_COW_MAGIC,
>> +        .version = ADD_COW_VERSION,
>> +        .features = 0,
>> +        .optional_features = 0,
>> +        .header_pages_size = ADD_COW_DEFAULT_PAGE_SIZE,
>> +    };
>> +    AddCowHeader le_header;
>> +    int64_t image_len = 0;
>> +    const char *backing_filename = NULL;
>> +    const char *backing_fmt = NULL;
>> +    const char *image_filename = NULL;
>> +    const char *image_format = NULL;
>> +    BlockDriverState *bs, *image_bs = NULL, *backing_bs = NULL;
>> +    BlockDriver *drv = bdrv_find_format("add-cow");
>> +    BDRVAddCowState s;
>> +    int ret;
>> +
>> +    while (options && options->name) {
>> +        if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
>> +            image_len = options->value.n;
>> +        } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FILE)) {
>> +            backing_filename = options->value.s;
>> +        } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FMT)) {
>> +            backing_fmt = options->value.s;
>> +        } else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FILE)) {
>> +            image_filename = options->value.s;
>> +        } else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FORMAT)) {
>> +            image_format = options->value.s;
>> +        }
>> +        options++;
>> +    }
>> +
>> +    if (backing_filename) {
>> +        header.backing_filename_offset = sizeof(header)
>> +            + sizeof(s.backing_file_format) + sizeof(s.image_file_format);
>> +        header.backing_filename_size = strlen(backing_filename);
>> +
>> +        if (!backing_fmt) {
>> +            backing_bs = bdrv_new("image");
>> +            ret = bdrv_open(backing_bs, backing_filename, BDRV_O_RDWR
>> +                    | BDRV_O_CACHE_WB, NULL);
>> +            if (ret < 0) {
>> +                return ret;
>> +            }
>> +            backing_fmt = bdrv_get_format_name(backing_bs);
>> +            bdrv_delete(backing_bs);
>> +        }
>> +    } else {
>> +        header.features |= ADD_COW_F_All_ALLOCATED;
>> +    }
>> +
>> +    if (image_filename) {
>> +        header.image_filename_offset =
>> +            sizeof(header) + sizeof(s.backing_file_format)
>> +                + sizeof(s.image_file_format) + header.backing_filename_size;
>> +        header.image_filename_size = strlen(image_filename);
>> +    } else {
>> +        error_report("Error: image_file should be given.");
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (backing_filename && !strcmp(backing_filename, image_filename)) {
>> +        error_report("Error: Trying to create an image with the "
>> +                     "same backing file name as the image file name");
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (!strcmp(filename, image_filename)) {
>> +        error_report("Error: Trying to create an image with the "
>> +                     "same filename as the image file name");
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (header.image_filename_offset + header.image_filename_size
>> +            > ADD_COW_PAGE_SIZE * ADD_COW_DEFAULT_PAGE_SIZE) {
>> +        error_report("image_file name or backing_file name too long.");
>> +        return -ENOSPC;
>> +    }
>> +
>> +    ret = bdrv_file_open(&image_bs, image_filename, BDRV_O_RDWR);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    bdrv_delete(image_bs);
>> +
>> +    ret = bdrv_create_file(filename, NULL);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    ret = bdrv_file_open(&bs, filename, BDRV_O_RDWR);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    add_cow_header_cpu_to_le(&header, &le_header);
>> +    ret = bdrv_pwrite(bs, 0, &le_header, sizeof(le_header));
>> +    if (ret < 0) {
>> +        bdrv_delete(bs);
>> +        return ret;
>> +    }
>> +
>> +    ret = bdrv_pwrite(bs, sizeof(le_header), backing_fmt ? backing_fmt : "",
>> +        backing_fmt ? strlen(backing_fmt) : 0);
>> +    if (ret < 0) {
>> +        bdrv_delete(bs);
>> +        return ret;
>> +    }
>> +
>> +    ret = bdrv_pwrite(bs, sizeof(le_header) + sizeof(s.backing_file_format),
>> +        image_format ? image_format : "raw",
>> +        image_format ? strlen(image_format) : sizeof("raw"));
>> +    if (ret < 0) {
>> +        bdrv_delete(bs);
>> +        return ret;
>> +    }
>> +
>> +    if (backing_filename) {
>> +        ret = bdrv_pwrite(bs, header.backing_filename_offset,
>> +            backing_filename, header.backing_filename_size);
>> +        if (ret < 0) {
>> +            bdrv_delete(bs);
>> +            return ret;
>> +        }
>> +    }
>> +
>> +    ret = bdrv_pwrite(bs, header.image_filename_offset,
>> +        image_filename, header.image_filename_size);
>> +    if (ret < 0) {
>> +        bdrv_delete(bs);
>> +        return ret;
>> +    }
>> +
>> +    ret = bdrv_open(bs, filename, BDRV_O_RDWR | BDRV_O_NO_FLUSH, drv);
>> +    if (ret < 0) {
>> +        bdrv_delete(bs);
>> +        return ret;
>> +    }
>> +
>> +    ret = bdrv_truncate(bs, image_len);
>> +    bdrv_delete(bs);
>> +    return ret;
>> +}
>> +
>> +static int add_cow_open(BlockDriverState *bs, int flags)
>> +{
>> +    char                image_filename[ADD_COW_FILE_LEN];
>> +    char                tmp_name[ADD_COW_FILE_LEN];
>> +    BlockDriver         *image_drv = NULL;
>> +    int                 ret;
>> +    int                 sector_per_byte;
>> +    BDRVAddCowState     *s = bs->opaque;
>> +    AddCowHeader        le_header;
>> +
>> +    ret = bdrv_pread(bs->file, 0, &le_header, sizeof(le_header));
>> +    if (ret != sizeof(s->header)) {
>> +        goto fail;
>> +    }
>> +
>> +    add_cow_header_le_to_cpu(&le_header, &s->header);
>> +
>> +    if (le64_to_cpu(s->header.magic) != ADD_COW_MAGIC) {
>> +        ret = -EINVAL;
>> +        goto fail;
>> +    }
>> +
>> +    if (s->header.version != ADD_COW_VERSION) {
>> +        char version[64];
>> +        snprintf(version, sizeof(version), "ADD-COW version %d",
>> +            s->header.version);
>> +        qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
>> +            bs->device_name, "add-cow", version);
>> +        ret = -ENOTSUP;
>> +        goto fail;
>> +    }
>> +
>> +    if (s->header.features & ~ADD_COW_FEATURE_MASK) {
>> +        char buf[64];
>> +        snprintf(buf, sizeof(buf), "%" PRIx64,
>> +            s->header.features & ~ADD_COW_FEATURE_MASK);
>> +        qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
>> +            bs->device_name, "add-cow", buf);
>> +        return -ENOTSUP;
>> +    }
>> +
>> +    if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
>> +        ret = bdrv_read_string(bs->file, sizeof(s->header),
>> +            sizeof(s->backing_file_format) - 1, s->backing_file_format,
>> +            sizeof(s->backing_file_format));
>> +        if (ret < 0) {
>> +            goto fail;
>> +        }
>> +    }
>> +
>> +    ret = bdrv_read_string(bs->file,
>> +            sizeof(s->header) + sizeof(s->image_file_format),
>> +            sizeof(s->image_file_format) - 1, s->image_file_format,
>> +            sizeof(s->image_file_format));
>> +    if (ret < 0) {
>> +        goto fail;
>> +    }
>> +
>> +    if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
>> +        ret = bdrv_read_string(bs->file, s->header.backing_filename_offset,
>> +                          s->header.backing_filename_size, bs->backing_file,
>> +                          sizeof(bs->backing_file));
>> +        if (ret < 0) {
>> +            goto fail;
>> +        }
>> +    }
>> +
>> +    ret = bdrv_read_string(bs->file, s->header.image_filename_offset,
>> +                      s->header.image_filename_size, tmp_name,
>> +                      sizeof(tmp_name));
>> +    if (ret < 0) {
>> +        goto fail;
>> +    }
>> +
>> +    s->image_hd = bdrv_new("");
>> +    if (path_has_protocol(image_filename)) {
>> +        pstrcpy(image_filename, sizeof(image_filename), tmp_name);
>> +    } else {
>> +        path_combine(image_filename, sizeof(image_filename),
>> +                     bs->filename, tmp_name);
>> +    }
>> +
>> +    ret = bdrv_open(s->image_hd, image_filename, flags, image_drv);
>> +    if (ret < 0) {
>> +        bdrv_delete(s->image_hd);
>> +        goto fail;
>> +    }
>> +
>> +    bs->total_sectors = bdrv_getlength(s->image_hd) >> 9;
>> +    s->cluster_size = ADD_COW_CLUSTER_SIZE;
>> +    sector_per_byte = SECTORS_PER_CLUSTER * 8;
>> +    s->bitmap_size =
>> +        (bs->total_sectors + sector_per_byte - 1) / sector_per_byte;
>> +    s->bitmap_cache =
>> +        block_cache_create(bs, ADD_COW_CACHE_SIZE, ADD_COW_CACHE_ENTRY_SIZE);
>> +
>> +    qemu_co_mutex_init(&s->lock);
>> +    return 0;
>> +fail:
>> +    if (s->bitmap_cache) {
>> +        block_cache_destroy(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP);
>> +    }
>> +    return ret;
>> +}
>> +
>> +static void add_cow_close(BlockDriverState *bs)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    block_cache_destroy(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP);
>> +    bdrv_delete(s->image_hd);
>> +}
>> +
>> +static bool is_allocated(BlockDriverState *bs, int64_t sector_num)
>> +{
>> +    BDRVAddCowState *s  = bs->opaque;
>> +    BlockCache *c = s->bitmap_cache;
>> +    int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
>> +    uint8_t *table      = NULL;
>> +    uint64_t offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
>> +        + (offset_in_bitmap(sector_num) & (~(c->entry_size - 1)));
>> +    int ret = block_cache_get(bs, s->bitmap_cache, offset,
>> +        (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
>> +
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    return table[cluster_num / 8 % ADD_COW_CACHE_ENTRY_SIZE]
>> +        & (1 << (cluster_num % 8));
>> +}
>> +
>> +static coroutine_fn int add_cow_is_allocated(BlockDriverState *bs,
>> +        int64_t sector_num, int nb_sectors, int *num_same)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    int changed;
>> +
>> +    if (nb_sectors == 0) {
>> +        *num_same = 0;
>> +        return 0;
>> +    }
>> +
>> +    if (s->header.features & ADD_COW_F_All_ALLOCATED) {
>> +        *num_same = nb_sectors - 1;
>> +        return 1;
>> +    }
>> +    changed = is_allocated(bs, sector_num);
>> +
>> +    for (*num_same = 1; *num_same < nb_sectors; (*num_same)++) {
>> +        if (is_allocated(bs, sector_num + *num_same) != changed) {
>> +            break;
>> +        }
>> +    }
>> +    return changed;
>> +}
>> +
>> +static int add_cow_backing_read(BlockDriverState *bs, QEMUIOVector *qiov,
>> +                  int64_t sector_num, int nb_sectors)
>> +{
>> +    int n1;
>> +    if ((sector_num + nb_sectors) <= bs->total_sectors) {
>> +        return nb_sectors;
>> +    }
>> +    if (sector_num >= bs->total_sectors) {
>> +        n1 = 0;
>> +    } else {
>> +        n1 = bs->total_sectors - sector_num;
>> +    }
>> +
>> +    qemu_iovec_memset(qiov, BDRV_SECTOR_SIZE * n1,
>> +        0, BDRV_SECTOR_SIZE * (nb_sectors - n1));
>> +
>> +    return n1;
>> +}
>> +
>> +static coroutine_fn int add_cow_co_readv(BlockDriverState *bs,
>> +    int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
>> +{
>> +    BDRVAddCowState *s  = bs->opaque;
>> +    int cur_nr_sectors;
>> +    uint64_t bytes_done = 0;
>> +    QEMUIOVector hd_qiov;
>> +    int n, n1, ret = 0;
>> +
>> +    qemu_iovec_init(&hd_qiov, qiov->niov);
>> +    qemu_co_mutex_lock(&s->lock);
>> +    while (remaining_sectors != 0) {
>> +        cur_nr_sectors = remaining_sectors;
>> +        if (add_cow_is_allocated(bs, sector_num, cur_nr_sectors, &n)) {
>> +            cur_nr_sectors = n;
>> +            qemu_iovec_reset(&hd_qiov);
>> +            qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
>> +                            cur_nr_sectors * BDRV_SECTOR_SIZE);
>> +            qemu_co_mutex_unlock(&s->lock);
>> +            ret = bdrv_co_readv(s->image_hd, sector_num, n, &hd_qiov);
>> +            qemu_co_mutex_lock(&s->lock);
>> +            if (ret < 0) {
>> +                goto fail;
>> +            }
>> +        } else {
>> +            cur_nr_sectors = n;
>> +            if (bs->backing_hd) {
>> +                qemu_iovec_reset(&hd_qiov);
>> +                qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
>> +                            cur_nr_sectors * BDRV_SECTOR_SIZE);
>> +                n1 = add_cow_backing_read(bs->backing_hd, &hd_qiov,
>> +                    sector_num, cur_nr_sectors);
>> +                if (n1 > 0) {
>> +                    qemu_co_mutex_unlock(&s->lock);
>> +                    ret = bdrv_co_readv(bs->backing_hd, sector_num,
>> +                                        n, &hd_qiov);
>> +                    qemu_co_mutex_lock(&s->lock);
>> +                    if (ret < 0) {
>> +                        goto fail;
>> +                    }
>> +                }
>> +            } else {
>> +                qemu_iovec_memset(&hd_qiov, 0, 0,
>> +                    BDRV_SECTOR_SIZE * cur_nr_sectors);
>> +            }
>> +        }
>> +        remaining_sectors -= cur_nr_sectors;
>> +        sector_num += cur_nr_sectors;
>> +        bytes_done += cur_nr_sectors * BDRV_SECTOR_SIZE;
>> +    }
>> +fail:
>> +    qemu_co_mutex_unlock(&s->lock);
>> +    qemu_iovec_destroy(&hd_qiov);
>> +    return ret;
>> +}
>> +
>> +static int coroutine_fn copy_sectors(BlockDriverState *bs,
>> +                                     int n_start, int n_end)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    QEMUIOVector qiov;
>> +    struct iovec iov;
>> +    int n, ret;
>> +
>> +    n = n_end - n_start;
>> +    if (n <= 0) {
>> +        return 0;
>> +    }
>> +
>> +    iov.iov_len = n * BDRV_SECTOR_SIZE;
>> +    iov.iov_base = qemu_blockalign(bs, iov.iov_len);
>> +
>> +    qemu_iovec_init_external(&qiov, &iov, 1);
>> +
>> +    ret = bdrv_co_readv(bs->backing_hd, n_start, n, &qiov);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +    ret = bdrv_co_writev(s->image_hd, n_start, n, &qiov);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    ret = 0;
>> +out:
>> +    qemu_vfree(iov.iov_base);
>> +    return ret;
>> +}
>> +
>> +static coroutine_fn int add_cow_co_writev(BlockDriverState *bs,
>> +        int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    BlockCache *c = s->bitmap_cache;
>> +    int ret = 0, i;
>> +    QEMUIOVector hd_qiov;
>> +    uint8_t *table;
>> +    uint64_t offset;
>> +
>> +    qemu_co_mutex_lock(&s->lock);
>> +    qemu_iovec_init(&hd_qiov, qiov->niov);
>> +    ret = bdrv_co_writev(s->image_hd,
>> +                     sector_num,
>> +                     remaining_sectors, qiov);
>
> alignment                   ^
>
> or even at ^ if you prefer and have done in some places, just need to be
> consistent about it for better readability.
>
>> +
>> +    if (ret < 0) {
>> +        goto fail;
>> +    }
>> +    if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
>> +        /* Copy content of unmodified sectors */
>> +        if (!is_cluster_head(sector_num) && !is_allocated(bs, sector_num)) {
>
> Why do we avoid a COW when writing to the first sector of a cluster?

Because if the write starts at the first sector of a cluster, there is
nothing in front of it inside that cluster, so copy_sectors is not needed
for the head; writing the data directly is enough.
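
To illustrate the head/tail handling, here is a minimal sketch (not the
patch code). CowRange and the two helpers are made-up names, and
SECTORS_PER_CLUSTER is assumed to be 128 as in add-cow.h. An empty range
(start == end) means no copy is needed on that side:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value: 64k clusters of 512-byte sectors, as in add-cow.h. */
#define SECTORS_PER_CLUSTER 128

/* Hypothetical type: sector range [start, end) to copy from the backing file. */
typedef struct CowRange {
    int64_t start;
    int64_t end;
} CowRange;

/* Sectors of the first touched cluster that lie before the write. */
static CowRange cow_head_range(int64_t sector_num)
{
    CowRange r;

    assert(sector_num >= 0);
    r.start = sector_num & ~(int64_t)(SECTORS_PER_CLUSTER - 1);
    r.end = sector_num;
    return r;
}

/* Sectors of the last touched cluster that lie after the write. */
static CowRange cow_tail_range(int64_t sector_num, int nb_sectors)
{
    int64_t end = sector_num + nb_sectors;   /* first sector past the write */
    CowRange r;

    assert(sector_num >= 0 && nb_sectors > 0);
    r.start = end;
    /* Round up to the next cluster boundary (no-op if already aligned). */
    r.end = (end + SECTORS_PER_CLUSTER - 1)
            & ~(int64_t)(SECTORS_PER_CLUSTER - 1);
    return r;
}
```

A write that starts on a cluster boundary gets an empty head range, and a
write that ends on a cluster boundary gets an empty tail range.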

>
>> +            ret = copy_sectors(bs, sector_num & ~(SECTORS_PER_CLUSTER - 1),
>> +                sector_num);
>> +            if (ret < 0) {
>> +                goto fail;
>> +            }
>> +        }
>> +
>> +        if (!is_cluster_tail(sector_num + remaining_sectors - 1)
>> +            && !is_allocated(bs, sector_num + remaining_sectors - 1)) {
>> +            ret = copy_sectors(bs, sector_num + remaining_sectors,
>> +                ((sector_num + remaining_sectors) | (SECTORS_PER_CLUSTER - 1)) + 1);
>> +            if (ret < 0) {
>> +                goto fail;
>> +            }
>> +        }
>> +
>> +        for (i = sector_num / SECTORS_PER_CLUSTER;
>> +            i <= (sector_num + remaining_sectors - 1) / SECTORS_PER_CLUSTER;
>> +            i++) {
>> +            offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
>> +                + (offset_in_bitmap(i * SECTORS_PER_CLUSTER) & (~(c->entry_size - 1)));
>> +            ret = block_cache_get(bs, s->bitmap_cache, offset,
>> +                (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
>> +            if (ret < 0) {
>> +                goto fail;
>> +            }
>> +            if ((table[i / 8] & (1 << (i % 8))) == 0) {
>> +                table[i / 8] |= (1 << (i % 8));
>> +                block_cache_entry_mark_dirty(s->bitmap_cache, table);
>> +            }
>> +        }
>> +    }
>> +    ret = 0;
>> +fail:
>> +    qemu_co_mutex_unlock(&s->lock);
>> +    qemu_iovec_destroy(&hd_qiov);
>> +    return ret;
>> +}
>> +
>> +static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    int sector_per_byte = SECTORS_PER_CLUSTER * 8;
>> +    int ret;
>> +    uint32_t bitmap_pos = s->header.header_pages_size * ADD_COW_PAGE_SIZE;
>> +    int64_t bitmap_size =
>> +        (size / BDRV_SECTOR_SIZE + sector_per_byte - 1) / sector_per_byte;
>> +    bitmap_size = (bitmap_size + ADD_COW_CACHE_ENTRY_SIZE - 1)
>> +        & (~(ADD_COW_CACHE_ENTRY_SIZE - 1));
>> +
>> +    ret = bdrv_truncate(bs->file, bitmap_pos + bitmap_size);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    return 0;
>> +}
>> +
>> +static coroutine_fn int add_cow_co_flush(BlockDriverState *bs)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    int ret;
>> +
>> +    qemu_co_mutex_lock(&s->lock);
>> +    ret = block_cache_flush(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP,
>> +        ADD_COW_CACHE_ENTRY_SIZE);
>> +    qemu_co_mutex_unlock(&s->lock);
>> +    return ret;
>> +}
>> +
>> +static QEMUOptionParameter add_cow_create_options[] = {
>> +    {
>> +        .name = BLOCK_OPT_SIZE,
>> +        .type = OPT_SIZE,
>> +        .help = "Virtual disk size"
>> +    },
>> +    {
>> +        .name = BLOCK_OPT_BACKING_FILE,
>> +        .type = OPT_STRING,
>> +        .help = "File name of a base image"
>> +    },
>> +    {
>> +        .name = BLOCK_OPT_BACKING_FMT,
>> +        .type = OPT_STRING,
>> +        .help = "Image format of the base image"
>> +    },
>> +    {
>> +        .name = BLOCK_OPT_IMAGE_FILE,
>> +        .type = OPT_STRING,
>> +        .help = "File name of a image file"
>> +    },
>> +    {
>> +        .name = BLOCK_OPT_IMAGE_FORMAT,
>> +        .type = OPT_STRING,
>> +        .help = "Image format of the image file"
>> +    },
>> +    { NULL }
>> +};
>> +
>> +static BlockDriver bdrv_add_cow = {
>> +    .format_name                = "add-cow",
>> +    .instance_size              = sizeof(BDRVAddCowState),
>> +    .bdrv_probe                 = add_cow_probe,
>> +    .bdrv_open                  = add_cow_open,
>> +    .bdrv_close                 = add_cow_close,
>> +    .bdrv_create                = add_cow_create,
>> +    .bdrv_co_readv              = add_cow_co_readv,
>> +    .bdrv_co_writev             = add_cow_co_writev,
>> +    .bdrv_truncate              = bdrv_add_cow_truncate,
>> +    .bdrv_co_is_allocated       = add_cow_is_allocated,
>> +
>> +    .create_options             = add_cow_create_options,
>> +    .bdrv_co_flush_to_os        = add_cow_co_flush,
>> +};
>> +
>> +static void bdrv_add_cow_init(void)
>> +{
>> +    bdrv_register(&bdrv_add_cow);
>> +}
>> +
>> +block_init(bdrv_add_cow_init);
>> diff --git a/block/add-cow.h b/block/add-cow.h
>> new file mode 100644
>> index 0000000..f058376
>> --- /dev/null
>> +++ b/block/add-cow.h
>> @@ -0,0 +1,85 @@
>> +/*
>> + * QEMU ADD-COW Disk Format
>> + *
>> + * Copyright IBM, Corp. 2012
>> + *
>> + * Authors:
>> + *  Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> + *
>> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
>> + * See the COPYING.LIB file in the top-level directory.
>> + *
>> + */
>> +
>> +#ifndef BLOCK_ADD_COW_H
>> +#define BLOCK_ADD_COW_H
>> +#include "block-cache.h"
>> +
>> +enum {
>> +    ADD_COW_F_All_ALLOCATED     = 0X01,
>
> Please use "ADD_COW_F_ALL_ALLOCATED" (all caps)

Okay.
>
> was searching your patch for how this was used and was scratching my
> head when I wasn't seeing any matches :)

It will be used in a case such as:
qemu-img create -f add-cow -o image_file=t.raw t.add-cow

where there is no backing_file, so we never need to read from one.

>
>> +    ADD_COW_FEATURE_MASK        = ADD_COW_F_All_ALLOCATED,
>> +
>> +    ADD_COW_MAGIC = (((uint64_t)'A' << 56) | ((uint64_t)'D' << 48) | \
>> +                    ((uint64_t)'D' << 40) | ((uint64_t)'_' << 32) | \
>> +                    ((uint64_t)'C' << 24) | ((uint64_t)'O' << 16) | \
>> +                    ((uint64_t)'W' << 8) | 0xFF),
>> +    ADD_COW_VERSION             = 1,
>> +    ADD_COW_FILE_LEN            = 1024,
>> +    ADD_COW_CACHE_SIZE          = 16,
>> +    ADD_COW_CACHE_ENTRY_SIZE    = 65536,
>> +    ADD_COW_CLUSTER_SIZE        = 65536,
>> +    SECTORS_PER_CLUSTER         = (ADD_COW_CLUSTER_SIZE / BDRV_SECTOR_SIZE),
>> +    ADD_COW_PAGE_SIZE           = 4096,
>> +    ADD_COW_DEFAULT_PAGE_SIZE   = 1,
>> +};
>> +
>> +typedef struct AddCowHeader {
>> +    uint64_t        magic;
>> +    uint32_t        version;
>> +
>> +    uint32_t        backing_filename_offset;
>> +    uint32_t        backing_filename_size;
>> +
>> +    uint32_t        image_filename_offset;
>> +    uint32_t        image_filename_size;
>> +
>> +    uint64_t        features;
>> +    uint64_t        optional_features;
>> +    uint32_t        header_pages_size;
>> +} QEMU_PACKED AddCowHeader;
>
> You should avoid using packed structures for image format headers.
> Instead, I would either:
>
> a) re-order the fields so that 32/64-bit fields, respectively, fall on
> 32/64-bit boundaries (in your case, for instance, moving header_pages_size
> above features) like qed/qcow2 do, or
>
> b) read/write the fields individually rather than reading/writing directly
> into/from the header struct.
>
> The safest route is b). Adds a few lines of code, but you won't have to
> re-work things (or worry about introducing bugs) later if you were to add,
> say, a 32-bit value, and then a 64-bit value later.

However, Kevin's suggestion was to use PACKED, so I followed that.
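
For reference, option b) could look roughly like the sketch below. This is
only an illustration under assumptions: DemoHeader covers just three of the
fields, the put_le/get_le helpers and the demo_header_* functions are
made-up names (not QEMU APIs), and the byte offsets are the ones given in
docs/specs/add-cow.txt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers: little-endian access one byte at a time, so the
 * result depends on neither host endianness nor struct padding. */
static void put_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

static void put_le64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (8 * i));
    }
}

static uint32_t get_le32(const uint8_t *p)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) {
        v |= (uint32_t)p[i] << (8 * i);
    }
    return v;
}

static uint64_t get_le64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) {
        v |= (uint64_t)p[i] << (8 * i);
    }
    return v;
}

/* In-memory header; its layout is free to differ from the on-disk one. */
typedef struct DemoHeader {
    uint32_t version;            /* on disk: bytes  8 - 11 */
    uint64_t features;           /* on disk: bytes 28 - 35 */
    uint32_t header_pages_size;  /* on disk: bytes 44 - 47 */
} DemoHeader;

static void demo_header_write(uint8_t *buf, const DemoHeader *h)
{
    assert(buf != NULL && h != NULL);
    put_le32(buf + 8, h->version);
    put_le64(buf + 28, h->features);
    put_le32(buf + 44, h->header_pages_size);
}

static void demo_header_read(const uint8_t *buf, DemoHeader *h)
{
    assert(buf != NULL && h != NULL);
    h->version = get_le32(buf + 8);
    h->features = get_le64(buf + 28);
    h->header_pages_size = get_le32(buf + 44);
}
```

With this approach the 64-bit features field can stay at offset 28 on disk
without the in-memory struct having to be packed.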
>
>> +
>> +typedef struct BDRVAddCowState {
>> +    BlockDriverState    *image_hd;
>> +    CoMutex             lock;
>> +    int                 cluster_size;
>> +    BlockCache         *bitmap_cache;
>> +    uint64_t            bitmap_size;
>> +    AddCowHeader        header;
>> +    char                backing_file_format[16];
>> +    char                image_file_format[16];
>> +} BDRVAddCowState;
>> +
>> +/* Convert sector_num to offset in bitmap */
>> +static inline int64_t offset_in_bitmap(int64_t sector_num)
>> +{
>> +    int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
>> +    return cluster_num / 8;
>> +}
>> +
>> +static inline bool is_cluster_head(int64_t sector_num)
>> +{
>> +    return sector_num % SECTORS_PER_CLUSTER == 0;
>> +}
>> +
>> +static inline bool is_cluster_tail(int64_t sector_num)
>> +{
>> +    return (sector_num + 1) % SECTORS_PER_CLUSTER == 0;
>> +}
>> +
>> +BlockCache *add_cow_cache_create(BlockDriverState *bs, int num_tables);
>> +int add_cow_cache_destroy(BlockDriverState *bs, BlockCache *c);
>> +void add_cow_cache_entry_mark_dirty(BlockCache *c, void *table);
>> +int add_cow_cache_get(BlockDriverState *bs, BlockCache *c, uint64_t offset,
>> +    void **table);
>> +int add_cow_cache_flush(BlockDriverState *bs, BlockCache *c);
>> +#endif
>> diff --git a/block_int.h b/block_int.h
>> index 6c1d9ca..67954ec 100644
>> --- a/block_int.h
>> +++ b/block_int.h
>> @@ -53,6 +53,8 @@
>>  #define BLOCK_OPT_SUBFMT            "subformat"
>>  #define BLOCK_OPT_COMPAT_LEVEL      "compat"
>>  #define BLOCK_OPT_LAZY_REFCOUNTS    "lazy_refcounts"
>> +#define BLOCK_OPT_IMAGE_FILE        "image_file"
>> +#define BLOCK_OPT_IMAGE_FORMAT      "image_format"
>>
>>  typedef struct BdrvTrackedRequest BdrvTrackedRequest;
>>
>> --
>> 1.7.1
>>
>>
>


* Re: [Qemu-devel] [PATCH V12 1/6] docs: document for add-cow file format
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 1/6] docs: document for " Dong Xu Wang
  2012-09-06 17:27   ` Michael Roth
@ 2012-09-10 15:23   ` Kevin Wolf
  2012-09-11  2:12     ` Dong Xu Wang
  1 sibling, 1 reply; 25+ messages in thread
From: Kevin Wolf @ 2012-09-10 15:23 UTC (permalink / raw)
  To: Dong Xu Wang; +Cc: qemu-devel

Am 10.08.2012 17:39, schrieb Dong Xu Wang:
> Document for add-cow format, the usage and spec of add-cow are introduced.
> 
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> ---
>  docs/specs/add-cow.txt |  123 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 files changed, 123 insertions(+), 0 deletions(-)
>  create mode 100644 docs/specs/add-cow.txt
> 
> diff --git a/docs/specs/add-cow.txt b/docs/specs/add-cow.txt
> new file mode 100644
> index 0000000..d5a7a68
> --- /dev/null
> +++ b/docs/specs/add-cow.txt
> @@ -0,0 +1,123 @@
> +== General ==
> +
> +The raw file format does not support backing files or copy on write feature.
> +The add-cow image format makes it possible to use backing files with raw
> +image by keeping a separate .add-cow metadata file. Once all sectors
> +have been written into the raw image it is safe to discard the .add-cow
> +and backing files, then we can use the raw image directly.
> +
> +An example usage of add-cow would look like::
> +(ubuntu.img is a disk image which has been installed OS.)
> +    1)  Create a raw image with the same size of ubuntu.img
> +            qemu-img create -f raw test.raw 8G
> +    2)  Create an add-cow image which will store dirty bitmap
> +            qemu-img create -f add-cow test.add-cow \
> +                -o backing_file=ubuntu.img,image_file=test.raw
> +    3)  Run qemu with add-cow image
> +            qemu -drive if=virtio,file=test.add-cow
> +
> +test.raw may be larger than ubuntu.img, in that case, the size of test.add-cow
> +will be calculated from the size of test.raw.
> +
> +=Specification=
> +
> +The file format looks like this:
> +
> + +---------------+-------------+-----------------+
> + |     Header    |   Reserved  |    COW bitmap   |
> + +---------------+-------------+-----------------+
> +
> +All numbers in add-cow are stored in Little Endian byte order.
> +
> +== Header ==
> +
> +The Header is included in the first bytes:
> +(#define HEADER_SIZE (4096 * header_pages_size))
> +    Byte    0 -  7:     magic
> +                        add-cow magic string ("ADD_COW\xff").
> +
> +            8 -  11:    version
> +                        Version number (only valid value is 1 now).
> +
> +            12 - 15:    backing file name offset
> +                        Offset in the add-cow file at which the backing file
> +                        name is stored (NB: The string is not nul-terminated).
> +                        If backing file name does NOT exist, this field will be
> +                        0. Must be between 80 and [HEADER_SIZE - 2](a file name
> +                        must be at least 1 byte).
> +
> +            16 - 19:    backing file name size
> +                        Length of the backing file name in bytes. It will be 0
> +                        if the backing file name offset is 0. If backing file
> +                        name offset is non-zero, then it must be non-zero. Must
> +                        be less than [HEADER_SIZE - 80] to fit in the reserved
> +                        part of the header.
> +
> +            20 - 23:    image file name offset
> +                        Offset in the add-cow file at which the image file name
> +                        is stored (NB: The string is not null terminated). It
> +                        must be between 80 and [HEADER_SIZE - 2].
> +
> +            24 - 27:    image file name size
> +                        Length of the image file name in bytes.
> +                        Must be less than [HEADER_SIZE - 80] to fit in the reserved
> +                        part of the header.
> +
> +            28 - 35:    features
> +                        Currently only 1 feature bit is used:

What happens when opening a file with an unknown bit set? How must
unknown bits be initialised?

> +                        Feature bits:
> +                            * ADD_COW_F_All_ALLOCATED   = 0x01.

What does this flag mean, and is it required to be set on that
condition? Also, please use ALL_CAPS.

> +
> +            36 - 43:    optional features
> +                        Not used now. Reserved for future use. It must be set to 0.

And must be ignored when reading.

> +
> +            44 - 47:    header pages size
> +                        The header field is variable-sized. This field indicates
> +                        how many pages(4k) will be used to store add-cow header.
> +                        In add-cow v1, it is fixed to 1, so the header size will
> +                        be 4k * 1 = 4096 bytes.

Why arbitrarily defined "pages" instead of bytes or at least clusters?

> +
> +            48 - 63:    backing file format
> +                        format of backing file. It will be filled with 0 if
> +                        backing file name offset is 0. If backing file name
> +                        offset is non-zero, it must be non-zero. It is coded
> +                        in free-form ASCII, and is not NUL-terminated.

Zero padded on the right, I guess?

Also defining that a string must be "non-zero" looks odd, should
probably be "non-empty".

> +
> +            64 - 79:    image file format
> +                        format of image file. It must be non-zero. It is coded
> +                        in free-form ASCII, and is not NUL-terminated.

Same here.

> +
> +            80 - [HEADER_SIZE - 1]:
> +                        It is used to make sure COW bitmap field starts at the
> +                        HEADER_SIZE byte, backing file name and image file name
> +                        will be stored here. The bytes that is not pointing to
> +                        backing file and image file names will bet set to 0.

"will be set to 0" describes the behaviour of qemu. A spec should
describe the file format, not a specific implementation. Make it "must"
or "should".

> +
> +== COW bitmap ==
> +
> +The "COW bitmap" field starts at offset HEADER_SIZE, stores a bitmap related to
> +backing file and image file. The bitmap will track whether the sector in
> +backing file is dirty or not.
> +
> +Each bit in the bitmap indicates one cluster's status. One cluster includes 128
> +sectors, then each bit indicates 512 * 128 = 64k bytes.

Should we make the cluster size configurable?

> the size of bitmap is
> +calculated according to virtual size of image file, and it also should be multipe

Typo: multiple

Sure you mean "should", or should it be "must"?

> +of 65536, the bits not used will be set to 0. Within each byte, the least
> +significant bit covers the first cluster. Bit orders in one byte look like:
> + +----+----+----+----+----+----+----+----+
> + | b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0 |
> + +----+----+----+----+----+----+----+----+
> +
> +If the bit is 0, indicates the sector has not been allocated in image file, data
> +should be loaded from backing file while reading; if the bit is 1, indicates the
> +related sector has been dirty, should be loaded from image file while reading.
> +Writing to a sector causes the corresponding bit to be set to 1.
> +
> +If raw image is not an even multiple of cluster bytes, bits that correspond to
> +bytes beyond the raw file size in add-cow will be 0.

"must be written as 0 and must be ignored when reading" or something
like that.
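
As a rough illustration of the bit layout described in this section (least
significant bit of each byte covers the first cluster), a lookup could be
sketched as follows. The function names are made up for this example,
SECTORS_PER_CLUSTER is assumed to be 128 as the spec states, and the bitmap
is assumed to be fully loaded in memory, which is not how the patch's cache
works:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTORS_PER_CLUSTER 128   /* 64k clusters of 512-byte sectors */

/* bitmap points at the first byte of the COW bitmap, i.e. the byte stored
 * at file offset HEADER_SIZE. */
static bool cluster_is_allocated(const uint8_t *bitmap, int64_t sector_num)
{
    int64_t cluster = sector_num / SECTORS_PER_CLUSTER;

    assert(sector_num >= 0);
    /* Byte index is cluster / 8; bit 0 of that byte is the lowest cluster. */
    return (bitmap[cluster / 8] >> (cluster % 8)) & 1;
}

static void cluster_mark_allocated(uint8_t *bitmap, int64_t sector_num)
{
    int64_t cluster = sector_num / SECTORS_PER_CLUSTER;

    assert(sector_num >= 0);
    bitmap[cluster / 8] |= (uint8_t)(1u << (cluster % 8));
}
```

A reader following the "must be ignored" wording would simply never call
these helpers for sectors beyond the end of the image.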

> +Image file name and backing file name must NOT be the same, we prevent this
> +while creating add-cow files.

What we do is irrelevant for a spec.

> +Image file and backing file are interpreted relative to the qcow2 file, not
> +to the current working directory of the process that opened the qcow2 file.

Kevin


* Re: [Qemu-devel] [PATCH V12 1/6] docs: document for add-cow file format
  2012-09-10 15:23   ` Kevin Wolf
@ 2012-09-11  2:12     ` Dong Xu Wang
  0 siblings, 0 replies; 25+ messages in thread
From: Dong Xu Wang @ 2012-09-11  2:12 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel

On Mon, Sep 10, 2012 at 11:23 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 10.08.2012 17:39, schrieb Dong Xu Wang:
>> Document for add-cow format, the usage and spec of add-cow are introduced.
>>
>> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> ---
>>  docs/specs/add-cow.txt |  123 ++++++++++++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 123 insertions(+), 0 deletions(-)
>>  create mode 100644 docs/specs/add-cow.txt
>>
>> diff --git a/docs/specs/add-cow.txt b/docs/specs/add-cow.txt
>> new file mode 100644
>> index 0000000..d5a7a68
>> --- /dev/null
>> +++ b/docs/specs/add-cow.txt
>> @@ -0,0 +1,123 @@
>> +== General ==
>> +
>> +The raw file format does not support backing files or copy on write feature.
>> +The add-cow image format makes it possible to use backing files with raw
>> +image by keeping a separate .add-cow metadata file. Once all sectors
>> +have been written into the raw image it is safe to discard the .add-cow
>> +and backing files, then we can use the raw image directly.
>> +
>> +An example usage of add-cow would look like::
>> +(ubuntu.img is a disk image which has been installed OS.)
>> +    1)  Create a raw image with the same size of ubuntu.img
>> +            qemu-img create -f raw test.raw 8G
>> +    2)  Create an add-cow image which will store dirty bitmap
>> +            qemu-img create -f add-cow test.add-cow \
>> +                -o backing_file=ubuntu.img,image_file=test.raw
>> +    3)  Run qemu with add-cow image
>> +            qemu -drive if=virtio,file=test.add-cow
>> +
>> +test.raw may be larger than ubuntu.img, in that case, the size of test.add-cow
>> +will be calculated from the size of test.raw.
>> +
>> +=Specification=
>> +
>> +The file format looks like this:
>> +
>> + +---------------+-------------+-----------------+
>> + |     Header    |   Reserved  |    COW bitmap   |
>> + +---------------+-------------+-----------------+
>> +
>> +All numbers in add-cow are stored in Little Endian byte order.
>> +
>> +== Header ==
>> +
>> +The Header is included in the first bytes:
>> +(#define HEADER_SIZE (4096 * header_pages_size))
>> +    Byte    0 -  7:     magic
>> +                        add-cow magic string ("ADD_COW\xff").
>> +
>> +            8 -  11:    version
>> +                        Version number (only valid value is 1 now).
>> +
>> +            12 - 15:    backing file name offset
>> +                        Offset in the add-cow file at which the backing file
>> +                        name is stored (NB: The string is not nul-terminated).
>> +                        If backing file name does NOT exist, this field will be
>> +                        0. Must be between 80 and [HEADER_SIZE - 2](a file name
>> +                        must be at least 1 byte).
>> +
>> +            16 - 19:    backing file name size
>> +                        Length of the backing file name in bytes. It will be 0
>> +                        if the backing file name offset is 0. If backing file
>> +                        name offset is non-zero, then it must be non-zero. Must
>> +                        be less than [HEADER_SIZE - 80] to fit in the reserved
>> +                        part of the header.
>> +
>> +            20 - 23:    image file name offset
>> +                        Offset in the add-cow file at which the image file name
>> +                        is stored (NB: The string is not null terminated). It
>> +                        must be between 80 and [HEADER_SIZE - 2].
>> +
>> +            24 - 27:    image file name size
>> +                        Length of the image file name in bytes.
>> +                        Must be less than [HEADER_SIZE - 80] to fit in the reserved
>> +                        part of the header.
>> +
>> +            28 - 35:    features
>> +                        Currently only 1 feature bit is used:
>
> What happens when opening a file with an unknown bit set? How must
> unknown bits be initialised?

Okay, I will handle this the same way qcow2 does and report an error via
report_unsupported_feature. I will also update the spec file.
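
A minimal sketch of such a check, assuming the enum values from add-cow.h
(with the flag renamed to all caps as requested). add_cow_check_features is
a hypothetical name, and returning -ENOTSUP is my assumption about what the
open path would do, following qcow2:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

enum {
    ADD_COW_F_ALL_ALLOCATED = 0x01,
    ADD_COW_FEATURE_MASK    = ADD_COW_F_ALL_ALLOCATED,
};

/* Returns 0 if every set feature bit is known, -ENOTSUP otherwise.
 * *unknown receives the bits that are not understood, so the caller can
 * report them. */
static int add_cow_check_features(uint64_t features, uint64_t *unknown)
{
    assert(unknown != NULL);
    *unknown = features & ~(uint64_t)ADD_COW_FEATURE_MASK;
    return *unknown ? -ENOTSUP : 0;
}
```

The optional features field would not go through this check, since unknown
bits there must be ignored when reading.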

>
>> +                        Feature bits:
>> +                            * ADD_COW_F_All_ALLOCATED   = 0x01.
>
> What does this flag mean, and is it required to be set on that
> condition? Also, please use ALL_CAPS.

This feature bit will be used in a case such as:
qemu-img create -f add-cow -o image_file=t.raw t.add-cow

When an add-cow file is created without a backing_file, this feature lets
us avoid reading and updating the bitmap. I think it makes the code faster.

Also, maybe I can implement add_cow_check to check whether the feature
bit should be set.
What do you think, Kevin?

>
>> +
>> +            36 - 43:    optional features
>> +                        Not used now. Reserved for future use. It must be set to 0.
>
> And must be ignored when reading.
>
Okay.

>> +
>> +            44 - 47:    header pages size
>> +                        The header field is variable-sized. This field indicates
>> +                        how many pages(4k) will be used to store add-cow header.
>> +                        In add-cow v1, it is fixed to 1, so the header size will
>> +                        be 4k * 1 = 4096 bytes.
>
> Why arbitrarily defined "pages" instead of bytes or at least clusters?

Okay, in the next version I will just calculate it in bytes.
>
>> +
>> +            48 - 63:    backing file format
>> +                        format of backing file. It will be filled with 0 if
>> +                        backing file name offset is 0. If backing file name
>> +                        offset is non-zero, it must be non-zero. It is coded
>> +                        in free-form ASCII, and is not NUL-terminated.
>
> Zero padded on the right, I guess?

Yes, will update.

>
> Also defining that a string must be "non-zero" looks odd, should
> probably be "non-empty".
>
Okay.

>> +
>> +            64 - 79:    image file format
>> +                        format of image file. It must be non-zero. It is coded
>> +                        in free-form ASCII, and is not NUL-terminated.
>
> Same here.
Okay.
>
>> +
>> +            80 - [HEADER_SIZE - 1]:
>> +                        It is used to make sure COW bitmap field starts at the
>> +                        HEADER_SIZE byte, backing file name and image file name
>> +                        will be stored here. The bytes that is not pointing to
>> +                        backing file and image file names will bet set to 0.
>
> "will be set to 0" describes the behaviour of qemu. A spec should
> describe the file format, not a specific implementation. Make it "must"
> or "should".
Okay.
>
>> +
>> +== COW bitmap ==
>> +
>> +The "COW bitmap" field starts at offset HEADER_SIZE, stores a bitmap related to
>> +backing file and image file. The bitmap will track whether the sector in
>> +backing file is dirty or not.
>> +
>> +Each bit in the bitmap indicates one cluster's status. One cluster includes 128
>> +sectors, then each bit indicates 512 * 128 = 64k bytes.
>
> Should we make the cluster size configurable?
>
>> the size of bitmap is
>> +calculated according to virtual size of image file, and it also should be multipe
>
> Typo: multiple
>
> Sure you mean "should", or should it be "must"?
Okay.

>
>> +of 65536, the bits not used will be set to 0. Within each byte, the least
>> +significant bit covers the first cluster. Bit orders in one byte look like:
>> + +----+----+----+----+----+----+----+----+
>> + | b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0 |
>> + +----+----+----+----+----+----+----+----+
>> +
>> +If the bit is 0, indicates the sector has not been allocated in image file, data
>> +should be loaded from backing file while reading; if the bit is 1, indicates the
>> +related sector has been dirty, should be loaded from image file while reading.
>> +Writing to a sector causes the corresponding bit to be set to 1.
>> +
>> +If raw image is not an even multiple of cluster bytes, bits that correspond to
>> +bytes beyond the raw file size in add-cow will be 0.
>
> "must be written as 0 and must be ignored when reading" or something
> like that.

Okay.
>
>> +Image file name and backing file name must NOT be the same, we prevent this
>> +while creating add-cow files.
>
> What we do is irrelevant for a spec.

Okay.

>
>> +Image file and backing file are interpreted relative to the qcow2 file, not
>> +to the current working directory of the process that opened the qcow2 file.
>
> Kevin
>

Thank you, Kevin.


* Re: [Qemu-devel] [PATCH V12 4/6] rename qcow2-cache.c to block-cache.c
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 4/6] rename qcow2-cache.c to block-cache.c Dong Xu Wang
  2012-09-06 17:52   ` Michael Roth
@ 2012-09-11  8:41   ` Kevin Wolf
  1 sibling, 0 replies; 25+ messages in thread
From: Kevin Wolf @ 2012-09-11  8:41 UTC (permalink / raw)
  To: Dong Xu Wang; +Cc: qemu-devel

Am 10.08.2012 17:39, schrieb Dong Xu Wang:
> add-cow and qcow2 file format will share the same cache code, so rename
> block-cache.c to block-cache.c. And related structure and qcow2 code also
> are changed.
> 
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> ---
>  block.h                |    3 +
>  block/Makefile.objs    |    3 +-
>  block/qcow2-cache.c    |  323 ------------------------------------------------
>  block/qcow2-cluster.c  |   66 ++++++----
>  block/qcow2-refcount.c |   66 ++++++-----
>  block/qcow2.c          |   36 +++---
>  block/qcow2.h          |   24 +---
>  trace-events           |   13 +-
>  8 files changed, 109 insertions(+), 425 deletions(-)
>  delete mode 100644 block/qcow2-cache.c
> 
> diff --git a/block.h b/block.h
> index e5dfcd7..c325661 100644
> --- a/block.h
> +++ b/block.h
> @@ -401,6 +401,9 @@ typedef enum {
>      BLKDBG_CLUSTER_ALLOC_BYTES,
>      BLKDBG_CLUSTER_FREE,
>  
> +    BLKDBG_ADD_COW_UPDATE,
> +    BLKDBG_ADD_COW_LOAD,
> +

I don't think you should add new events, the existing ones should be
generic enough that you can reuse them. It's somewhat hard to see
without block-cache.c, though.

Can you make sure to have one patch with pure code motion, and a
separate one with the changes needed to make it work with add-cow? It
will help reviewers a lot.

>      BLKDBG_EVENT_MAX,
>  } BlkDebugEvent;
>  
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index e179211..335dc7a 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -28,6 +28,7 @@
>  #include "block_int.h"
>  #include "block/qcow2.h"
>  #include "trace.h"
> +#include "block-cache.h"
>  
>  int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
>  {
> @@ -69,7 +70,8 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
>          return new_l1_table_offset;
>      }
>  
> -    ret = qcow2_cache_flush(bs, s->refcount_block_cache);
> +    ret = block_cache_flush(bs, s->refcount_block_cache,
> +        BLOCK_TABLE_REF, s->cluster_size);

I think it's better to pass s->cluster_size to the cache initialisation
instead of to each call of the cache functions.
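
Roughly what I have in mind, as a sketch only: DemoCache and the demo_*
functions are made-up names standing in for whatever block-cache.c ends up
providing, since I haven't seen that code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical cache object: the entry size is fixed once at creation
 * (cluster size for qcow2, 65536 for add-cow). */
typedef struct DemoCache {
    int    num_entries;
    size_t entry_size;
} DemoCache;

static DemoCache *demo_cache_create(int num_entries, size_t entry_size)
{
    DemoCache *c;

    assert(num_entries > 0 && entry_size > 0);
    c = calloc(1, sizeof(*c));
    if (c) {
        c->num_entries = num_entries;
        c->entry_size = entry_size;
    }
    return c;
}

/* Later calls take no size argument; the cache already knows it.
 * This one returns the total number of bytes a full flush would cover. */
static size_t demo_cache_flush_bytes(const DemoCache *c)
{
    assert(c != NULL);
    return (size_t)c->num_entries * c->entry_size;
}

static void demo_cache_destroy(DemoCache *c)
{
    free(c);
}
```

This keeps the call sites in qcow2-cluster.c and qcow2-refcount.c as short
as they were before the rename.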

For the blkdebug events I guess it's possible as well to move this to
the initialisation, but I'd have to see the block-cache.c code to say
something specific about this.

> @@ -659,18 +669,16 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
>       * handled.
>       */
>      if (cow) {
> -        qcow2_cache_depends_on_flush(s->l2_table_cache);
> +        block_cache_depends_on_flush(s->l2_table_cache);
>      }
>  
> -    if (qcow2_need_accurate_refcounts(s)) {
> -        qcow2_cache_set_dependency(bs, s->l2_table_cache,
> -                                   s->refcount_block_cache);
> -    }
> +    block_cache_set_dependency(bs, s->l2_table_cache, BLOCK_TABLE_L2,
> +        s->refcount_block_cache, s->cluster_size);

What happened with lazy refcounting? Is this a mismerge or did you
intentionally remove the condition? (There's a second place where you do
the same)

Kevin


* Re: [Qemu-devel] [PATCH V12 5/6] add-cow file format
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 5/6] add-cow file format Dong Xu Wang
  2012-09-06 20:19   ` Michael Roth
@ 2012-09-11  9:40   ` Kevin Wolf
  2012-09-12  7:28     ` Dong Xu Wang
  1 sibling, 1 reply; 25+ messages in thread
From: Kevin Wolf @ 2012-09-11  9:40 UTC (permalink / raw)
  To: Dong Xu Wang; +Cc: qemu-devel

Am 10.08.2012 17:39, schrieb Dong Xu Wang:
> add-cow file format core code. It use block-cache.c as cache code.
> 
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> ---
>  block/Makefile.objs |    1 +
>  block/add-cow.c     |  613 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  block/add-cow.h     |   85 +++++++
>  block_int.h         |    2 +
>  4 files changed, 701 insertions(+), 0 deletions(-)
>  create mode 100644 block/add-cow.c
>  create mode 100644 block/add-cow.h
> 
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index 23bdfc8..7ed5051 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -2,6 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
>  block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
>  block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>  block-obj-y += qed-check.o
> +block-obj-y += add-cow.o
>  block-obj-y += block-cache.o
>  block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
>  block-obj-y += stream.o
> diff --git a/block/add-cow.c b/block/add-cow.c
> new file mode 100644
> index 0000000..d4711d5
> --- /dev/null
> +++ b/block/add-cow.c
> @@ -0,0 +1,613 @@
> +/*
> + * QEMU ADD-COW Disk Format
> + *
> + * Copyright IBM, Corp. 2012
> + *
> + * Authors:
> + *  Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> + * See the COPYING.LIB file in the top-level directory.
> + *
> + */
> +
> +#include "qemu-common.h"
> +#include "block_int.h"
> +#include "module.h"
> +#include "add-cow.h"
> +
> +static void add_cow_header_le_to_cpu(const AddCowHeader *le, AddCowHeader *cpu)
> +{
> +    cpu->magic                      = le64_to_cpu(le->magic);
> +    cpu->version                    = le32_to_cpu(le->version);
> +
> +    cpu->backing_filename_offset    = le32_to_cpu(le->backing_filename_offset);
> +    cpu->backing_filename_size      = le32_to_cpu(le->backing_filename_size);
> +
> +    cpu->image_filename_offset      = le32_to_cpu(le->image_filename_offset);
> +    cpu->image_filename_size        = le32_to_cpu(le->image_filename_size);
> +
> +    cpu->features                   = le64_to_cpu(le->features);
> +    cpu->optional_features          = le64_to_cpu(le->optional_features);
> +    cpu->header_pages_size          = le32_to_cpu(le->header_pages_size);
> +}
> +
> +static void add_cow_header_cpu_to_le(const AddCowHeader *cpu, AddCowHeader *le)
> +{
> +    le->magic                       = cpu_to_le64(cpu->magic);
> +    le->version                     = cpu_to_le32(cpu->version);
> +
> +    le->backing_filename_offset     = cpu_to_le32(cpu->backing_filename_offset);
> +    le->backing_filename_size       = cpu_to_le32(cpu->backing_filename_size);
> +
> +    le->image_filename_offset       = cpu_to_le32(cpu->image_filename_offset);
> +    le->image_filename_size         = cpu_to_le32(cpu->image_filename_size);
> +
> +    le->features                    = cpu_to_le64(cpu->features);
> +    le->optional_features           = cpu_to_le64(cpu->optional_features);
> +    le->header_pages_size           = cpu_to_le32(cpu->header_pages_size);
> +}
> +
> +static int add_cow_probe(const uint8_t *buf, int buf_size, const char *filename)
> +{
> +    const AddCowHeader *header = (const AddCowHeader *)buf;
> +
> +    if (le64_to_cpu(header->magic) == ADD_COW_MAGIC &&
> +        le32_to_cpu(header->version) == ADD_COW_VERSION) {
> +        return 100;
> +    } else {
> +        return 0;
> +    }
> +}
> +
> +static int add_cow_create(const char *filename, QEMUOptionParameter *options)
> +{
> +    AddCowHeader header = {
> +        .magic = ADD_COW_MAGIC,
> +        .version = ADD_COW_VERSION,
> +        .features = 0,
> +        .optional_features = 0,
> +        .header_pages_size = ADD_COW_DEFAULT_PAGE_SIZE,
> +    };
> +    AddCowHeader le_header;
> +    int64_t image_len = 0;
> +    const char *backing_filename = NULL;
> +    const char *backing_fmt = NULL;
> +    const char *image_filename = NULL;
> +    const char *image_format = NULL;
> +    BlockDriverState *bs, *image_bs = NULL, *backing_bs = NULL;
> +    BlockDriver *drv = bdrv_find_format("add-cow");
> +    BDRVAddCowState s;
> +    int ret;
> +
> +    while (options && options->name) {
> +        if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
> +            image_len = options->value.n;
> +        } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FILE)) {
> +            backing_filename = options->value.s;
> +        } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FMT)) {
> +            backing_fmt = options->value.s;
> +        } else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FILE)) {
> +            image_filename = options->value.s;
> +        } else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FORMAT)) {
> +            image_format = options->value.s;
> +        }
> +        options++;
> +    }
> +
> +    if (backing_filename) {
> +        header.backing_filename_offset = sizeof(header)
> +            + sizeof(s.backing_file_format) + sizeof(s.image_file_format);
> +        header.backing_filename_size = strlen(backing_filename);
> +
> +        if (!backing_fmt) {
> +            backing_bs = bdrv_new("image");
> +            ret = bdrv_open(backing_bs, backing_filename, BDRV_O_RDWR
> +                    | BDRV_O_CACHE_WB, NULL);
> +            if (ret < 0) {
> +                return ret;
> +            }
> +            backing_fmt = bdrv_get_format_name(backing_bs);
> +            bdrv_delete(backing_bs);
> +        }
> +    } else {
> +        header.features |= ADD_COW_F_All_ALLOCATED;
> +    }
> +
> +    if (image_filename) {
> +        header.image_filename_offset =
> +            sizeof(header) + sizeof(s.backing_file_format)
> +                + sizeof(s.image_file_format) + header.backing_filename_size;
> +        header.image_filename_size = strlen(image_filename);
> +    } else {
> +        error_report("Error: image_file should be given.");
> +        return -EINVAL;
> +    }
> +
> +    if (backing_filename && !strcmp(backing_filename, image_filename)) {
> +        error_report("Error: Trying to create an image with the "
> +                     "same backing file name as the image file name");
> +        return -EINVAL;
> +    }
> +
> +    if (!strcmp(filename, image_filename)) {
> +        error_report("Error: Trying to create an image with the "
> +                     "same filename as the image file name");
> +        return -EINVAL;
> +    }
> +
> +    if (header.image_filename_offset + header.image_filename_size
> +            > ADD_COW_PAGE_SIZE * ADD_COW_DEFAULT_PAGE_SIZE) {
> +        error_report("image_file name or backing_file name too long.");
> +        return -ENOSPC;
> +    }
> +
> +    ret = bdrv_file_open(&image_bs, image_filename, BDRV_O_RDWR);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    bdrv_delete(image_bs);
> +
> +    ret = bdrv_create_file(filename, NULL);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +
> +    ret = bdrv_file_open(&bs, filename, BDRV_O_RDWR);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    add_cow_header_cpu_to_le(&header, &le_header);
> +    ret = bdrv_pwrite(bs, 0, &le_header, sizeof(le_header));
> +    if (ret < 0) {
> +        bdrv_delete(bs);
> +        return ret;
> +    }
> +
> +    ret = bdrv_pwrite(bs, sizeof(le_header), backing_fmt ? backing_fmt : "",
> +        backing_fmt ? strlen(backing_fmt) : 0);

The spec requires zero padding, which you don't do here.
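
One way to get the padding would be to stage the string in a zeroed,
fixed-size buffer and write the whole buffer rather than strlen() bytes. A
minimal sketch, assuming a 16-byte field (the width of backing_file_format[16]
in add-cow.h); fill_padded_field() and FORMAT_FIELD_SIZE are hypothetical
names, not existing QEMU code:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FORMAT_FIELD_SIZE 16   /* assumed width, as in backing_file_format[16] */

/*
 * Hypothetical helper: copy src into a fixed-width field and zero-fill the
 * rest. Returns 0 on success, -1 if src does not fit with a trailing NUL.
 */
static int fill_padded_field(char field[FORMAT_FIELD_SIZE], const char *src)
{
    size_t len = src ? strlen(src) : 0;

    assert(field != NULL);
    if (len >= FORMAT_FIELD_SIZE) {
        return -1;
    }
    memset(field, 0, FORMAT_FIELD_SIZE);   /* the padding the spec asks for */
    if (len > 0) {
        memcpy(field, src, len);
    }
    return 0;
}
```

The caller would then pass the full FORMAT_FIELD_SIZE bytes to bdrv_pwrite(),
so stale data can never follow the format name on disk.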

> +    if (ret < 0) {
> +        bdrv_delete(bs);
> +        return ret;
> +    }
> +
> +    ret = bdrv_pwrite(bs, sizeof(le_header) + sizeof(s.backing_file_format),
> +        image_format ? image_format : "raw",
> +        image_format ? strlen(image_format) : sizeof("raw"));

And here.

> +    if (ret < 0) {
> +        bdrv_delete(bs);
> +        return ret;
> +    }
> +
> +    if (backing_filename) {
> +        ret = bdrv_pwrite(bs, header.backing_filename_offset,
> +            backing_filename, header.backing_filename_size);
> +        if (ret < 0) {
> +            bdrv_delete(bs);
> +            return ret;
> +        }
> +    }
> +
> +    ret = bdrv_pwrite(bs, header.image_filename_offset,
> +        image_filename, header.image_filename_size);
> +    if (ret < 0) {
> +        bdrv_delete(bs);
> +        return ret;
> +    }
> +
> +    ret = bdrv_open(bs, filename, BDRV_O_RDWR | BDRV_O_NO_FLUSH, drv);
> +    if (ret < 0) {
> +        bdrv_delete(bs);
> +        return ret;
> +    }
> +
> +    ret = bdrv_truncate(bs, image_len);
> +    bdrv_delete(bs);
> +    return ret;
> +}
> +
> +static int add_cow_open(BlockDriverState *bs, int flags)
> +{
> +    char                image_filename[ADD_COW_FILE_LEN];
> +    char                tmp_name[ADD_COW_FILE_LEN];
> +    BlockDriver         *image_drv = NULL;
> +    int                 ret;
> +    int                 sector_per_byte;
> +    BDRVAddCowState     *s = bs->opaque;
> +    AddCowHeader        le_header;
> +
> +    ret = bdrv_pread(bs->file, 0, &le_header, sizeof(le_header));
> +    if (ret != sizeof(s->header)) {

if (ret < 0) would be more consistent with the rest of the code.

> +        goto fail;
> +    }
> +
> +    add_cow_header_le_to_cpu(&le_header, &s->header);
> +
> +    if (le64_to_cpu(s->header.magic) != ADD_COW_MAGIC) {

Isn't this one endianness conversion too many? add_cow_header_le_to_cpu() has
already converted s->header to host byte order.

Did you test add-cow on a big endian host?

> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +
> +    if (s->header.version != ADD_COW_VERSION) {
> +        char version[64];
> +        snprintf(version, sizeof(version), "ADD-COW version %d",
> +            s->header.version);
> +        qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
> +            bs->device_name, "add-cow", version);
> +        ret = -ENOTSUP;
> +        goto fail;
> +    }
> +
> +    if (s->header.features & ~ADD_COW_FEATURE_MASK) {
> +        char buf[64];
> +        snprintf(buf, sizeof(buf), "%" PRIx64,
> +            s->header.features & ~ADD_COW_FEATURE_MASK);

This message is a bit terse; most users will be confused by an error
message that only consists of a hex number. Maybe better "Feature flags:
%" PRIx64.
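
As a sketch of the suggested wording, with the formatting pulled into a small
helper (format_unknown_features() is a hypothetical name, only there to make
the example self-contained):

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical helper: describe the unsupported bits for the error report. */
static int format_unknown_features(char *buf, size_t size, uint64_t unknown)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "Feature flags: %" PRIx64, unknown);
}
```

The resulting string would be passed to qerror_report() in place of the bare
hex number.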

> +        qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
> +            bs->device_name, "add-cow", buf);
> +        return -ENOTSUP;
> +    }
> +
> +    if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
> +        ret = bdrv_read_string(bs->file, sizeof(s->header),
> +            sizeof(s->backing_file_format) - 1, s->backing_file_format,
> +            sizeof(s->backing_file_format));
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +    }

Would be great if this was not only read into memory, but actually
used... It must end up in bs->backing_format in order to take effect.

> +
> +    ret = bdrv_read_string(bs->file,
> +            sizeof(s->header) + sizeof(s->image_file_format),
> +            sizeof(s->image_file_format) - 1, s->image_file_format,
> +            sizeof(s->image_file_format));
> +    if (ret < 0) {
> +        goto fail;
> +    }

This one is unused, too.

> +
> +    if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
> +        ret = bdrv_read_string(bs->file, s->header.backing_filename_offset,
> +                          s->header.backing_filename_size, bs->backing_file,
> +                          sizeof(bs->backing_file));
> +        if (ret < 0) {
> +            goto fail;
> +        }
> +    }
> +
> +    ret = bdrv_read_string(bs->file, s->header.image_filename_offset,
> +                      s->header.image_filename_size, tmp_name,
> +                      sizeof(tmp_name));
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +
> +    s->image_hd = bdrv_new("");
> +    if (path_has_protocol(image_filename)) {
> +        pstrcpy(image_filename, sizeof(image_filename), tmp_name);
> +    } else {
> +        path_combine(image_filename, sizeof(image_filename),
> +                     bs->filename, tmp_name);
> +    }
> +
> +    ret = bdrv_open(s->image_hd, image_filename, flags, image_drv);

image_drv is always NULL.

> +    if (ret < 0) {
> +        bdrv_delete(s->image_hd);
> +        goto fail;
> +    }
> +
> +    bs->total_sectors = bdrv_getlength(s->image_hd) >> 9;
> +    s->cluster_size = ADD_COW_CLUSTER_SIZE;
> +    sector_per_byte = SECTORS_PER_CLUSTER * 8;
> +    s->bitmap_size =
> +        (bs->total_sectors + sector_per_byte - 1) / sector_per_byte;
> +    s->bitmap_cache =
> +        block_cache_create(bs, ADD_COW_CACHE_SIZE, ADD_COW_CACHE_ENTRY_SIZE);
> +
> +    qemu_co_mutex_init(&s->lock);
> +    return 0;
> +fail:
> +    if (s->bitmap_cache) {
> +        block_cache_destroy(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP);
> +    }
> +    return ret;
> +}
> +
> +static void add_cow_close(BlockDriverState *bs)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    block_cache_destroy(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP);
> +    bdrv_delete(s->image_hd);
> +}
> +
> +static bool is_allocated(BlockDriverState *bs, int64_t sector_num)
> +{
> +    BDRVAddCowState *s  = bs->opaque;
> +    BlockCache *c = s->bitmap_cache;
> +    int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
> +    uint8_t *table      = NULL;
> +    uint64_t offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
> +        + (offset_in_bitmap(sector_num) & (~(c->entry_size - 1)));
> +    int ret = block_cache_get(bs, s->bitmap_cache, offset,
> +        (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);

No matching block_cache_put?

> +
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    return table[cluster_num / 8 % ADD_COW_CACHE_ENTRY_SIZE]
> +        & (1 << (cluster_num % 8));
> +}
> +
> +static coroutine_fn int add_cow_is_allocated(BlockDriverState *bs,
> +        int64_t sector_num, int nb_sectors, int *num_same)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    int changed;
> +
> +    if (nb_sectors == 0) {
> +        *num_same = 0;
> +        return 0;
> +    }
> +
> +    if (s->header.features & ADD_COW_F_All_ALLOCATED) {
> +        *num_same = nb_sectors - 1;

Why - 1?

> +        return 1;
> +    }
> +    changed = is_allocated(bs, sector_num);
> +
> +    for (*num_same = 1; *num_same < nb_sectors; (*num_same)++) {
> +        if (is_allocated(bs, sector_num + *num_same) != changed) {
> +            break;
> +        }
> +    }
> +    return changed;
> +}
> +
> +static int add_cow_backing_read(BlockDriverState *bs, QEMUIOVector *qiov,
> +                  int64_t sector_num, int nb_sectors)
> +{
> +    int n1;
> +    if ((sector_num + nb_sectors) <= bs->total_sectors) {
> +        return nb_sectors;
> +    }
> +    if (sector_num >= bs->total_sectors) {
> +        n1 = 0;
> +    } else {
> +        n1 = bs->total_sectors - sector_num;
> +    }
> +
> +    qemu_iovec_memset(qiov, BDRV_SECTOR_SIZE * n1,
> +        0, BDRV_SECTOR_SIZE * (nb_sectors - n1));
> +
> +    return n1;
> +}
> +
> +static coroutine_fn int add_cow_co_readv(BlockDriverState *bs,
> +    int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
> +{
> +    BDRVAddCowState *s  = bs->opaque;
> +    int cur_nr_sectors;
> +    uint64_t bytes_done = 0;
> +    QEMUIOVector hd_qiov;
> +    int n, n1, ret = 0;
> +
> +    qemu_iovec_init(&hd_qiov, qiov->niov);
> +    qemu_co_mutex_lock(&s->lock);
> +    while (remaining_sectors != 0) {
> +        cur_nr_sectors = remaining_sectors;
> +        if (add_cow_is_allocated(bs, sector_num, cur_nr_sectors, &n)) {
> +            cur_nr_sectors = n;

One of n and cur_nr_sectors is redundant.

> +            qemu_iovec_reset(&hd_qiov);
> +            qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
> +                            cur_nr_sectors * BDRV_SECTOR_SIZE);
> +            qemu_co_mutex_unlock(&s->lock);
> +            ret = bdrv_co_readv(s->image_hd, sector_num, n, &hd_qiov);
> +            qemu_co_mutex_lock(&s->lock);
> +            if (ret < 0) {
> +                goto fail;
> +            }
> +        } else {
> +            cur_nr_sectors = n;
> +            if (bs->backing_hd) {
> +                qemu_iovec_reset(&hd_qiov);
> +                qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
> +                            cur_nr_sectors * BDRV_SECTOR_SIZE);
> +                n1 = add_cow_backing_read(bs->backing_hd, &hd_qiov,
> +                    sector_num, cur_nr_sectors);
> +                if (n1 > 0) {
> +                    qemu_co_mutex_unlock(&s->lock);
> +                    ret = bdrv_co_readv(bs->backing_hd, sector_num,
> +                                        n, &hd_qiov);
> +                    qemu_co_mutex_lock(&s->lock);
> +                    if (ret < 0) {
> +                        goto fail;
> +                    }
> +                }
> +            } else {
> +                qemu_iovec_memset(&hd_qiov, 0, 0,
> +                    BDRV_SECTOR_SIZE * cur_nr_sectors);
> +            }
> +        }
> +        remaining_sectors -= cur_nr_sectors;
> +        sector_num += cur_nr_sectors;
> +        bytes_done += cur_nr_sectors * BDRV_SECTOR_SIZE;
> +    }
> +fail:
> +    qemu_co_mutex_unlock(&s->lock);
> +    qemu_iovec_destroy(&hd_qiov);
> +    return ret;
> +}
> +
> +static int coroutine_fn copy_sectors(BlockDriverState *bs,
> +                                     int n_start, int n_end)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    QEMUIOVector qiov;
> +    struct iovec iov;
> +    int n, ret;
> +
> +    n = n_end - n_start;
> +    if (n <= 0) {
> +        return 0;
> +    }
> +
> +    iov.iov_len = n * BDRV_SECTOR_SIZE;
> +    iov.iov_base = qemu_blockalign(bs, iov.iov_len);
> +
> +    qemu_iovec_init_external(&qiov, &iov, 1);
> +
> +    ret = bdrv_co_readv(bs->backing_hd, n_start, n, &qiov);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +    ret = bdrv_co_writev(s->image_hd, n_start, n, &qiov);
> +    if (ret < 0) {
> +        goto out;
> +    }
> +
> +    ret = 0;
> +out:
> +    qemu_vfree(iov.iov_base);
> +    return ret;
> +}
> +
> +static coroutine_fn int add_cow_co_writev(BlockDriverState *bs,
> +        int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    BlockCache *c = s->bitmap_cache;
> +    int ret = 0, i;
> +    QEMUIOVector hd_qiov;
> +    uint8_t *table;
> +    uint64_t offset;
> +
> +    qemu_co_mutex_lock(&s->lock);
> +    qemu_iovec_init(&hd_qiov, qiov->niov);
> +    ret = bdrv_co_writev(s->image_hd,
> +                     sector_num,
> +                     remaining_sectors, qiov);
> +
> +    if (ret < 0) {
> +        goto fail;
> +    }
> +    if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
> +        /* Copy content of unmodified sectors */
> +        if (!is_cluster_head(sector_num) && !is_allocated(bs, sector_num)) {
> +            ret = copy_sectors(bs, sector_num & ~(SECTORS_PER_CLUSTER - 1),
> +                sector_num);
> +            if (ret < 0) {
> +                goto fail;
> +            }
> +        }
> +
> +        if (!is_cluster_tail(sector_num + remaining_sectors - 1)
> +            && !is_allocated(bs, sector_num + remaining_sectors - 1)) {
> +            ret = copy_sectors(bs, sector_num + remaining_sectors,
> +                ((sector_num + remaining_sectors) | (SECTORS_PER_CLUSTER - 1)) + 1);
> +            if (ret < 0) {
> +                goto fail;
> +            }
> +        }
> +
> +        for (i = sector_num / SECTORS_PER_CLUSTER;
> +            i <= (sector_num + remaining_sectors - 1) / SECTORS_PER_CLUSTER;
> +            i++) {
> +            offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
> +                + (offset_in_bitmap(i * SECTORS_PER_CLUSTER) & (~(c->entry_size - 1)));

The maths in this loop looks a bit confusing, but I think it's correct.
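
Spelling the steps out might make it easier to follow: sector -> cluster ->
byte in the bitmap -> cache entry holding that byte -> index and bit inside
the entry. A sketch using the constants from add-cow.h (64k clusters, 64k
cache entries, one 4k header page); the SKETCH_* names, BitPos and
locate_cluster_bit() are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

enum {
    SKETCH_SECTOR_SIZE         = 512,
    SKETCH_CLUSTER_SIZE        = 65536,
    SKETCH_SECTORS_PER_CLUSTER = SKETCH_CLUSTER_SIZE / SKETCH_SECTOR_SIZE,
    SKETCH_ENTRY_SIZE          = 65536,  /* bitmap bytes per cache entry */
    SKETCH_HEADER_BYTES        = 4096,   /* ADD_COW_PAGE_SIZE * 1 header page */
};

typedef struct BitPos {
    uint64_t entry_offset;   /* file offset of the cache entry with the bit */
    uint32_t byte_in_entry;  /* index into that entry's table */
    uint8_t  mask;           /* bit within the byte */
} BitPos;

static BitPos locate_cluster_bit(int64_t sector_num)
{
    BitPos pos;
    uint64_t cluster, byte;

    assert(sector_num >= 0);
    cluster = (uint64_t)sector_num / SKETCH_SECTORS_PER_CLUSTER;
    byte = cluster / 8;                       /* one bit per cluster */

    pos.entry_offset = SKETCH_HEADER_BYTES +
                       (byte & ~(uint64_t)(SKETCH_ENTRY_SIZE - 1));
    pos.byte_in_entry = (uint32_t)(byte % SKETCH_ENTRY_SIZE);
    pos.mask = (uint8_t)(1u << (cluster % 8));
    return pos;
}
```

Note that is_allocated() reduces the byte index modulo
ADD_COW_CACHE_ENTRY_SIZE, while the write loop indexes the table with i / 8
directly; the sketch uses the modulo form.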

> +            ret = block_cache_get(bs, s->bitmap_cache, offset,
> +                (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
> +            if (ret < 0) {
> +                goto fail;
> +            }
> +            if ((table[i / 8] & (1 << (i % 8))) == 0) {
> +                table[i / 8] |= (1 << (i % 8));
> +                block_cache_entry_mark_dirty(s->bitmap_cache, table);
> +            }

Missing block_cache_put again?

> +        }
> +    }
> +    ret = 0;
> +fail:
> +    qemu_co_mutex_unlock(&s->lock);
> +    qemu_iovec_destroy(&hd_qiov);
> +    return ret;
> +}
> +
> +static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    int sector_per_byte = SECTORS_PER_CLUSTER * 8;
> +    int ret;
> +    uint32_t bitmap_pos = s->header.header_pages_size * ADD_COW_PAGE_SIZE;
> +    int64_t bitmap_size =
> +        (size / BDRV_SECTOR_SIZE + sector_per_byte - 1) / sector_per_byte;
> +    bitmap_size = (bitmap_size + ADD_COW_CACHE_ENTRY_SIZE - 1)
> +        & (~(ADD_COW_CACHE_ENTRY_SIZE - 1));
> +
> +    ret = bdrv_truncate(bs->file, bitmap_pos + bitmap_size);
> +    if (ret < 0) {
> +        return ret;
> +    }
> +    return 0;
> +}

So you don't truncate s->image_hd? Does this work?

> +
> +static coroutine_fn int add_cow_co_flush(BlockDriverState *bs)
> +{
> +    BDRVAddCowState *s = bs->opaque;
> +    int ret;
> +
> +    qemu_co_mutex_lock(&s->lock);
> +    ret = block_cache_flush(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP,
> +        ADD_COW_CACHE_ENTRY_SIZE);
> +    qemu_co_mutex_unlock(&s->lock);
> +    return ret;
> +}

What about flushing s->image_hd?

> +
> +static QEMUOptionParameter add_cow_create_options[] = {
> +    {
> +        .name = BLOCK_OPT_SIZE,
> +        .type = OPT_SIZE,
> +        .help = "Virtual disk size"
> +    },
> +    {
> +        .name = BLOCK_OPT_BACKING_FILE,
> +        .type = OPT_STRING,
> +        .help = "File name of a base image"
> +    },
> +    {
> +        .name = BLOCK_OPT_BACKING_FMT,
> +        .type = OPT_STRING,
> +        .help = "Image format of the base image"
> +    },
> +    {
> +        .name = BLOCK_OPT_IMAGE_FILE,
> +        .type = OPT_STRING,
> +        .help = "File name of a image file"
> +    },
> +    {
> +        .name = BLOCK_OPT_IMAGE_FORMAT,
> +        .type = OPT_STRING,
> +        .help = "Image format of the image file"
> +    },
> +    { NULL }
> +};
> +
> +static BlockDriver bdrv_add_cow = {
> +    .format_name                = "add-cow",
> +    .instance_size              = sizeof(BDRVAddCowState),
> +    .bdrv_probe                 = add_cow_probe,
> +    .bdrv_open                  = add_cow_open,
> +    .bdrv_close                 = add_cow_close,
> +    .bdrv_create                = add_cow_create,
> +    .bdrv_co_readv              = add_cow_co_readv,
> +    .bdrv_co_writev             = add_cow_co_writev,
> +    .bdrv_truncate              = bdrv_add_cow_truncate,
> +    .bdrv_co_is_allocated       = add_cow_is_allocated,
> +
> +    .create_options             = add_cow_create_options,
> +    .bdrv_co_flush_to_os        = add_cow_co_flush,
> +};
> +
> +static void bdrv_add_cow_init(void)
> +{
> +    bdrv_register(&bdrv_add_cow);
> +}
> +
> +block_init(bdrv_add_cow_init);
> diff --git a/block/add-cow.h b/block/add-cow.h
> new file mode 100644
> index 0000000..f058376
> --- /dev/null
> +++ b/block/add-cow.h
> @@ -0,0 +1,85 @@
> +/*
> + * QEMU ADD-COW Disk Format
> + *
> + * Copyright IBM, Corp. 2012
> + *
> + * Authors:
> + *  Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> + * See the COPYING.LIB file in the top-level directory.
> + *
> + */
> +
> +#ifndef BLOCK_ADD_COW_H
> +#define BLOCK_ADD_COW_H
> +#include "block-cache.h"
> +
> +enum {
> +    ADD_COW_F_All_ALLOCATED     = 0X01,
> +    ADD_COW_FEATURE_MASK        = ADD_COW_F_All_ALLOCATED,
> +
> +    ADD_COW_MAGIC = (((uint64_t)'A' << 56) | ((uint64_t)'D' << 48) | \
> +                    ((uint64_t)'D' << 40) | ((uint64_t)'_' << 32) | \
> +                    ((uint64_t)'C' << 24) | ((uint64_t)'O' << 16) | \
> +                    ((uint64_t)'W' << 8) | 0xFF),
> +    ADD_COW_VERSION             = 1,
> +    ADD_COW_FILE_LEN            = 1024,
> +    ADD_COW_CACHE_SIZE          = 16,
> +    ADD_COW_CACHE_ENTRY_SIZE    = 65536,
> +    ADD_COW_CLUSTER_SIZE        = 65536,
> +    SECTORS_PER_CLUSTER         = (ADD_COW_CLUSTER_SIZE / BDRV_SECTOR_SIZE),
> +    ADD_COW_PAGE_SIZE           = 4096,
> +    ADD_COW_DEFAULT_PAGE_SIZE   = 1,
> +};
> +
> +typedef struct AddCowHeader {
> +    uint64_t        magic;
> +    uint32_t        version;
> +
> +    uint32_t        backing_filename_offset;
> +    uint32_t        backing_filename_size;
> +
> +    uint32_t        image_filename_offset;
> +    uint32_t        image_filename_size;
> +
> +    uint64_t        features;
> +    uint64_t        optional_features;
> +    uint32_t        header_pages_size;
> +} QEMU_PACKED AddCowHeader;

Why aren't backing/image_file_format part of the header here? They are
in the spec. It would also simplify some offset calculation code.
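
Roughly what I have in mind, as a sketch: with the two format fields inside
the struct, the file names simply start at sizeof(header) and the manual
sizeof(s.backing_file_format) + sizeof(s.image_file_format) sums go away. The
field order and the 16-byte widths are assumptions taken from this patch;
docs/specs/add-cow.txt is authoritative. The GCC/Clang attribute is spelled
out instead of QEMU_PACKED so the example stands alone:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sketch only: layout assumed from the patch, not checked against the spec. */
typedef struct __attribute__((packed)) SketchHeader {
    uint64_t magic;
    uint32_t version;
    uint32_t backing_filename_offset;
    uint32_t backing_filename_size;
    uint32_t image_filename_offset;
    uint32_t image_filename_size;
    uint64_t features;
    uint64_t optional_features;
    uint32_t header_pages_size;
    char     backing_file_format[16];
    char     image_file_format[16];
} SketchHeader;

/* Offset at which the first file name would be stored. */
static size_t names_start_offset(void)
{
    /* The format fields are the last members, so nothing else to add up. */
    assert(offsetof(SketchHeader, image_file_format) + 16 ==
           sizeof(SketchHeader));
    return sizeof(SketchHeader);
}
```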

> +
> +typedef struct BDRVAddCowState {
> +    BlockDriverState    *image_hd;
> +    CoMutex             lock;
> +    int                 cluster_size;
> +    BlockCache         *bitmap_cache;
> +    uint64_t            bitmap_size;
> +    AddCowHeader        header;
> +    char                backing_file_format[16];
> +    char                image_file_format[16];
> +} BDRVAddCowState;
> +
> +/* Convert sector_num to offset in bitmap */
> +static inline int64_t offset_in_bitmap(int64_t sector_num)
> +{
> +    int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
> +    return cluster_num / 8;
> +}
> +
> +static inline bool is_cluster_head(int64_t sector_num)
> +{
> +    return sector_num % SECTORS_PER_CLUSTER == 0;
> +}
> +
> +static inline bool is_cluster_tail(int64_t sector_num)
> +{
> +    return (sector_num + 1) % SECTORS_PER_CLUSTER == 0;
> +}
> +
> +BlockCache *add_cow_cache_create(BlockDriverState *bs, int num_tables);
> +int add_cow_cache_destroy(BlockDriverState *bs, BlockCache *c);
> +void add_cow_cache_entry_mark_dirty(BlockCache *c, void *table);
> +int add_cow_cache_get(BlockDriverState *bs, BlockCache *c, uint64_t offset,
> +    void **table);
> +int add_cow_cache_flush(BlockDriverState *bs, BlockCache *c);

These functions don't really exist any more, do they?

Kevin


* Re: [Qemu-devel] [PATCH V12 5/6] add-cow file format
  2012-09-10  2:25     ` Dong Xu Wang
@ 2012-09-11  9:44       ` Kevin Wolf
  0 siblings, 0 replies; 25+ messages in thread
From: Kevin Wolf @ 2012-09-11  9:44 UTC (permalink / raw)
  To: Dong Xu Wang; +Cc: Michael Roth, qemu-devel

Am 10.09.2012 04:25, schrieb Dong Xu Wang:
> On Fri, Sep 7, 2012 at 4:19 AM, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
>> On Fri, Aug 10, 2012 at 11:39:44PM +0800, Dong Xu Wang wrote:
>>> +typedef struct AddCowHeader {
>>> +    uint64_t        magic;
>>> +    uint32_t        version;
>>> +
>>> +    uint32_t        backing_filename_offset;
>>> +    uint32_t        backing_filename_size;
>>> +
>>> +    uint32_t        image_filename_offset;
>>> +    uint32_t        image_filename_size;
>>> +
>>> +    uint64_t        features;
>>> +    uint64_t        optional_features;
>>> +    uint32_t        header_pages_size;
>>> +} QEMU_PACKED AddCowHeader;
>>
>> You should avoid using packed structures for image format headers.
>> Instead, I would either:
>>
>> a) re-order the fields so that 32/64-bit fields, respectively, fall on
>> 32/64-bit boundaries (in your case, for instance, moving header_pages_size
>> above features) like qed/qcow2 do, or
>>
>> b) read/write the fields individually rather than reading/writing directly
>> into/from the header struct.
>>
>> The safest route is b). Adds a few lines of code, but you won't have to
>> re-work things (or worry about introducing bugs) later if you were to add,
>> say, a 32-bit value, and then a 64-bit value later.
> 
> Well, Kevin's suggestion is to use PACKED, so ...

Yes, I think QEMU_PACKED is fine, and it's the safest version.

It would be nice to additionally do Michael's option a) if you like, but
I don't think the header is accessed too often, so the optimisation
isn't that important.
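
For comparison, a sketch of both variants side by side: the field order from
the patch with the packed attribute, and option a) with header_pages_size
moved above features so that every field is naturally aligned. PackedHeader,
ReorderedHeader and on_disk_header_size() are names made up for this example,
and the attribute is the GCC/Clang spelling rather than QEMU_PACKED:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field order from the patch; packed, so features sits at offset 28. */
typedef struct __attribute__((packed)) PackedHeader {
    uint64_t magic;
    uint32_t version;
    uint32_t backing_filename_offset;
    uint32_t backing_filename_size;
    uint32_t image_filename_offset;
    uint32_t image_filename_size;
    uint64_t features;
    uint64_t optional_features;
    uint32_t header_pages_size;
} PackedHeader;

/* Option a): no attribute needed, 64-bit fields fall on 8-byte boundaries. */
typedef struct ReorderedHeader {
    uint64_t magic;
    uint32_t version;
    uint32_t backing_filename_offset;
    uint32_t backing_filename_size;
    uint32_t image_filename_offset;
    uint32_t image_filename_size;
    uint32_t header_pages_size;
    uint64_t features;
    uint64_t optional_features;
} ReorderedHeader;

static size_t on_disk_header_size(void)
{
    /* Both layouts occupy the same number of bytes; only the order differs. */
    assert(sizeof(PackedHeader) == sizeof(ReorderedHeader));
    return sizeof(PackedHeader);
}
```

Without the attribute, the original order would get padding before features
on ABIs that align uint64_t to 8 bytes, which is why one of the two measures
is needed for a struct that is read and written as a whole.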

Kevin


* Re: [Qemu-devel] [PATCH V12 6/6] add-cow: add qemu-iotests support
  2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 6/6] add-cow: add qemu-iotests support Dong Xu Wang
@ 2012-09-11  9:55   ` Kevin Wolf
  0 siblings, 0 replies; 25+ messages in thread
From: Kevin Wolf @ 2012-09-11  9:55 UTC (permalink / raw)
  To: Dong Xu Wang; +Cc: qemu-devel

Am 10.08.2012 17:39, schrieb Dong Xu Wang:
> Add qemu-iotests support for add-cow.
> 
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> ---
>  tests/qemu-iotests/017       |    2 +-
>  tests/qemu-iotests/020       |    2 +-
>  tests/qemu-iotests/check     |    4 ++--
>  tests/qemu-iotests/common    |    6 ++++++
>  tests/qemu-iotests/common.rc |   19 +++++++++++++++++++
>  5 files changed, 29 insertions(+), 4 deletions(-)

> diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
> index 432732c..122267b 100755
> --- a/tests/qemu-iotests/check
> +++ b/tests/qemu-iotests/check
> @@ -243,7 +243,7 @@ do
>  		echo " - no qualified output"
>  		err=true
>  	    else
> -		if diff -w $seq.out $tmp.out >/dev/null 2>&1
> +        if diff -w -I "^Formatting" $seq.out $tmp.out >/dev/null 2>&1
>  		then
>  		    echo ""
>  		    if $err
> @@ -255,7 +255,7 @@ do
>  		else
>  		    echo " - output mismatch (see $seq.out.bad)"
>  		    mv $tmp.out $seq.out.bad
> -		    $diff -w $seq.out $seq.out.bad
> +            $diff -w -I "^Formatting" $seq.out $seq.out.bad
>  		    err=true
>  		fi
>  	    fi

These two hunks don't look right. You probably want to amend the sed
command in _make_test_img().

> diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
> index 7782808..ec5afd7 100644
> --- a/tests/qemu-iotests/common.rc
> +++ b/tests/qemu-iotests/common.rc
> @@ -97,6 +97,18 @@ _make_test_img()
>      fi
>      if [ \( "$IMGFMT" = "qcow2" -o "$IMGFMT" = "qed" \) -a -n "$CLUSTER_SIZE" ]; then
>          optstr=$(_optstr_add "$optstr" "cluster_size=$CLUSTER_SIZE")
> +    elif [ "$IMGFMT" = "add-cow" ]; then
> +        local BACKING="$TEST_IMG"".qcow2"
> +        local IMG="$TEST_IMG"".raw"
> +        if [ "$1" = "-b" ]; then
> +            IMG="$IMG"".b"
> +            $QEMU_IMG create -f raw $IMG $image_size>/dev/null
> +            extra_img_options="-o image_file=$IMG $extra_img_options"
> +        else
> +            $QEMU_IMG create -f raw $IMG $image_size>/dev/null
> +            $QEMU_IMG create -f qcow2 $BACKING $image_size>/dev/null
> +            extra_img_options="-o backing_file=$BACKING,image_file=$IMG"
> +        fi

This looks a bit hackish... Doesn't it completely ignore the requested
backing file name? I'm not sure if this is a good idea.

Can't you just create the raw image file and then use _optstr_add to add
the right -o image_file=... option? It should automatically get the
backing file right.

>      fi
>  
>      if [ -n "$optstr" ]; then
> @@ -125,6 +137,13 @@ _cleanup_test_img()
>              rm -f $TEST_DIR/t.$IMGFMT
>              rm -f $TEST_DIR/t.$IMGFMT.orig
>              rm -f $TEST_DIR/t.$IMGFMT.base
> +            if [ "$IMGFMT" = "add-cow" ]; then
> +                rm -f $TEST_DIR/t.$IMGFMT.qcow2
> +                rm -f $TEST_DIR/t.$IMGFMT.raw
> +                rm -f $TEST_DIR/t.$IMGFMT.raw.b
> +                rm -f $TEST_DIR/t.$IMGFMT.ct.qcow2
> +                rm -f $TEST_DIR/t.$IMGFMT.ct.raw

What are the .ct files?

Kevin


* Re: [Qemu-devel] [PATCH V12 5/6] add-cow file format
  2012-09-11  9:40   ` Kevin Wolf
@ 2012-09-12  7:28     ` Dong Xu Wang
  2012-09-12  7:50       ` Kevin Wolf
  0 siblings, 1 reply; 25+ messages in thread
From: Dong Xu Wang @ 2012-09-12  7:28 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel

On Tue, Sep 11, 2012 at 5:40 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 10.08.2012 17:39, schrieb Dong Xu Wang:
>> add-cow file format core code. It uses block-cache.c as cache code.
>>
>> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> ---
>>  block/Makefile.objs |    1 +
>>  block/add-cow.c     |  613 +++++++++++++++++++++++++++++++++++++++++++++++++++
>>  block/add-cow.h     |   85 +++++++
>>  block_int.h         |    2 +
>>  4 files changed, 701 insertions(+), 0 deletions(-)
>>  create mode 100644 block/add-cow.c
>>  create mode 100644 block/add-cow.h
>>
>> diff --git a/block/Makefile.objs b/block/Makefile.objs
>> index 23bdfc8..7ed5051 100644
>> --- a/block/Makefile.objs
>> +++ b/block/Makefile.objs
>> @@ -2,6 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
>>  block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
>>  block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>>  block-obj-y += qed-check.o
>> +block-obj-y += add-cow.o
>>  block-obj-y += block-cache.o
>>  block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
>>  block-obj-y += stream.o
>> diff --git a/block/add-cow.c b/block/add-cow.c
>> new file mode 100644
>> index 0000000..d4711d5
>> --- /dev/null
>> +++ b/block/add-cow.c
>> @@ -0,0 +1,613 @@
>> +/*
>> + * QEMU ADD-COW Disk Format
>> + *
>> + * Copyright IBM, Corp. 2012
>> + *
>> + * Authors:
>> + *  Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> + *
>> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
>> + * See the COPYING.LIB file in the top-level directory.
>> + *
>> + */
>> +
>> +#include "qemu-common.h"
>> +#include "block_int.h"
>> +#include "module.h"
>> +#include "add-cow.h"
>> +
>> +static void add_cow_header_le_to_cpu(const AddCowHeader *le, AddCowHeader *cpu)
>> +{
>> +    cpu->magic                      = le64_to_cpu(le->magic);
>> +    cpu->version                    = le32_to_cpu(le->version);
>> +
>> +    cpu->backing_filename_offset    = le32_to_cpu(le->backing_filename_offset);
>> +    cpu->backing_filename_size      = le32_to_cpu(le->backing_filename_size);
>> +
>> +    cpu->image_filename_offset      = le32_to_cpu(le->image_filename_offset);
>> +    cpu->image_filename_size        = le32_to_cpu(le->image_filename_size);
>> +
>> +    cpu->features                   = le64_to_cpu(le->features);
>> +    cpu->optional_features          = le64_to_cpu(le->optional_features);
>> +    cpu->header_pages_size          = le32_to_cpu(le->header_pages_size);
>> +}
>> +
>> +static void add_cow_header_cpu_to_le(const AddCowHeader *cpu, AddCowHeader *le)
>> +{
>> +    le->magic                       = cpu_to_le64(cpu->magic);
>> +    le->version                     = cpu_to_le32(cpu->version);
>> +
>> +    le->backing_filename_offset     = cpu_to_le32(cpu->backing_filename_offset);
>> +    le->backing_filename_size       = cpu_to_le32(cpu->backing_filename_size);
>> +
>> +    le->image_filename_offset       = cpu_to_le32(cpu->image_filename_offset);
>> +    le->image_filename_size         = cpu_to_le32(cpu->image_filename_size);
>> +
>> +    le->features                    = cpu_to_le64(cpu->features);
>> +    le->optional_features           = cpu_to_le64(cpu->optional_features);
>> +    le->header_pages_size           = cpu_to_le32(cpu->header_pages_size);
>> +}
>> +
>> +static int add_cow_probe(const uint8_t *buf, int buf_size, const char *filename)
>> +{
>> +    const AddCowHeader *header = (const AddCowHeader *)buf;
>> +
>> +    if (le64_to_cpu(header->magic) == ADD_COW_MAGIC &&
>> +        le32_to_cpu(header->version) == ADD_COW_VERSION) {
>> +        return 100;
>> +    } else {
>> +        return 0;
>> +    }
>> +}
>> +
>> +static int add_cow_create(const char *filename, QEMUOptionParameter *options)
>> +{
>> +    AddCowHeader header = {
>> +        .magic = ADD_COW_MAGIC,
>> +        .version = ADD_COW_VERSION,
>> +        .features = 0,
>> +        .optional_features = 0,
>> +        .header_pages_size = ADD_COW_DEFAULT_PAGE_SIZE,
>> +    };
>> +    AddCowHeader le_header;
>> +    int64_t image_len = 0;
>> +    const char *backing_filename = NULL;
>> +    const char *backing_fmt = NULL;
>> +    const char *image_filename = NULL;
>> +    const char *image_format = NULL;
>> +    BlockDriverState *bs, *image_bs = NULL, *backing_bs = NULL;
>> +    BlockDriver *drv = bdrv_find_format("add-cow");
>> +    BDRVAddCowState s;
>> +    int ret;
>> +
>> +    while (options && options->name) {
>> +        if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
>> +            image_len = options->value.n;
>> +        } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FILE)) {
>> +            backing_filename = options->value.s;
>> +        } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FMT)) {
>> +            backing_fmt = options->value.s;
>> +        } else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FILE)) {
>> +            image_filename = options->value.s;
>> +        } else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FORMAT)) {
>> +            image_format = options->value.s;
>> +        }
>> +        options++;
>> +    }
>> +
>> +    if (backing_filename) {
>> +        header.backing_filename_offset = sizeof(header)
>> +            + sizeof(s.backing_file_format) + sizeof(s.image_file_format);
>> +        header.backing_filename_size = strlen(backing_filename);
>> +
>> +        if (!backing_fmt) {
>> +            backing_bs = bdrv_new("image");
>> +            ret = bdrv_open(backing_bs, backing_filename, BDRV_O_RDWR
>> +                    | BDRV_O_CACHE_WB, NULL);
>> +            if (ret < 0) {
>> +                return ret;
>> +            }
>> +            backing_fmt = bdrv_get_format_name(backing_bs);
>> +            bdrv_delete(backing_bs);
>> +        }
>> +    } else {
>> +        header.features |= ADD_COW_F_All_ALLOCATED;
>> +    }
>> +
>> +    if (image_filename) {
>> +        header.image_filename_offset =
>> +            sizeof(header) + sizeof(s.backing_file_format)
>> +                + sizeof(s.image_file_format) + header.backing_filename_size;
>> +        header.image_filename_size = strlen(image_filename);
>> +    } else {
>> +        error_report("Error: image_file should be given.");
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (backing_filename && !strcmp(backing_filename, image_filename)) {
>> +        error_report("Error: Trying to create an image with the "
>> +                     "same backing file name as the image file name");
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (!strcmp(filename, image_filename)) {
>> +        error_report("Error: Trying to create an image with the "
>> +                     "same filename as the image file name");
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (header.image_filename_offset + header.image_filename_size
>> +            > ADD_COW_PAGE_SIZE * ADD_COW_DEFAULT_PAGE_SIZE) {
>> +        error_report("image_file name or backing_file name too long.");
>> +        return -ENOSPC;
>> +    }
>> +
>> +    ret = bdrv_file_open(&image_bs, image_filename, BDRV_O_RDWR);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    bdrv_delete(image_bs);
>> +
>> +    ret = bdrv_create_file(filename, NULL);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +
>> +    ret = bdrv_file_open(&bs, filename, BDRV_O_RDWR);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    add_cow_header_cpu_to_le(&header, &le_header);
>> +    ret = bdrv_pwrite(bs, 0, &le_header, sizeof(le_header));
>> +    if (ret < 0) {
>> +        bdrv_delete(bs);
>> +        return ret;
>> +    }
>> +
>> +    ret = bdrv_pwrite(bs, sizeof(le_header), backing_fmt ? backing_fmt : "",
>> +        backing_fmt ? strlen(backing_fmt) : 0);
>
> The spec requires zero padding, which you don't do here.
Okay.
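
A minimal sketch of what zero padding a fixed-width format field could
look like. It assumes the 16-byte field size used for
backing_file_format in BDRVAddCowState; fill_format_field and
FORMAT_FIELD_LEN are hypothetical names, not part of the patch.

```c
#include <stddef.h>
#include <string.h>

#define FORMAT_FIELD_LEN 16  /* assumed: matches char backing_file_format[16] */

/*
 * Hypothetical helper: copy a format name into a fixed-width field and
 * fill the remaining bytes with zeros. Returns 0 on success, -1 if the
 * name does not fit (one byte is always kept for a terminating NUL).
 */
static int fill_format_field(char field[FORMAT_FIELD_LEN], const char *name)
{
    size_t len = name ? strlen(name) : 0;

    if (len >= FORMAT_FIELD_LEN) {
        return -1;
    }
    memset(field, 0, FORMAT_FIELD_LEN);
    if (len > 0) {
        memcpy(field, name, len);
    }
    return 0;
}
```

The whole field could then be written with one bdrv_pwrite() of
FORMAT_FIELD_LEN bytes instead of strlen() bytes.
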
>
>> +    if (ret < 0) {
>> +        bdrv_delete(bs);
>> +        return ret;
>> +    }
>> +
>> +    ret = bdrv_pwrite(bs, sizeof(le_header) + sizeof(s.backing_file_format),
>> +        image_format ? image_format : "raw",
>> +        image_format ? strlen(image_format) : sizeof("raw"));
>
> And here.

Okay.

>
>> +    if (ret < 0) {
>> +        bdrv_delete(bs);
>> +        return ret;
>> +    }
>> +
>> +    if (backing_filename) {
>> +        ret = bdrv_pwrite(bs, header.backing_filename_offset,
>> +            backing_filename, header.backing_filename_size);
>> +        if (ret < 0) {
>> +            bdrv_delete(bs);
>> +            return ret;
>> +        }
>> +    }
>> +
>> +    ret = bdrv_pwrite(bs, header.image_filename_offset,
>> +        image_filename, header.image_filename_size);
>> +    if (ret < 0) {
>> +        bdrv_delete(bs);
>> +        return ret;
>> +    }
>> +
>> +    ret = bdrv_open(bs, filename, BDRV_O_RDWR | BDRV_O_NO_FLUSH, drv);
>> +    if (ret < 0) {
>> +        bdrv_delete(bs);
>> +        return ret;
>> +    }
>> +
>> +    ret = bdrv_truncate(bs, image_len);
>> +    bdrv_delete(bs);
>> +    return ret;
>> +}
>> +
>> +static int add_cow_open(BlockDriverState *bs, int flags)
>> +{
>> +    char                image_filename[ADD_COW_FILE_LEN];
>> +    char                tmp_name[ADD_COW_FILE_LEN];
>> +    BlockDriver         *image_drv = NULL;
>> +    int                 ret;
>> +    int                 sector_per_byte;
>> +    BDRVAddCowState     *s = bs->opaque;
>> +    AddCowHeader        le_header;
>> +
>> +    ret = bdrv_pread(bs->file, 0, &le_header, sizeof(le_header));
>> +    if (ret != sizeof(s->header)) {
>
> if (ret < 0) would be more consistent with the rest of the code.
>

Okay.

>> +        goto fail;
>> +    }
>> +
>> +    add_cow_header_le_to_cpu(&le_header, &s->header);
>> +
>> +    if (le64_to_cpu(s->header.magic) != ADD_COW_MAGIC) {
>
> Isn't this one endianness conversion too many? s->header has already been
> converted to CPU byte order.
>
> Did you test add-cow on a big endian host?

My fault, I will correct it in the next version.
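
A small sketch of why the second conversion only shows up on big endian
hosts. It assumes that le64_to_cpu() is a plain byte swap on such a host
and a no-op on a little endian one; swap64, magic_ok_single and
magic_ok_double are hypothetical names used only for illustration.

```c
#include <stdint.h>

/*
 * Unconditional 64-bit byte swap. This stands in for what le64_to_cpu()
 * has to do on a big endian host.
 */
static uint64_t swap64(uint64_t v)
{
    v = ((v & 0x00000000ffffffffULL) << 32) | (v >> 32);
    v = ((v & 0x0000ffff0000ffffULL) << 16) |
        ((v >> 16) & 0x0000ffff0000ffffULL);
    v = ((v & 0x00ff00ff00ff00ffULL) << 8) |
        ((v >> 8) & 0x00ff00ff00ff00ffULL);
    return v;
}

/* Magic check after exactly one conversion: the intended behaviour. */
static int magic_ok_single(uint64_t loaded, uint64_t magic)
{
    return swap64(loaded) == magic;
}

/*
 * Magic check after converting twice: the second swap undoes the first,
 * so the raw on-disk value is compared against the magic.
 */
static int magic_ok_double(uint64_t loaded, uint64_t magic)
{
    return swap64(swap64(loaded)) == magic;
}
```

On a little endian host both conversions are no-ops, which is why the
extra call goes unnoticed there.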

>
>> +        ret = -EINVAL;
>> +        goto fail;
>> +    }
>> +
>> +    if (s->header.version != ADD_COW_VERSION) {
>> +        char version[64];
>> +        snprintf(version, sizeof(version), "ADD-COW version %d",
>> +            s->header.version);
>> +        qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
>> +            bs->device_name, "add-cow", version);
>> +        ret = -ENOTSUP;
>> +        goto fail;
>> +    }
>> +
>> +    if (s->header.features & ~ADD_COW_FEATURE_MASK) {
>> +        char buf[64];
>> +        snprintf(buf, sizeof(buf), "%" PRIx64,
>> +            s->header.features & ~ADD_COW_FEATURE_MASK);
>
> This message is a bit terse, most users will be confused with an error
> message that only consists of a hex number. Maybe better "Feature flags:
> %" PRIx64.
>

Okay.
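
A sketch of the suggested message, using the format string from the
comment above. The helper name format_unknown_features is hypothetical;
in the patch the snprintf() call would stay inline in add_cow_open().

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: describe the unsupported feature bits with a
 * label in front of the hex value. Returns what snprintf() returns.
 */
static int format_unknown_features(char *buf, size_t len, uint64_t unknown)
{
    return snprintf(buf, len, "Feature flags: %" PRIx64, unknown);
}
```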

>> +        qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
>> +            bs->device_name, "add-cow", buf);
>> +        return -ENOTSUP;
>> +    }
>> +
>> +    if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
>> +        ret = bdrv_read_string(bs->file, sizeof(s->header),
>> +            sizeof(s->backing_file_format) - 1, s->backing_file_format,
>> +            sizeof(s->backing_file_format));
>> +        if (ret < 0) {
>> +            goto fail;
>> +        }
>> +    }
>
> Would be great if this was not only read into memory, but actually
> used... It must end up in bs->backing_format in order to take effect.
>
>> +
>> +    ret = bdrv_read_string(bs->file,
>> +            sizeof(s->header) + sizeof(s->image_file_format),
>> +            sizeof(s->image_file_format) - 1, s->image_file_format,
>> +            sizeof(s->image_file_format));
>> +    if (ret < 0) {
>> +        goto fail;
>> +    }
>
> This one is unused, too.
>
Okay.

>> +
>> +    if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
>> +        ret = bdrv_read_string(bs->file, s->header.backing_filename_offset,
>> +                          s->header.backing_filename_size, bs->backing_file,
>> +                          sizeof(bs->backing_file));
>> +        if (ret < 0) {
>> +            goto fail;
>> +        }
>> +    }
>> +
>> +    ret = bdrv_read_string(bs->file, s->header.image_filename_offset,
>> +                      s->header.image_filename_size, tmp_name,
>> +                      sizeof(tmp_name));
>> +    if (ret < 0) {
>> +        goto fail;
>> +    }
>> +
>> +    s->image_hd = bdrv_new("");
>> +    if (path_has_protocol(image_filename)) {
>> +        pstrcpy(image_filename, sizeof(image_filename), tmp_name);
>> +    } else {
>> +        path_combine(image_filename, sizeof(image_filename),
>> +                     bs->filename, tmp_name);
>> +    }
>> +
>> +    ret = bdrv_open(s->image_hd, image_filename, flags, image_drv);
>
> image_drv is always NULL.
>
>> +    if (ret < 0) {
>> +        bdrv_delete(s->image_hd);
>> +        goto fail;
>> +    }
>> +
>> +    bs->total_sectors = bdrv_getlength(s->image_hd) >> 9;
>> +    s->cluster_size = ADD_COW_CLUSTER_SIZE;
>> +    sector_per_byte = SECTORS_PER_CLUSTER * 8;
>> +    s->bitmap_size =
>> +        (bs->total_sectors + sector_per_byte - 1) / sector_per_byte;
>> +    s->bitmap_cache =
>> +        block_cache_create(bs, ADD_COW_CACHE_SIZE, ADD_COW_CACHE_ENTRY_SIZE);
>> +
>> +    qemu_co_mutex_init(&s->lock);
>> +    return 0;
>> +fail:
>> +    if (s->bitmap_cache) {
>> +        block_cache_destroy(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP);
>> +    }
>> +    return ret;
>> +}
>> +
>> +static void add_cow_close(BlockDriverState *bs)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    block_cache_destroy(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP);
>> +    bdrv_delete(s->image_hd);
>> +}
>> +
>> +static bool is_allocated(BlockDriverState *bs, int64_t sector_num)
>> +{
>> +    BDRVAddCowState *s  = bs->opaque;
>> +    BlockCache *c = s->bitmap_cache;
>> +    int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
>> +    uint8_t *table      = NULL;
>> +    uint64_t offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
>> +        + (offset_in_bitmap(sector_num) & (~(c->entry_size - 1)));
>> +    int ret = block_cache_get(bs, s->bitmap_cache, offset,
>> +        (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
>
> No matching block_cache_put?
>
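
A toy sketch of the get/put pairing asked about above. It does not use
the real block cache; ToyEntry, toy_get, toy_put and toy_test_bit are
hypothetical stand-ins, and the only assumption is that every
successful get takes a reference that a put must drop again.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for one cache entry; not the QEMU BlockCache. */
typedef struct ToyEntry {
    uint8_t data[8];
    int ref;            /* number of outstanding toy_get() calls */
} ToyEntry;

/* Take a reference and hand out the table. */
static uint8_t *toy_get(ToyEntry *e)
{
    e->ref++;
    return e->data;
}

/* Drop the reference and clear the caller's pointer. */
static void toy_put(ToyEntry *e, uint8_t **table)
{
    e->ref--;
    *table = NULL;
}

/* Test one bit (bit < 64) and release the entry before returning. */
static int toy_test_bit(ToyEntry *e, unsigned bit)
{
    uint8_t *table = toy_get(e);
    int set = (table[bit / 8] >> (bit % 8)) & 1;

    toy_put(e, &table);
    return set;
}
```
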
>> +
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    return table[cluster_num / 8 % ADD_COW_CACHE_ENTRY_SIZE]
>> +        & (1 << (cluster_num % 8));
>> +}
>> +
>> +static coroutine_fn int add_cow_is_allocated(BlockDriverState *bs,
>> +        int64_t sector_num, int nb_sectors, int *num_same)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    int changed;
>> +
>> +    if (nb_sectors == 0) {
>> +        *num_same = 0;
>> +        return 0;
>> +    }
>> +
>> +    if (s->header.features & ADD_COW_F_All_ALLOCATED) {
>> +        *num_same = nb_sectors - 1;
>
> Why - 1?
>
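
A self-contained sketch of the counting logic, under the assumption
that *num_same is meant to be the number of sectors sharing the status
of the first one, which is how the loop below uses it. With that
reading the all-allocated case would report nb_sectors. The toy_ names
and the one-bit-per-sector bitmap are hypothetical simplifications.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lookup over a plain bit array, one bit per sector. */
static bool toy_allocated(const uint8_t *bitmap, int64_t sector)
{
    return (bitmap[sector / 8] >> (sector % 8)) & 1;
}

/*
 * Return the status of the first sector and store in *num_same how many
 * consecutive sectors, starting at sector_num, share that status.
 */
static bool toy_is_allocated(const uint8_t *bitmap, bool all_allocated,
                             int64_t sector_num, int nb_sectors,
                             int *num_same)
{
    bool first;

    if (nb_sectors == 0) {
        *num_same = 0;
        return false;
    }
    if (all_allocated) {
        *num_same = nb_sectors;     /* the whole range, not one less */
        return true;
    }
    first = toy_allocated(bitmap, sector_num);
    for (*num_same = 1; *num_same < nb_sectors; (*num_same)++) {
        if (toy_allocated(bitmap, sector_num + *num_same) != first) {
            break;
        }
    }
    return first;
}
```
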
>> +        return 1;
>> +    }
>> +    changed = is_allocated(bs, sector_num);
>> +
>> +    for (*num_same = 1; *num_same < nb_sectors; (*num_same)++) {
>> +        if (is_allocated(bs, sector_num + *num_same) != changed) {
>> +            break;
>> +        }
>> +    }
>> +    return changed;
>> +}
>> +
>> +static int add_cow_backing_read(BlockDriverState *bs, QEMUIOVector *qiov,
>> +                  int64_t sector_num, int nb_sectors)
>> +{
>> +    int n1;
>> +    if ((sector_num + nb_sectors) <= bs->total_sectors) {
>> +        return nb_sectors;
>> +    }
>> +    if (sector_num >= bs->total_sectors) {
>> +        n1 = 0;
>> +    } else {
>> +        n1 = bs->total_sectors - sector_num;
>> +    }
>> +
>> +    qemu_iovec_memset(qiov, BDRV_SECTOR_SIZE * n1,
>> +        0, BDRV_SECTOR_SIZE * (nb_sectors - n1));
>> +
>> +    return n1;
>> +}
>> +
>> +static coroutine_fn int add_cow_co_readv(BlockDriverState *bs,
>> +    int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
>> +{
>> +    BDRVAddCowState *s  = bs->opaque;
>> +    int cur_nr_sectors;
>> +    uint64_t bytes_done = 0;
>> +    QEMUIOVector hd_qiov;
>> +    int n, n1, ret = 0;
>> +
>> +    qemu_iovec_init(&hd_qiov, qiov->niov);
>> +    qemu_co_mutex_lock(&s->lock);
>> +    while (remaining_sectors != 0) {
>> +        cur_nr_sectors = remaining_sectors;
>> +        if (add_cow_is_allocated(bs, sector_num, cur_nr_sectors, &n)) {
>> +            cur_nr_sectors = n;
>
> One of n and cur_nr_sectors is redundant.
Okay.
>
>> +            qemu_iovec_reset(&hd_qiov);
>> +            qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
>> +                            cur_nr_sectors * BDRV_SECTOR_SIZE);
>> +            qemu_co_mutex_unlock(&s->lock);
>> +            ret = bdrv_co_readv(s->image_hd, sector_num, n, &hd_qiov);
>> +            qemu_co_mutex_lock(&s->lock);
>> +            if (ret < 0) {
>> +                goto fail;
>> +            }
>> +        } else {
>> +            cur_nr_sectors = n;
>> +            if (bs->backing_hd) {
>> +                qemu_iovec_reset(&hd_qiov);
>> +                qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
>> +                            cur_nr_sectors * BDRV_SECTOR_SIZE);
>> +                n1 = add_cow_backing_read(bs->backing_hd, &hd_qiov,
>> +                    sector_num, cur_nr_sectors);
>> +                if (n1 > 0) {
>> +                    qemu_co_mutex_unlock(&s->lock);
>> +                    ret = bdrv_co_readv(bs->backing_hd, sector_num,
>> +                                        n, &hd_qiov);
>> +                    qemu_co_mutex_lock(&s->lock);
>> +                    if (ret < 0) {
>> +                        goto fail;
>> +                    }
>> +                }
>> +            } else {
>> +                qemu_iovec_memset(&hd_qiov, 0, 0,
>> +                    BDRV_SECTOR_SIZE * cur_nr_sectors);
>> +            }
>> +        }
>> +        remaining_sectors -= cur_nr_sectors;
>> +        sector_num += cur_nr_sectors;
>> +        bytes_done += cur_nr_sectors * BDRV_SECTOR_SIZE;
>> +    }
>> +fail:
>> +    qemu_co_mutex_unlock(&s->lock);
>> +    qemu_iovec_destroy(&hd_qiov);
>> +    return ret;
>> +}
>> +
>> +static int coroutine_fn copy_sectors(BlockDriverState *bs,
>> +                                     int n_start, int n_end)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    QEMUIOVector qiov;
>> +    struct iovec iov;
>> +    int n, ret;
>> +
>> +    n = n_end - n_start;
>> +    if (n <= 0) {
>> +        return 0;
>> +    }
>> +
>> +    iov.iov_len = n * BDRV_SECTOR_SIZE;
>> +    iov.iov_base = qemu_blockalign(bs, iov.iov_len);
>> +
>> +    qemu_iovec_init_external(&qiov, &iov, 1);
>> +
>> +    ret = bdrv_co_readv(bs->backing_hd, n_start, n, &qiov);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +    ret = bdrv_co_writev(s->image_hd, n_start, n, &qiov);
>> +    if (ret < 0) {
>> +        goto out;
>> +    }
>> +
>> +    ret = 0;
>> +out:
>> +    qemu_vfree(iov.iov_base);
>> +    return ret;
>> +}
>> +
>> +static coroutine_fn int add_cow_co_writev(BlockDriverState *bs,
>> +        int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    BlockCache *c = s->bitmap_cache;
>> +    int ret = 0, i;
>> +    QEMUIOVector hd_qiov;
>> +    uint8_t *table;
>> +    uint64_t offset;
>> +
>> +    qemu_co_mutex_lock(&s->lock);
>> +    qemu_iovec_init(&hd_qiov, qiov->niov);
>> +    ret = bdrv_co_writev(s->image_hd,
>> +                     sector_num,
>> +                     remaining_sectors, qiov);
>> +
>> +    if (ret < 0) {
>> +        goto fail;
>> +    }
>> +    if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
>> +        /* Copy content of unmodified sectors */
>> +        if (!is_cluster_head(sector_num) && !is_allocated(bs, sector_num)) {
>> +            ret = copy_sectors(bs, sector_num & ~(SECTORS_PER_CLUSTER - 1),
>> +                sector_num);
>> +            if (ret < 0) {
>> +                goto fail;
>> +            }
>> +        }
>> +
>> +        if (!is_cluster_tail(sector_num + remaining_sectors - 1)
>> +            && !is_allocated(bs, sector_num + remaining_sectors - 1)) {
>> +            ret = copy_sectors(bs, sector_num + remaining_sectors,
>> +                ((sector_num + remaining_sectors) | (SECTORS_PER_CLUSTER - 1)) + 1);
>> +            if (ret < 0) {
>> +                goto fail;
>> +            }
>> +        }
>> +
>> +        for (i = sector_num / SECTORS_PER_CLUSTER;
>> +            i <= (sector_num + remaining_sectors - 1) / SECTORS_PER_CLUSTER;
>> +            i++) {
>> +            offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
>> +                + (offset_in_bitmap(i * SECTORS_PER_CLUSTER) & (~(c->entry_size - 1)));
>
> The maths in this loop looks a bit confusing, but I think it's correct.
>
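
One way to make the maths easier to follow is to split it into the three
values it produces. The sketch below assumes the constants from
add-cow.h (64k clusters, 64k cache entries, one 4k header page) and
reduces the byte index modulo the entry size, as is_allocated() does.
BitPos and toy_bit_pos are hypothetical names.

```c
#include <stdint.h>

enum {
    TOY_SECTORS_PER_CLUSTER = 128,   /* 65536 / 512 */
    TOY_ENTRY_SIZE          = 65536, /* bytes per cache entry */
    TOY_HEADER_BYTES        = 4096,  /* one header page */
};

typedef struct BitPos {
    uint64_t entry_offset;   /* file offset of the entry holding the bit */
    uint32_t byte_in_entry;  /* index into the cached table */
    uint8_t  mask;           /* bit within that byte */
} BitPos;

/* Map a sector number to the position of its cluster's bitmap bit. */
static BitPos toy_bit_pos(int64_t sector_num)
{
    int64_t cluster = sector_num / TOY_SECTORS_PER_CLUSTER;
    int64_t byte = cluster / 8;      /* byte offset inside the bitmap */
    BitPos p;

    p.entry_offset = TOY_HEADER_BYTES +
        (uint64_t)(byte & ~((int64_t)TOY_ENTRY_SIZE - 1));
    p.byte_in_entry = (uint32_t)(byte % TOY_ENTRY_SIZE);
    p.mask = (uint8_t)(1u << (cluster % 8));
    return p;
}
```
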
>> +            ret = block_cache_get(bs, s->bitmap_cache, offset,
>> +                (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
>> +            if (ret < 0) {
>> +                goto fail;
>> +            }
>> +            if ((table[i / 8] & (1 << (i % 8))) == 0) {
>> +                table[i / 8] |= (1 << (i % 8));
>> +                block_cache_entry_mark_dirty(s->bitmap_cache, table);
>> +            }
>
> Missing block_cache_put again?
>
>> +        }
>> +    }
>> +    ret = 0;
>> +fail:
>> +    qemu_co_mutex_unlock(&s->lock);
>> +    qemu_iovec_destroy(&hd_qiov);
>> +    return ret;
>> +}
>> +
>> +static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    int sector_per_byte = SECTORS_PER_CLUSTER * 8;
>> +    int ret;
>> +    uint32_t bitmap_pos = s->header.header_pages_size * ADD_COW_PAGE_SIZE;
>> +    int64_t bitmap_size =
>> +        (size / BDRV_SECTOR_SIZE + sector_per_byte - 1) / sector_per_byte;
>> +    bitmap_size = (bitmap_size + ADD_COW_CACHE_ENTRY_SIZE - 1)
>> +        & (~(ADD_COW_CACHE_ENTRY_SIZE - 1));
>> +
>> +    ret = bdrv_truncate(bs->file, bitmap_pos + bitmap_size);
>> +    if (ret < 0) {
>> +        return ret;
>> +    }
>> +    return 0;
>> +}
>
> So you don't truncate s->image_file? Does this work?

Should s->image_file be truncated? The image file can have a larger virtual
size than the backing file, so my understanding is that we should not
truncate the image file.

>
>> +
>> +static coroutine_fn int add_cow_co_flush(BlockDriverState *bs)
>> +{
>> +    BDRVAddCowState *s = bs->opaque;
>> +    int ret;
>> +
>> +    qemu_co_mutex_lock(&s->lock);
>> +    ret = block_cache_flush(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP,
>> +        ADD_COW_CACHE_ENTRY_SIZE);
>> +    qemu_co_mutex_unlock(&s->lock);
>> +    return ret;
>> +}
>
> What about flushing s->image_file?
>
>> +
>> +static QEMUOptionParameter add_cow_create_options[] = {
>> +    {
>> +        .name = BLOCK_OPT_SIZE,
>> +        .type = OPT_SIZE,
>> +        .help = "Virtual disk size"
>> +    },
>> +    {
>> +        .name = BLOCK_OPT_BACKING_FILE,
>> +        .type = OPT_STRING,
>> +        .help = "File name of a base image"
>> +    },
>> +    {
>> +        .name = BLOCK_OPT_BACKING_FMT,
>> +        .type = OPT_STRING,
>> +        .help = "Image format of the base image"
>> +    },
>> +    {
>> +        .name = BLOCK_OPT_IMAGE_FILE,
>> +        .type = OPT_STRING,
>> +        .help = "File name of a image file"
>> +    },
>> +    {
>> +        .name = BLOCK_OPT_IMAGE_FORMAT,
>> +        .type = OPT_STRING,
>> +        .help = "Image format of the image file"
>> +    },
>> +    { NULL }
>> +};
>> +
>> +static BlockDriver bdrv_add_cow = {
>> +    .format_name                = "add-cow",
>> +    .instance_size              = sizeof(BDRVAddCowState),
>> +    .bdrv_probe                 = add_cow_probe,
>> +    .bdrv_open                  = add_cow_open,
>> +    .bdrv_close                 = add_cow_close,
>> +    .bdrv_create                = add_cow_create,
>> +    .bdrv_co_readv              = add_cow_co_readv,
>> +    .bdrv_co_writev             = add_cow_co_writev,
>> +    .bdrv_truncate              = bdrv_add_cow_truncate,
>> +    .bdrv_co_is_allocated       = add_cow_is_allocated,
>> +
>> +    .create_options             = add_cow_create_options,
>> +    .bdrv_co_flush_to_os        = add_cow_co_flush,
>> +};
>> +
>> +static void bdrv_add_cow_init(void)
>> +{
>> +    bdrv_register(&bdrv_add_cow);
>> +}
>> +
>> +block_init(bdrv_add_cow_init);
>> diff --git a/block/add-cow.h b/block/add-cow.h
>> new file mode 100644
>> index 0000000..f058376
>> --- /dev/null
>> +++ b/block/add-cow.h
>> @@ -0,0 +1,85 @@
>> +/*
>> + * QEMU ADD-COW Disk Format
>> + *
>> + * Copyright IBM, Corp. 2012
>> + *
>> + * Authors:
>> + *  Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> + *
>> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
>> + * See the COPYING.LIB file in the top-level directory.
>> + *
>> + */
>> +
>> +#ifndef BLOCK_ADD_COW_H
>> +#define BLOCK_ADD_COW_H
>> +#include "block-cache.h"
>> +
>> +enum {
>> +    ADD_COW_F_All_ALLOCATED     = 0X01,
>> +    ADD_COW_FEATURE_MASK        = ADD_COW_F_All_ALLOCATED,
>> +
>> +    ADD_COW_MAGIC = (((uint64_t)'A' << 56) | ((uint64_t)'D' << 48) | \
>> +                    ((uint64_t)'D' << 40) | ((uint64_t)'_' << 32) | \
>> +                    ((uint64_t)'C' << 24) | ((uint64_t)'O' << 16) | \
>> +                    ((uint64_t)'W' << 8) | 0xFF),
>> +    ADD_COW_VERSION             = 1,
>> +    ADD_COW_FILE_LEN            = 1024,
>> +    ADD_COW_CACHE_SIZE          = 16,
>> +    ADD_COW_CACHE_ENTRY_SIZE    = 65536,
>> +    ADD_COW_CLUSTER_SIZE        = 65536,
>> +    SECTORS_PER_CLUSTER         = (ADD_COW_CLUSTER_SIZE / BDRV_SECTOR_SIZE),
>> +    ADD_COW_PAGE_SIZE           = 4096,
>> +    ADD_COW_DEFAULT_PAGE_SIZE   = 1,
>> +};
>> +
>> +typedef struct AddCowHeader {
>> +    uint64_t        magic;
>> +    uint32_t        version;
>> +
>> +    uint32_t        backing_filename_offset;
>> +    uint32_t        backing_filename_size;
>> +
>> +    uint32_t        image_filename_offset;
>> +    uint32_t        image_filename_size;
>> +
>> +    uint64_t        features;
>> +    uint64_t        optional_features;
>> +    uint32_t        header_pages_size;
>> +} QEMU_PACKED AddCowHeader;
>
> Why aren't backing/image_file_format part of the header here? They are
> in the spec. It would also simplify some offset calculation code.
>

Anthony said "It's far better to shrink the size of the header and use an
offset/len pointer to the backing file string.  Limiting backing files to
1023 is unacceptable"

http://lists.gnu.org/archive/html/qemu-devel/2012-05/msg04110.html

So I use an offset and a length instead of storing the string directly.

>> +
>> +typedef struct BDRVAddCowState {
>> +    BlockDriverState    *image_hd;
>> +    CoMutex             lock;
>> +    int                 cluster_size;
>> +    BlockCache         *bitmap_cache;
>> +    uint64_t            bitmap_size;
>> +    AddCowHeader        header;
>> +    char                backing_file_format[16];
>> +    char                image_file_format[16];
>> +} BDRVAddCowState;
>> +
>> +/* Convert sector_num to offset in bitmap */
>> +static inline int64_t offset_in_bitmap(int64_t sector_num)
>> +{
>> +    int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
>> +    return cluster_num / 8;
>> +}
>> +
>> +static inline bool is_cluster_head(int64_t sector_num)
>> +{
>> +    return sector_num % SECTORS_PER_CLUSTER == 0;
>> +}
>> +
>> +static inline bool is_cluster_tail(int64_t sector_num)
>> +{
>> +    return (sector_num + 1) % SECTORS_PER_CLUSTER == 0;
>> +}
>> +
>> +BlockCache *add_cow_cache_create(BlockDriverState *bs, int num_tables);
>> +int add_cow_cache_destroy(BlockDriverState *bs, BlockCache *c);
>> +void add_cow_cache_entry_mark_dirty(BlockCache *c, void *table);
>> +int add_cow_cache_get(BlockDriverState *bs, BlockCache *c, uint64_t offset,
>> +    void **table);
>> +int add_cow_cache_flush(BlockDriverState *bs, BlockCache *c);
>
> These functions don't really exist any more, do they?

Right, sorry.

>
> Kevin
>

Thank you, Kevin.


* Re: [Qemu-devel] [PATCH V12 5/6] add-cow file format
  2012-09-12  7:28     ` Dong Xu Wang
@ 2012-09-12  7:50       ` Kevin Wolf
  0 siblings, 0 replies; 25+ messages in thread
From: Kevin Wolf @ 2012-09-12  7:50 UTC (permalink / raw)
  To: Dong Xu Wang; +Cc: qemu-devel

Am 12.09.2012 09:28, schrieb Dong Xu Wang:
>>> +static bool is_allocated(BlockDriverState *bs, int64_t sector_num)
>>> +{
>>> +    BDRVAddCowState *s  = bs->opaque;
>>> +    BlockCache *c = s->bitmap_cache;
>>> +    int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
>>> +    uint8_t *table      = NULL;
>>> +    uint64_t offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
>>> +        + (offset_in_bitmap(sector_num) & (~(c->entry_size - 1)));
>>> +    int ret = block_cache_get(bs, s->bitmap_cache, offset,
>>> +        (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
>>
>> No matching block_cache_put?
>>
>>> +
>>> +    if (ret < 0) {
>>> +        return ret;
>>> +    }
>>> +    return table[cluster_num / 8 % ADD_COW_CACHE_ENTRY_SIZE]
>>> +        & (1 << (cluster_num % 8));
>>> +}
>>> +
>>> +static coroutine_fn int add_cow_is_allocated(BlockDriverState *bs,
>>> +        int64_t sector_num, int nb_sectors, int *num_same)
>>> +{
>>> +    BDRVAddCowState *s = bs->opaque;
>>> +    int changed;
>>> +
>>> +    if (nb_sectors == 0) {
>>> +        *num_same = 0;
>>> +        return 0;
>>> +    }
>>> +
>>> +    if (s->header.features & ADD_COW_F_All_ALLOCATED) {
>>> +        *num_same = nb_sectors - 1;
>>
>> Why - 1?
>>
>>> +        return 1;
>>> +    }
>>> +    changed = is_allocated(bs, sector_num);
>>> +
>>> +    for (*num_same = 1; *num_same < nb_sectors; (*num_same)++) {
>>> +        if (is_allocated(bs, sector_num + *num_same) != changed) {
>>> +            break;
>>> +        }
>>> +    }
>>> +    return changed;
>>> +}

>>> +static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size)
>>> +{
>>> +    BDRVAddCowState *s = bs->opaque;
>>> +    int sector_per_byte = SECTORS_PER_CLUSTER * 8;
>>> +    int ret;
>>> +    uint32_t bitmap_pos = s->header.header_pages_size * ADD_COW_PAGE_SIZE;
>>> +    int64_t bitmap_size =
>>> +        (size / BDRV_SECTOR_SIZE + sector_per_byte - 1) / sector_per_byte;
>>> +    bitmap_size = (bitmap_size + ADD_COW_CACHE_ENTRY_SIZE - 1)
>>> +        & (~(ADD_COW_CACHE_ENTRY_SIZE - 1));
>>> +
>>> +    ret = bdrv_truncate(bs->file, bitmap_pos + bitmap_size);
>>> +    if (ret < 0) {
>>> +        return ret;
>>> +    }
>>> +    return 0;
>>> +}
>>
>> So you don't truncate s->image_file? Does this work?
> 
> s->image_file should be truncated? Image file can have a larger virtual size
> than backing_file, my understanding is we should not truncate image file.

I'm talking about s->image_hd, not bs->backing_hd. You are right that
the backing file should not be changed. But the associated raw image
should be resized, shouldn't it?
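
A sketch of the size arithmetic in bdrv_add_cow_truncate(), plus the
extra step being asked for. It assumes the constants from add-cow.h and
that the raw image should simply follow the new virtual size; ToyPair
and the toy_ functions are hypothetical and only track lengths.

```c
#include <stdint.h>

enum {
    TOY_SECTOR_SIZE         = 512,
    TOY_SECTORS_PER_CLUSTER = 128,   /* 65536 / 512 */
    TOY_ENTRY_SIZE          = 65536, /* bitmap is a multiple of this */
    TOY_HEADER_BYTES        = 4096,  /* one header page */
};

/* Bytes the add-cow file needs for header plus bitmap. */
static int64_t toy_addcow_file_size(int64_t virtual_size)
{
    /* one bitmap byte covers 8 clusters, i.e. 1024 sectors */
    int64_t sectors_per_byte = TOY_SECTORS_PER_CLUSTER * 8;
    int64_t bitmap = (virtual_size / TOY_SECTOR_SIZE + sectors_per_byte - 1)
                     / sectors_per_byte;

    /* round up to a whole number of cache entries */
    bitmap = (bitmap + TOY_ENTRY_SIZE - 1) & ~((int64_t)TOY_ENTRY_SIZE - 1);
    return TOY_HEADER_BYTES + bitmap;
}

/* Toy model of the two files whose lengths must change together. */
typedef struct ToyPair {
    int64_t addcow_len;  /* stands in for bs->file */
    int64_t image_len;   /* stands in for s->image_hd */
} ToyPair;

static void toy_truncate(ToyPair *p, int64_t size)
{
    p->addcow_len = toy_addcow_file_size(size);
    p->image_len = size;  /* assumption: image tracks the virtual size */
}
```

For a 1 GiB image this gives 2097152 sectors, 2048 bitmap bytes rounded
up to 65536, so the add-cow file is 4096 + 65536 = 69632 bytes.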

>>> +static coroutine_fn int add_cow_co_flush(BlockDriverState *bs)
>>> +{
>>> +    BDRVAddCowState *s = bs->opaque;
>>> +    int ret;
>>> +
>>> +    qemu_co_mutex_lock(&s->lock);
>>> +    ret = block_cache_flush(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP,
>>> +        ADD_COW_CACHE_ENTRY_SIZE);
>>> +    qemu_co_mutex_unlock(&s->lock);
>>> +    return ret;
>>> +}
>>
>> What about flushing s->image_file?

>>> +typedef struct AddCowHeader {
>>> +    uint64_t        magic;
>>> +    uint32_t        version;
>>> +
>>> +    uint32_t        backing_filename_offset;
>>> +    uint32_t        backing_filename_size;
>>> +
>>> +    uint32_t        image_filename_offset;
>>> +    uint32_t        image_filename_size;
>>> +
>>> +    uint64_t        features;
>>> +    uint64_t        optional_features;
>>> +    uint32_t        header_pages_size;
>>> +} QEMU_PACKED AddCowHeader;
>>
>> Why aren't backing/image_file_format part of the header here? They are
>> in the spec. It would also simplify some offset calculation code.
>>
> 
> Anthony said "It's far better to shrink the size of the header and use
> an offset/len
> pointer to the backing file string.  Limiting backing files to 1023 is
> unacceptable"
> 
> http://lists.gnu.org/archive/html/qemu-devel/2012-05/msg04110.html
> 
> So I use offset  and length instead of using string directly.

I'm talking about the format, not the path.
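
A sketch of what keeping the two format fields inside the header struct
could look like, so that offsets come from offsetof() and sizeof()
rather than from sums of separate sizes. The position of the two fields
at the end of the struct is an assumption made for this example; the
authoritative layout is the one in docs/specs/add-cow.txt.
ToyAddCowHeader is a hypothetical name, and the packed attribute is the
GCC/Clang spelling.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct __attribute__((packed)) ToyAddCowHeader {
    uint64_t magic;
    uint32_t version;

    uint32_t backing_filename_offset;
    uint32_t backing_filename_size;

    uint32_t image_filename_offset;
    uint32_t image_filename_size;

    uint64_t features;
    uint64_t optional_features;
    uint32_t header_pages_size;

    /* assumed placement: fixed-width, zero-padded format names */
    char     backing_format[16];
    char     image_format[16];
} ToyAddCowHeader;

/* First byte after the fixed header, where the file names could start. */
static uint32_t toy_first_filename_offset(void)
{
    return (uint32_t)sizeof(ToyAddCowHeader);
}
```

With this layout the variable-length file names still use offset and
length, and only the short format names are stored inline.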

Kevin


Thread overview: 25+ messages
2012-08-10 15:39 [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 1/6] docs: document for " Dong Xu Wang
2012-09-06 17:27   ` Michael Roth
2012-09-10  1:48     ` Dong Xu Wang
2012-09-10 15:23   ` Kevin Wolf
2012-09-11  2:12     ` Dong Xu Wang
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 2/6] make path_has_protocol non-static Dong Xu Wang
2012-09-06 17:27   ` Michael Roth
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 3/6] qed_read_string to bdrv_read_string Dong Xu Wang
2012-09-06 17:32   ` Michael Roth
2012-09-10  1:49     ` Dong Xu Wang
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 4/6] rename qcow2-cache.c to block-cache.c Dong Xu Wang
2012-09-06 17:52   ` Michael Roth
2012-09-10  2:14     ` Dong Xu Wang
2012-09-11  8:41   ` Kevin Wolf
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 5/6] add-cow file format Dong Xu Wang
2012-09-06 20:19   ` Michael Roth
2012-09-10  2:25     ` Dong Xu Wang
2012-09-11  9:44       ` Kevin Wolf
2012-09-11  9:40   ` Kevin Wolf
2012-09-12  7:28     ` Dong Xu Wang
2012-09-12  7:50       ` Kevin Wolf
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 6/6] add-cow: add qemu-iotests support Dong Xu Wang
2012-09-11  9:55   ` Kevin Wolf
2012-08-23  5:34 ` [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
