* [Qemu-devel] [PATCH V13 0/6] add-cow file format
@ 2012-10-18 9:51 Dong Xu Wang
2012-10-18 9:51 ` [Qemu-devel] [PATCH V13 1/6] docs: document for " Dong Xu Wang
` (5 more replies)
0 siblings, 6 replies; 12+ messages in thread
From: Dong Xu Wang @ 2012-10-18 9:51 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, Dong Xu Wang
This series introduces a new file format: add-cow.
The add-cow file format makes it possible to perform copy-on-write on top of
a raw disk image. When we know that no backing file clusters remain visible
(e.g. we have streamed the entire image and copied all data from the backing
file), then it is possible to discard the add-cow file and use the raw image
file directly.
This adds copy-on-write to raw files (which cannot support
it natively) while allowing us to get full performance again later when we no
longer need copy-on-write.
add-cow can reuse existing functions such as path_has_protocol and
qed_read_string, so this series makes them public.
snapshot_blkdev is not supported for add-cow yet; it will be added in further patches.
These patches use the QemuOpts parser; the earlier patches can be found here:
http://patchwork.ozlabs.org/patch/191347/
v12->v13:
1) Use QemuOpts, not QEMUOptionParameter
2) Make cluster_size configurable
3) Refactor block-cache.c
4) Correct qemu-iotests script.
5) Other bug fixes.
v11->v12:
1) Removed unused feature bit.
2) Share cache code with qcow2.c.
3) Remove snapshot_blkdev support, will add it in another patch.
5) COW Bitmap field in add-cow file will be multiple of 65536.
6) Fix grammar and typos.
Dong Xu Wang (6):
docs: document for add-cow file format
make path_has_protocol non static
qed_read_string to bdrv_read_string
rename qcow2-cache.c to block-cache.c
add-cow file format core code.
qemu-iotests: add add-cow iotests support.
block.c | 29 ++-
block.h | 3 +
block/Makefile.objs | 4 +-
block/add-cow.c | 693 ++++++++++++++++++++++++++++++++++++++++++
block/add-cow.h | 85 +++++
block/block-cache.c | 321 +++++++++++++++++++
block/block-cache.h | 77 +++++
block/qcow2-cache.c | 323 --------------------
block/qcow2-cluster.c | 54 ++--
block/qcow2-refcount.c | 67 +++--
block/qcow2.c | 21 +-
block/qcow2.h | 24 +--
block/qed.c | 34 +--
block_int.h | 2 +
docs/specs/add-cow.txt | 139 +++++++++
tests/qemu-iotests/017 | 2 +-
tests/qemu-iotests/020 | 2 +-
tests/qemu-iotests/common | 6 +
tests/qemu-iotests/common.rc | 15 +-
trace-events | 13 +-
20 files changed, 1465 insertions(+), 449 deletions(-)
create mode 100644 block/add-cow.c
create mode 100644 block/add-cow.h
create mode 100644 block/block-cache.c
create mode 100644 block/block-cache.h
delete mode 100644 block/qcow2-cache.c
create mode 100644 docs/specs/add-cow.txt
* [Qemu-devel] [PATCH V13 1/6] docs: document for add-cow file format
2012-10-18 9:51 [Qemu-devel] [PATCH V13 0/6] add-cow file format Dong Xu Wang
@ 2012-10-18 9:51 ` Dong Xu Wang
2012-10-18 16:10 ` Eric Blake
2012-10-18 9:51 ` [Qemu-devel] [PATCH V13 2/6] make path_has_protocol non static Dong Xu Wang
` (4 subsequent siblings)
5 siblings, 1 reply; 12+ messages in thread
From: Dong Xu Wang @ 2012-10-18 9:51 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, Dong Xu Wang
Add documentation for the add-cow format, covering its usage and specification.
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
---
docs/specs/add-cow.txt | 139 ++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 139 insertions(+), 0 deletions(-)
create mode 100644 docs/specs/add-cow.txt
diff --git a/docs/specs/add-cow.txt b/docs/specs/add-cow.txt
new file mode 100644
index 0000000..dc1e107
--- /dev/null
+++ b/docs/specs/add-cow.txt
@@ -0,0 +1,139 @@
+== General ==
+
+The raw file format does not support backing files or copy-on-write.
+The add-cow image format makes it possible to use backing files with a raw
+image by keeping a separate .add-cow metadata file. Once all sectors
+have been written into the raw image, it is safe to discard the .add-cow
+and backing files and use the raw image directly.
+
+An example usage of add-cow would look like:
+(ubuntu.img is a disk image with an OS already installed.)
+ 1) Create a raw image with the same size as ubuntu.img
+ qemu-img create -f raw test.raw 8G
+ 2) Create an add-cow image which will store the dirty bitmap
+ qemu-img create -f add-cow test.add-cow \
+ -o backing_file=ubuntu.img,image_file=test.raw
+ 3) Run qemu with the add-cow image
+ qemu -drive if=virtio,file=test.add-cow
+
+test.raw may be larger than ubuntu.img; in that case, the size of test.add-cow
+will be calculated from the size of test.raw.
+
+=Specification=
+
+The file format looks like this:
+
+ +---------------+-------------+-----------------+
+ | Header | Reserved | COW bitmap |
+ +---------------+-------------+-----------------+
+
+All numbers in add-cow are stored in Little Endian byte order.
+
+== Header ==
+
+The Header is included in the first bytes:
+(#define HEADER_SIZE (4096 * header_size))
+ Byte 0 - 7: magic
+ add-cow magic string ("ADD_COW\xff").
+
+ 8 - 11: version
+ Version number (only valid value is 1 now).
+
+ 12 - 15: backing file name offset
+ Offset in the add-cow file at which the backing file
+ name is stored (NB: The string is not NUL-terminated).
+ If there is no backing file name, this field will be
+ 0. Otherwise it must be between 80 and [HEADER_SIZE - 2]
+ (a file name must be at least 1 byte).
+
+ 16 - 19: backing file name size
+ Length of the backing file name in bytes. It will be 0
+ if the backing file name offset is 0. If backing file
+ name offset is non-zero, then it must be non-zero. Must
+ be less than [HEADER_SIZE - 80] to fit in the reserved
+ part of the header.
+
+ 20 - 23: image file name offset
+ Offset in the add-cow file at which the image file name
+ is stored (NB: The string is not NUL-terminated). It
+ must be between 80 and [HEADER_SIZE - 2].
+
+ 24 - 27: image file name size
+ Length of the image file name in bytes.
+ Must be less than [HEADER_SIZE - 80] to fit in the reserved
+ part of the header.
+
+ 28 - 31: cluster bits
+ Number of bits that are used for addressing an offset
+ within a cluster (1 << cluster_bits is the cluster size).
+ Must not be less than 9 (i.e. 512 byte clusters).
+
+ Note: qemu as of today has an implementation limit of 2 MB
+ as the maximum cluster size and won't be able to open images
+ with larger cluster sizes.
+
+ 32 - 39: features
+ Bitmask of features. An implementation can safely ignore
+ any unknown bits that are set.
+
+ Bit 0: All-allocated bit. If this bit is set, the
+ backing file and COW bitmap are not used, and
+ reads and writes go to the image file directly.
+
+ Bits 1-63: Reserved (set to 0)
+
+ 40 - 47: optional features
+ Not used now; reserved for future use. Must be set to 0
+ and must be ignored while reading.
+
+ 48 - 51: header size
+ The header is variable-sized. This field indicates
+ how many 4096-byte units are used to store the add-cow
+ header. In add-cow v1, it is fixed to 1, so the header
+ size will be 4096 * 1 = 4096 bytes.
+
+ 52 - 67: backing file format
+ Format of backing file. It will be filled with 0 if
+ backing file name offset is 0. If backing file name
+ offset is non-zero, it must be non-empty. It is coded
+ in free-form ASCII, and is not NUL-terminated. Zero
+ padded on the right.
+
+ 68 - 83: image file format
+ Format of image file. It must be non-empty. It is coded
+ in free-form ASCII, and is not NUL-terminated. Zero
+ padded on the right.
+
+ 84 - [HEADER_SIZE - 1]:
+ Padding that makes the COW bitmap field start at offset
+ HEADER_SIZE. The backing file name and image file name
+ are stored here. Bytes that are not part of the
+ backing file and image file names must be set to 0.
+
+== COW bitmap ==
+
+The "COW bitmap" field starts at offset HEADER_SIZE and stores a bitmap related to
+the backing file and image file. The bitmap tracks whether the sector in the
+backing file is dirty or not.
+
+Each bit in the bitmap tracks one cluster's status. For example, if cluster
+bits is 16, then each bit tracks one cluster of (1 << 16) = 65536 bytes. The size
+of the bitmap is calculated from the virtual size of the image file, and it must
+be a multiple of 65536; unused bits are set to 0. Within each byte,
+the least significant bit covers the first cluster. The bit order in one
+byte looks like:
+ +----+----+----+----+----+----+----+----+
+ | b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0 |
+ +----+----+----+----+----+----+----+----+
+
+If a bit is 0, the sector has not been allocated in the image file, and data
+should be loaded from the backing file when reading; if the bit is 1, the
+related sector is dirty, and data should be loaded from the image file when reading.
+Writing to a sector causes the corresponding bit to be set to 1.
+
+If the raw image size is not a multiple of the cluster size, bits that correspond to
+bytes beyond the raw file size in add-cow must be written as 0 and must be
+ignored when reading.
+
+The image file name and backing file name must NOT be the same; this is checked
+when creating add-cow files.
--
1.7.1
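
For readers who want to see how the header layout documented in this patch
might be consumed, here is a minimal sketch that checks the magic and decodes a
few little-endian fields at the byte offsets given in docs/specs/add-cow.txt.
The struct and function names are hypothetical, this is not the parser from
patch 5/6, and the 2 MB cluster limit check follows the implementation note in
the spec rather than the format itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory view of a few header fields. */
typedef struct {
    uint32_t version;
    uint32_t cluster_bits;
    uint64_t features;
    uint32_t header_size;   /* in units of 4096 bytes */
} AddCowHeaderSketch;

static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint64_t load_le64(const uint8_t *p)
{
    return (uint64_t)load_le32(p) | (uint64_t)load_le32(p + 4) << 32;
}

/* Returns 0 if the buffer looks like a valid v1 header, -1 otherwise. */
static int add_cow_parse_header_sketch(const uint8_t *buf, size_t len,
                                       AddCowHeaderSketch *h)
{
    assert(buf != NULL && h != NULL);
    if (len < 84 || memcmp(buf, "ADD_COW\xff", 8) != 0) {
        return -1;                      /* bytes 0 - 7: magic */
    }
    h->version      = load_le32(buf + 8);   /* bytes 8 - 11 */
    h->cluster_bits = load_le32(buf + 28);  /* bytes 28 - 31 */
    h->features     = load_le64(buf + 32);  /* bytes 32 - 39 */
    h->header_size  = load_le32(buf + 48);  /* bytes 48 - 51 */

    if (h->version != 1 || h->header_size != 1 ||
        len < (size_t)4096 * h->header_size) {
        return -1;
    }
    /* At least 512-byte clusters; at most 2 MB (qemu implementation limit). */
    if (h->cluster_bits < 9 || h->cluster_bits > 21) {
        return -1;
    }
    return 0;
}
```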
* [Qemu-devel] [PATCH V13 2/6] make path_has_protocol non static
2012-10-18 9:51 [Qemu-devel] [PATCH V13 0/6] add-cow file format Dong Xu Wang
2012-10-18 9:51 ` [Qemu-devel] [PATCH V13 1/6] docs: document for " Dong Xu Wang
@ 2012-10-18 9:51 ` Dong Xu Wang
2012-10-18 9:51 ` [Qemu-devel] [PATCH V13 3/6] qed_read_string to bdrv_read_string Dong Xu Wang
` (3 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Dong Xu Wang @ 2012-10-18 9:51 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, Dong Xu Wang
We will use path_has_protocol outside block.c, so just make it public.
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
---
block.c | 2 +-
block.h | 1 +
2 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/block.c b/block.c
index f639655..03ba485 100644
--- a/block.c
+++ b/block.c
@@ -198,7 +198,7 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
}
/* check if the path starts with "<protocol>:" */
-static int path_has_protocol(const char *path)
+int path_has_protocol(const char *path)
{
const char *p;
diff --git a/block.h b/block.h
index 7842d85..364ba04 100644
--- a/block.h
+++ b/block.h
@@ -329,6 +329,7 @@ char *bdrv_snapshot_dump(char *buf, int buf_size, QEMUSnapshotInfo *sn);
char *get_human_readable_size(char *buf, int buf_size, int64_t size);
int path_is_absolute(const char *path);
+int path_has_protocol(const char *path);
void path_combine(char *dest, int dest_size,
const char *base_path,
const char *filename);
--
1.7.1
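
For context, the check that this patch exports can be pictured roughly as
follows. This is a sketch under assumptions: only the function comment and
signature appear in the hunk above, so the body here (scanning to the first
':' or '/') is one plausible way to implement "starts with <protocol>:", and
any Windows drive-letter handling is left out. The function name is
hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Sketch: returns 1 if `path` starts with "<protocol>:", i.e. a ':' appears
 * before any '/'. Non-Windows behaviour only (assumption). */
static int path_has_protocol_sketch(const char *path)
{
    assert(path != NULL);

    /* strcspn() returns the length of the prefix containing neither ':'
     * nor '/', so p points at the first such character or at the NUL. */
    const char *p = path + strcspn(path, ":/");

    return *p == ':';
}
```

Under this sketch "nbd:localhost:10809" is treated as having a protocol, while
"/tmp/a:b.img" and "disk.img" are not.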
* [Qemu-devel] [PATCH V13 3/6] qed_read_string to bdrv_read_string
2012-10-18 9:51 [Qemu-devel] [PATCH V13 0/6] add-cow file format Dong Xu Wang
2012-10-18 9:51 ` [Qemu-devel] [PATCH V13 1/6] docs: document for " Dong Xu Wang
2012-10-18 9:51 ` [Qemu-devel] [PATCH V13 2/6] make path_has_protocol non static Dong Xu Wang
@ 2012-10-18 9:51 ` Dong Xu Wang
2012-10-18 9:51 ` [Qemu-devel] [PATCH V13 4/6] rename qcow2-cache.c to block-cache.c Dong Xu Wang
` (2 subsequent siblings)
5 siblings, 0 replies; 12+ messages in thread
From: Dong Xu Wang @ 2012-10-18 9:51 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, Dong Xu Wang
Turn qed_read_string into a common interface: move it to block.c and rename it
to bdrv_read_string.
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
---
block.c | 27 +++++++++++++++++++++++++++
block.h | 2 ++
block/qed.c | 34 ++++------------------------------
3 files changed, 33 insertions(+), 30 deletions(-)
diff --git a/block.c b/block.c
index 03ba485..9afb7b5 100644
--- a/block.c
+++ b/block.c
@@ -215,6 +215,33 @@ int path_has_protocol(const char *path)
return *p == ':';
}
+/**
+ * Read a string of known length from the image file
+ *
+ * @bs: Image file
+ * @offset: File offset to start of string, in bytes
+ * @n: String length in bytes
+ * @buf: Destination buffer
+ * @buflen: Destination buffer length in bytes
+ * @ret: 0 on success, -errno on failure
+ *
+ * The string is NUL-terminated.
+ */
+int bdrv_read_string(BlockDriverState *bs, uint64_t offset, size_t n,
+ char *buf, size_t buflen)
+{
+ int ret;
+ if (n >= buflen) {
+ return -EINVAL;
+ }
+ ret = bdrv_pread(bs, offset, buf, n);
+ if (ret < 0) {
+ return ret;
+ }
+ buf[n] = '\0';
+ return 0;
+}
+
int path_is_absolute(const char *path)
{
#ifdef _WIN32
diff --git a/block.h b/block.h
index 364ba04..166e00c 100644
--- a/block.h
+++ b/block.h
@@ -168,6 +168,8 @@ int bdrv_pwrite_sync(BlockDriverState *bs, int64_t offset,
const void *buf, int count);
int coroutine_fn bdrv_co_readv(BlockDriverState *bs, int64_t sector_num,
int nb_sectors, QEMUIOVector *qiov);
+int bdrv_read_string(BlockDriverState *bs, uint64_t offset, size_t n,
+ char *buf, size_t buflen);
int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
int coroutine_fn bdrv_co_writev(BlockDriverState *bs, int64_t sector_num,
diff --git a/block/qed.c b/block/qed.c
index 0a9dbe8..096de21 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -217,33 +217,6 @@ static bool qed_is_image_size_valid(uint64_t image_size, uint32_t cluster_size,
}
/**
- * Read a string of known length from the image file
- *
- * @file: Image file
- * @offset: File offset to start of string, in bytes
- * @n: String length in bytes
- * @buf: Destination buffer
- * @buflen: Destination buffer length in bytes
- * @ret: 0 on success, -errno on failure
- *
- * The string is NUL-terminated.
- */
-static int qed_read_string(BlockDriverState *file, uint64_t offset, size_t n,
- char *buf, size_t buflen)
-{
- int ret;
- if (n >= buflen) {
- return -EINVAL;
- }
- ret = bdrv_pread(file, offset, buf, n);
- if (ret < 0) {
- return ret;
- }
- buf[n] = '\0';
- return 0;
-}
-
-/**
* Allocate new clusters
*
* @s: QED state
@@ -437,9 +410,10 @@ static int bdrv_qed_open(BlockDriverState *bs, int flags)
return -EINVAL;
}
- ret = qed_read_string(bs->file, s->header.backing_filename_offset,
- s->header.backing_filename_size, bs->backing_file,
- sizeof(bs->backing_file));
+ ret = bdrv_read_string(bs->file, s->header.backing_filename_offset,
+ s->header.backing_filename_size,
+ bs->backing_file,
+ sizeof(bs->backing_file));
if (ret < 0) {
return ret;
}
--
1.7.1
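
The contract of bdrv_read_string (length check first, then read, then
NUL-terminate) can be tried outside qemu with a stand-in that reads from a
memory buffer instead of calling bdrv_pread. This is only an illustration: the
function name and the in-memory "image" are hypothetical, and returning -EIO
for an out-of-range read is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Sketch: copy `n` bytes at `offset` from an in-memory image into `buf`
 * and NUL-terminate. Returns 0 on success, -errno on failure. */
static int read_string_sketch(const char *image, size_t image_len,
                              size_t offset, size_t n,
                              char *buf, size_t buflen)
{
    assert(image != NULL && buf != NULL);

    /* Same guard as bdrv_read_string: leave room for the terminator. */
    if (n >= buflen) {
        return -EINVAL;
    }
    /* Stand-in for a failing bdrv_pread (assumption of this sketch). */
    if (offset > image_len || n > image_len - offset) {
        return -EIO;
    }
    memcpy(buf, image + offset, n);
    buf[n] = '\0';
    return 0;
}
```

Note that a string of length n needs a destination of at least n + 1 bytes, so
an 8-byte name does not fit an 8-byte buffer.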
* [Qemu-devel] [PATCH V13 4/6] rename qcow2-cache.c to block-cache.c
2012-10-18 9:51 [Qemu-devel] [PATCH V13 0/6] add-cow file format Dong Xu Wang
` (2 preceding siblings ...)
2012-10-18 9:51 ` [Qemu-devel] [PATCH V13 3/6] qed_read_string to bdrv_read_string Dong Xu Wang
@ 2012-10-18 9:51 ` Dong Xu Wang
2012-10-22 8:22 ` Stefan Hajnoczi
2012-10-18 9:51 ` [Qemu-devel] [PATCH V13 5/6] add-cow file format core code Dong Xu Wang
2012-10-18 9:51 ` [Qemu-devel] [PATCH V13 6/6] qemu-iotests: add add-cow iotests support Dong Xu Wang
5 siblings, 1 reply; 12+ messages in thread
From: Dong Xu Wang @ 2012-10-18 9:51 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, Dong Xu Wang
We will reuse qcow2-cache as common block layer cache code, so rename it
and make some changes: define an enum named BlockTableType, and pass
BlockTableType and table size parameters to the
block cache initialization function.
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
---
block/Makefile.objs | 3 +-
block/block-cache.c | 317 +++++++++++++++++++++++++++++++++++++++++++++++
block/block-cache.h | 76 +++++++++++
block/qcow2-cache.c | 323 ------------------------------------------------
block/qcow2-cluster.c | 54 +++++----
block/qcow2-refcount.c | 67 ++++++-----
block/qcow2.c | 21 ++--
block/qcow2.h | 24 +---
trace-events | 13 +-
9 files changed, 483 insertions(+), 415 deletions(-)
create mode 100644 block/block-cache.c
create mode 100644 block/block-cache.h
delete mode 100644 block/qcow2-cache.c
diff --git a/block/Makefile.objs b/block/Makefile.objs
index 554f429..f128b78 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -1,5 +1,6 @@
block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
-block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
+block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
+block-obj-y += block-cache.o
block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
block-obj-y += qed-check.o
block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
diff --git a/block/block-cache.c b/block/block-cache.c
new file mode 100644
index 0000000..bf5c57c
--- /dev/null
+++ b/block/block-cache.c
@@ -0,0 +1,317 @@
+/*
+ * QEMU Block Layer Cache
+ *
+ * Copyright IBM, Corp. 2012
+ *
+ * Authors:
+ * Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
+ *
+ * This file is based on qcow2-cache.c, see its copyrights below:
+ *
+ * L2/refcount table cache for the QCOW2 format
+ *
+ * Copyright (c) 2010 Kevin Wolf <kwolf@redhat.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "block_int.h"
+#include "qemu-common.h"
+#include "trace.h"
+#include "block-cache.h"
+
+BlockCache *block_cache_create(BlockDriverState *bs, int num_tables,
+ size_t cluster_size, BlockTableType type)
+{
+ BlockCache *c;
+ int i;
+
+ c = g_malloc0(sizeof(*c));
+ c->size = num_tables;
+ c->entries = g_malloc0(sizeof(*c->entries) * num_tables);
+ c->table_type = type;
+ c->cluster_size = cluster_size;
+
+ for (i = 0; i < c->size; i++) {
+ c->entries[i].table = qemu_blockalign(bs, cluster_size);
+ }
+
+ return c;
+}
+
+int block_cache_destroy(BlockDriverState *bs, BlockCache *c)
+{
+ int i;
+
+ for (i = 0; i < c->size; i++) {
+ assert(c->entries[i].ref == 0);
+ qemu_vfree(c->entries[i].table);
+ }
+
+ g_free(c->entries);
+ g_free(c);
+
+ return 0;
+}
+
+static int block_cache_flush_dependency(BlockDriverState *bs, BlockCache *c)
+{
+ int ret;
+
+ ret = block_cache_flush(bs, c->depends);
+ if (ret < 0) {
+ return ret;
+ }
+
+ c->depends = NULL;
+ c->depends_on_flush = false;
+
+ return 0;
+}
+
+static int block_cache_entry_flush(BlockDriverState *bs, BlockCache *c, int i)
+{
+ int ret = 0;
+
+ if (!c->entries[i].dirty || !c->entries[i].offset) {
+ return 0;
+ }
+
+ trace_block_cache_entry_flush(qemu_coroutine_self(), c->table_type, i);
+
+ if (c->depends) {
+ ret = block_cache_flush_dependency(bs, c);
+ } else if (c->depends_on_flush) {
+ ret = bdrv_flush(bs->file);
+ if (ret >= 0) {
+ c->depends_on_flush = false;
+ }
+ }
+
+ if (ret < 0) {
+ return ret;
+ }
+
+ if (c->table_type == BLOCK_TABLE_REF) {
+ BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_UPDATE_PART);
+ } else if (c->table_type == BLOCK_TABLE_L2) {
+ BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
+ }
+
+ ret = bdrv_pwrite(bs->file, c->entries[i].offset,
+ c->entries[i].table, c->cluster_size);
+ if (ret < 0) {
+ return ret;
+ }
+
+ c->entries[i].dirty = false;
+
+ return 0;
+}
+
+int block_cache_flush(BlockDriverState *bs, BlockCache *c)
+{
+ int result = 0;
+ int ret;
+ int i;
+
+ trace_block_cache_flush(qemu_coroutine_self(), c->table_type);
+
+ for (i = 0; i < c->size; i++) {
+ ret = block_cache_entry_flush(bs, c, i);
+ if (ret < 0 && result != -ENOSPC) {
+ result = ret;
+ }
+ }
+
+ if (result == 0) {
+ ret = bdrv_flush(bs->file);
+ if (ret < 0) {
+ result = ret;
+ }
+ }
+
+ return result;
+}
+
+int block_cache_set_dependency(BlockDriverState *bs,
+ BlockCache *c,
+ BlockCache *dependency)
+{
+ int ret;
+
+ if (dependency->depends) {
+ ret = block_cache_flush_dependency(bs, dependency);
+ if (ret < 0) {
+ return ret;
+ }
+ }
+
+ if (c->depends && (c->depends != dependency)) {
+ ret = block_cache_flush_dependency(bs, c);
+ if (ret < 0) {
+ return ret;
+ }
+ }
+
+ c->depends = dependency;
+ return 0;
+}
+
+void block_cache_depends_on_flush(BlockCache *c)
+{
+ c->depends_on_flush = true;
+}
+
+static int block_cache_find_entry_to_replace(BlockCache *c)
+{
+ int i;
+ int min_count = INT_MAX;
+ int min_index = -1;
+
+
+ for (i = 0; i < c->size; i++) {
+ if (c->entries[i].ref) {
+ continue;
+ }
+
+ if (c->entries[i].cache_hits < min_count) {
+ min_index = i;
+ min_count = c->entries[i].cache_hits;
+ }
+
+ /* Give newer hits priority */
+ /* TODO Check how to optimize the replacement strategy */
+ c->entries[i].cache_hits /= 2;
+ }
+
+ if (min_index == -1) {
+ /* This can't happen in current synchronous code, but leave the check
+ * here as a reminder for whoever starts using AIO with the cache */
+ abort();
+ }
+ return min_index;
+}
+
+static int block_cache_do_get(BlockDriverState *bs, BlockCache *c,
+ uint64_t offset, void **table,
+ bool read_from_disk)
+{
+ int i;
+ int ret;
+
+ trace_block_cache_get(qemu_coroutine_self(), c->table_type,
+ offset, read_from_disk);
+
+ /* Check if the table is already cached */
+ for (i = 0; i < c->size; i++) {
+ if (c->entries[i].offset == offset) {
+ goto found;
+ }
+ }
+
+ /* If not, write a table back and replace it */
+ i = block_cache_find_entry_to_replace(c);
+ trace_block_cache_get_replace_entry(qemu_coroutine_self(),
+ c->table_type, i);
+ if (i < 0) {
+ return i;
+ }
+
+ ret = block_cache_entry_flush(bs, c, i);
+ if (ret < 0) {
+ return ret;
+ }
+
+ trace_block_cache_get_read(qemu_coroutine_self(),
+ c->table_type, i);
+ c->entries[i].offset = 0;
+ if (read_from_disk) {
+ if (c->table_type == BLOCK_TABLE_L2) {
+ BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
+ }
+
+ ret = bdrv_pread(bs->file, offset, c->entries[i].table,
+ c->cluster_size);
+ if (ret < 0) {
+ return ret;
+ }
+ }
+
+ /* Give the table some hits for the start so that it won't be replaced
+ * immediately. The number 32 is completely arbitrary. */
+ c->entries[i].cache_hits = 32;
+ c->entries[i].offset = offset;
+
+ /* And return the right table */
+found:
+ c->entries[i].cache_hits++;
+ c->entries[i].ref++;
+ *table = c->entries[i].table;
+
+ trace_block_cache_get_done(qemu_coroutine_self(),
+ c->table_type, i);
+
+ return 0;
+}
+
+int block_cache_get(BlockDriverState *bs, BlockCache *c, uint64_t offset,
+ void **table)
+{
+ return block_cache_do_get(bs, c, offset, table, true);
+}
+
+int block_cache_get_empty(BlockDriverState *bs, BlockCache *c,
+ uint64_t offset, void **table)
+{
+ return block_cache_do_get(bs, c, offset, table, false);
+}
+
+int block_cache_put(BlockDriverState *bs, BlockCache *c, void **table)
+{
+ int i;
+
+ for (i = 0; i < c->size; i++) {
+ if (c->entries[i].table == *table) {
+ goto found;
+ }
+ }
+ return -ENOENT;
+
+found:
+ c->entries[i].ref--;
+ assert(c->entries[i].ref >= 0);
+ *table = NULL;
+ return 0;
+}
+
+void block_cache_entry_mark_dirty(BlockCache *c, void *table)
+{
+ int i;
+
+ for (i = 0; i < c->size; i++) {
+ if (c->entries[i].table == table) {
+ goto found;
+ }
+ }
+ abort();
+
+found:
+ c->entries[i].dirty = true;
+}
diff --git a/block/block-cache.h b/block/block-cache.h
new file mode 100644
index 0000000..4efa06e
--- /dev/null
+++ b/block/block-cache.h
@@ -0,0 +1,76 @@
+/*
+ * QEMU Block Layer Cache
+ *
+ * Copyright IBM, Corp. 2012
+ *
+ * Authors:
+ * Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
+ *
+ * This file is based on qcow2-cache.c, see its copyrights below:
+ *
+ * L2/refcount table cache for the QCOW2 format
+ *
+ * Copyright (c) 2010 Kevin Wolf <kwolf@redhat.com>
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#ifndef BLOCK_CACHE_H
+#define BLOCK_CACHE_H
+
+typedef enum {
+ BLOCK_TABLE_REF,
+ BLOCK_TABLE_L2,
+} BlockTableType;
+
+typedef struct BlockCachedTable {
+ void *table;
+ int64_t offset;
+ bool dirty;
+ int cache_hits;
+ int ref;
+} BlockCachedTable;
+
+struct BlockCache {
+ BlockCachedTable *entries;
+ struct BlockCache *depends;
+ int size;
+ size_t cluster_size;
+ BlockTableType table_type;
+ bool depends_on_flush;
+};
+
+struct BlockCache;
+typedef struct BlockCache BlockCache;
+
+BlockCache *block_cache_create(BlockDriverState *bs, int num_tables,
+ size_t cluster_size, BlockTableType type);
+int block_cache_destroy(BlockDriverState *bs, BlockCache *c);
+int block_cache_flush(BlockDriverState *bs, BlockCache *c);
+int block_cache_set_dependency(BlockDriverState *bs,
+ BlockCache *c,
+ BlockCache *dependency);
+void block_cache_depends_on_flush(BlockCache *c);
+int block_cache_get(BlockDriverState *bs, BlockCache *c, uint64_t offset,
+ void **table);
+int block_cache_get_empty(BlockDriverState *bs, BlockCache *c,
+ uint64_t offset, void **table);
+int block_cache_put(BlockDriverState *bs, BlockCache *c, void **table);
+void block_cache_entry_mark_dirty(BlockCache *c, void *table);
+#endif /* BLOCK_CACHE_H */
diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
deleted file mode 100644
index 2d4322a..0000000
--- a/block/qcow2-cache.c
+++ /dev/null
@@ -1,323 +0,0 @@
-/*
- * L2/refcount table cache for the QCOW2 format
- *
- * Copyright (c) 2010 Kevin Wolf <kwolf@redhat.com>
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to deal
- * in the Software without restriction, including without limitation the rights
- * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- * copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
- * THE SOFTWARE.
- */
-
-#include "block_int.h"
-#include "qemu-common.h"
-#include "qcow2.h"
-#include "trace.h"
-
-typedef struct Qcow2CachedTable {
- void* table;
- int64_t offset;
- bool dirty;
- int cache_hits;
- int ref;
-} Qcow2CachedTable;
-
-struct Qcow2Cache {
- Qcow2CachedTable* entries;
- struct Qcow2Cache* depends;
- int size;
- bool depends_on_flush;
-};
-
-Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
-{
- BDRVQcowState *s = bs->opaque;
- Qcow2Cache *c;
- int i;
-
- c = g_malloc0(sizeof(*c));
- c->size = num_tables;
- c->entries = g_malloc0(sizeof(*c->entries) * num_tables);
-
- for (i = 0; i < c->size; i++) {
- c->entries[i].table = qemu_blockalign(bs, s->cluster_size);
- }
-
- return c;
-}
-
-int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c)
-{
- int i;
-
- for (i = 0; i < c->size; i++) {
- assert(c->entries[i].ref == 0);
- qemu_vfree(c->entries[i].table);
- }
-
- g_free(c->entries);
- g_free(c);
-
- return 0;
-}
-
-static int qcow2_cache_flush_dependency(BlockDriverState *bs, Qcow2Cache *c)
-{
- int ret;
-
- ret = qcow2_cache_flush(bs, c->depends);
- if (ret < 0) {
- return ret;
- }
-
- c->depends = NULL;
- c->depends_on_flush = false;
-
- return 0;
-}
-
-static int qcow2_cache_entry_flush(BlockDriverState *bs, Qcow2Cache *c, int i)
-{
- BDRVQcowState *s = bs->opaque;
- int ret = 0;
-
- if (!c->entries[i].dirty || !c->entries[i].offset) {
- return 0;
- }
-
- trace_qcow2_cache_entry_flush(qemu_coroutine_self(),
- c == s->l2_table_cache, i);
-
- if (c->depends) {
- ret = qcow2_cache_flush_dependency(bs, c);
- } else if (c->depends_on_flush) {
- ret = bdrv_flush(bs->file);
- if (ret >= 0) {
- c->depends_on_flush = false;
- }
- }
-
- if (ret < 0) {
- return ret;
- }
-
- if (c == s->refcount_block_cache) {
- BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_UPDATE_PART);
- } else if (c == s->l2_table_cache) {
- BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
- }
-
- ret = bdrv_pwrite(bs->file, c->entries[i].offset, c->entries[i].table,
- s->cluster_size);
- if (ret < 0) {
- return ret;
- }
-
- c->entries[i].dirty = false;
-
- return 0;
-}
-
-int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c)
-{
- BDRVQcowState *s = bs->opaque;
- int result = 0;
- int ret;
- int i;
-
- trace_qcow2_cache_flush(qemu_coroutine_self(), c == s->l2_table_cache);
-
- for (i = 0; i < c->size; i++) {
- ret = qcow2_cache_entry_flush(bs, c, i);
- if (ret < 0 && result != -ENOSPC) {
- result = ret;
- }
- }
-
- if (result == 0) {
- ret = bdrv_flush(bs->file);
- if (ret < 0) {
- result = ret;
- }
- }
-
- return result;
-}
-
-int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
- Qcow2Cache *dependency)
-{
- int ret;
-
- if (dependency->depends) {
- ret = qcow2_cache_flush_dependency(bs, dependency);
- if (ret < 0) {
- return ret;
- }
- }
-
- if (c->depends && (c->depends != dependency)) {
- ret = qcow2_cache_flush_dependency(bs, c);
- if (ret < 0) {
- return ret;
- }
- }
-
- c->depends = dependency;
- return 0;
-}
-
-void qcow2_cache_depends_on_flush(Qcow2Cache *c)
-{
- c->depends_on_flush = true;
-}
-
-static int qcow2_cache_find_entry_to_replace(Qcow2Cache *c)
-{
- int i;
- int min_count = INT_MAX;
- int min_index = -1;
-
-
- for (i = 0; i < c->size; i++) {
- if (c->entries[i].ref) {
- continue;
- }
-
- if (c->entries[i].cache_hits < min_count) {
- min_index = i;
- min_count = c->entries[i].cache_hits;
- }
-
- /* Give newer hits priority */
- /* TODO Check how to optimize the replacement strategy */
- c->entries[i].cache_hits /= 2;
- }
-
- if (min_index == -1) {
- /* This can't happen in current synchronous code, but leave the check
- * here as a reminder for whoever starts using AIO with the cache */
- abort();
- }
- return min_index;
-}
-
-static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
- uint64_t offset, void **table, bool read_from_disk)
-{
- BDRVQcowState *s = bs->opaque;
- int i;
- int ret;
-
- trace_qcow2_cache_get(qemu_coroutine_self(), c == s->l2_table_cache,
- offset, read_from_disk);
-
- /* Check if the table is already cached */
- for (i = 0; i < c->size; i++) {
- if (c->entries[i].offset == offset) {
- goto found;
- }
- }
-
- /* If not, write a table back and replace it */
- i = qcow2_cache_find_entry_to_replace(c);
- trace_qcow2_cache_get_replace_entry(qemu_coroutine_self(),
- c == s->l2_table_cache, i);
- if (i < 0) {
- return i;
- }
-
- ret = qcow2_cache_entry_flush(bs, c, i);
- if (ret < 0) {
- return ret;
- }
-
- trace_qcow2_cache_get_read(qemu_coroutine_self(),
- c == s->l2_table_cache, i);
- c->entries[i].offset = 0;
- if (read_from_disk) {
- if (c == s->l2_table_cache) {
- BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
- }
-
- ret = bdrv_pread(bs->file, offset, c->entries[i].table, s->cluster_size);
- if (ret < 0) {
- return ret;
- }
- }
-
- /* Give the table some hits for the start so that it won't be replaced
- * immediately. The number 32 is completely arbitrary. */
- c->entries[i].cache_hits = 32;
- c->entries[i].offset = offset;
-
- /* And return the right table */
-found:
- c->entries[i].cache_hits++;
- c->entries[i].ref++;
- *table = c->entries[i].table;
-
- trace_qcow2_cache_get_done(qemu_coroutine_self(),
- c == s->l2_table_cache, i);
-
- return 0;
-}
-
-int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
- void **table)
-{
- return qcow2_cache_do_get(bs, c, offset, table, true);
-}
-
-int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
- void **table)
-{
- return qcow2_cache_do_get(bs, c, offset, table, false);
-}
-
-int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table)
-{
- int i;
-
- for (i = 0; i < c->size; i++) {
- if (c->entries[i].table == *table) {
- goto found;
- }
- }
- return -ENOENT;
-
-found:
- c->entries[i].ref--;
- *table = NULL;
-
- assert(c->entries[i].ref >= 0);
- return 0;
-}
-
-void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table)
-{
- int i;
-
- for (i = 0; i < c->size; i++) {
- if (c->entries[i].table == table) {
- goto found;
- }
- }
- abort();
-
-found:
- c->entries[i].dirty = true;
-}
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index e179211..171131f 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -28,6 +28,7 @@
#include "block_int.h"
#include "block/qcow2.h"
#include "trace.h"
+#include "block-cache.h"
int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
{
@@ -69,7 +70,7 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
return new_l1_table_offset;
}
- ret = qcow2_cache_flush(bs, s->refcount_block_cache);
+ ret = block_cache_flush(bs, s->refcount_block_cache);
if (ret < 0) {
goto fail;
}
@@ -119,7 +120,8 @@ static int l2_load(BlockDriverState *bs, uint64_t l2_offset,
BDRVQcowState *s = bs->opaque;
int ret;
- ret = qcow2_cache_get(bs, s->l2_table_cache, l2_offset, (void**) l2_table);
+ ret = block_cache_get(bs, s->l2_table_cache, l2_offset,
+ (void **) l2_table);
return ret;
}
@@ -180,7 +182,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
return l2_offset;
}
- ret = qcow2_cache_flush(bs, s->refcount_block_cache);
+ ret = block_cache_flush(bs, s->refcount_block_cache);
if (ret < 0) {
goto fail;
}
@@ -188,7 +190,8 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
/* allocate a new entry in the l2 cache */
trace_qcow2_l2_allocate_get_empty(bs, l1_index);
- ret = qcow2_cache_get_empty(bs, s->l2_table_cache, l2_offset, (void**) table);
+ ret = block_cache_get_empty(bs, s->l2_table_cache, l2_offset,
+ (void **) table);
if (ret < 0) {
return ret;
}
@@ -203,16 +206,16 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
/* if there was an old l2 table, read it from the disk */
BLKDBG_EVENT(bs->file, BLKDBG_L2_ALLOC_COW_READ);
- ret = qcow2_cache_get(bs, s->l2_table_cache,
- old_l2_offset & L1E_OFFSET_MASK,
- (void**) &old_table);
+ ret = block_cache_get(bs, s->l2_table_cache,
+ old_l2_offset & L1E_OFFSET_MASK,
+ (void **) &old_table);
if (ret < 0) {
goto fail;
}
memcpy(l2_table, old_table, s->cluster_size);
- ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &old_table);
+ ret = block_cache_put(bs, s->l2_table_cache, (void **) &old_table);
if (ret < 0) {
goto fail;
}
@@ -222,8 +225,8 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
BLKDBG_EVENT(bs->file, BLKDBG_L2_ALLOC_WRITE);
trace_qcow2_l2_allocate_write_l2(bs, l1_index);
- qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
- ret = qcow2_cache_flush(bs, s->l2_table_cache);
+ block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+ ret = block_cache_flush(bs, s->l2_table_cache);
if (ret < 0) {
goto fail;
}
@@ -242,7 +245,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
fail:
trace_qcow2_l2_allocate_done(bs, l1_index, ret);
- qcow2_cache_put(bs, s->l2_table_cache, (void**) table);
+ block_cache_put(bs, s->l2_table_cache, (void **) table);
s->l1_table[l1_index] = old_l2_offset;
return ret;
}
@@ -475,7 +478,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
abort();
}
- qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+ block_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
nb_available = (c * s->cluster_sectors);
@@ -584,13 +587,13 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
* allocated. */
cluster_offset = be64_to_cpu(l2_table[l2_index]);
if (cluster_offset & L2E_OFFSET_MASK) {
- qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+ block_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
return 0;
}
cluster_offset = qcow2_alloc_bytes(bs, compressed_size);
if (cluster_offset < 0) {
- qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+ block_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
return 0;
}
@@ -605,9 +608,9 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
/* compressed clusters never have the copied flag */
BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
- qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+ block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
l2_table[l2_index] = cpu_to_be64(cluster_offset);
- ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+ ret = block_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
if (ret < 0) {
return 0;
}
@@ -659,18 +662,19 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
* handled.
*/
if (cow) {
- qcow2_cache_depends_on_flush(s->l2_table_cache);
+ block_cache_depends_on_flush(s->l2_table_cache);
}
+
if (qcow2_need_accurate_refcounts(s)) {
- qcow2_cache_set_dependency(bs, s->l2_table_cache,
+ block_cache_set_dependency(bs, s->l2_table_cache,
s->refcount_block_cache);
}
ret = get_cluster_table(bs, m->offset, &l2_table, &l2_index);
if (ret < 0) {
goto err;
}
- qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+ block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
for (i = 0; i < m->nb_clusters; i++) {
/* if two concurrent writes happen to the same unallocated cluster
@@ -687,7 +691,7 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
}
- ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+ ret = block_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
if (ret < 0) {
goto err;
}
@@ -913,7 +917,7 @@ again:
* request to complete. If we still had the reference, we could use up the
* whole cache with sleeping requests.
*/
- ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+ ret = block_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
if (ret < 0) {
return ret;
}
@@ -1077,14 +1081,14 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
}
/* First remove L2 entries */
- qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+ block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
l2_table[l2_index + i] = cpu_to_be64(0);
/* Then decrease the refcount */
qcow2_free_any_clusters(bs, old_offset, 1);
}
- ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+ ret = block_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
if (ret < 0) {
return ret;
}
@@ -1154,7 +1158,7 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
old_offset = be64_to_cpu(l2_table[l2_index + i]);
/* Update L2 entries */
- qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+ block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
if (old_offset & QCOW_OFLAG_COMPRESSED) {
l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
qcow2_free_any_clusters(bs, old_offset, 1);
@@ -1163,7 +1167,7 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
}
}
- ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+ ret = block_cache_put(bs, s->l2_table_cache, (void **) &l2_table);
if (ret < 0) {
return ret;
}
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 5e3f915..a57c74b 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -25,6 +25,7 @@
#include "qemu-common.h"
#include "block_int.h"
#include "block/qcow2.h"
+#include "block-cache.h"
static int64_t alloc_clusters_noref(BlockDriverState *bs, int64_t size);
static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
@@ -71,8 +72,8 @@ static int load_refcount_block(BlockDriverState *bs,
int ret;
BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_LOAD);
- ret = qcow2_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
- refcount_block);
+ ret = block_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
+ refcount_block);
return ret;
}
@@ -98,8 +99,8 @@ static int get_refcount(BlockDriverState *bs, int64_t cluster_index)
if (!refcount_block_offset)
return 0;
- ret = qcow2_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
- (void**) &refcount_block);
+ ret = block_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
+ (void **) &refcount_block);
if (ret < 0) {
return ret;
}
@@ -108,8 +109,8 @@ static int get_refcount(BlockDriverState *bs, int64_t cluster_index)
((1 << (s->cluster_bits - REFCOUNT_SHIFT)) - 1);
refcount = be16_to_cpu(refcount_block[block_index]);
- ret = qcow2_cache_put(bs, s->refcount_block_cache,
- (void**) &refcount_block);
+ ret = block_cache_put(bs, s->refcount_block_cache,
+ (void **) &refcount_block);
if (ret < 0) {
return ret;
}
@@ -201,7 +202,7 @@ static int alloc_refcount_block(BlockDriverState *bs,
*refcount_block = NULL;
/* We write to the refcount table, so we might depend on L2 tables */
- qcow2_cache_flush(bs, s->l2_table_cache);
+ block_cache_flush(bs, s->l2_table_cache);
/* Allocate the refcount block itself and mark it as used */
int64_t new_block = alloc_clusters_noref(bs, s->cluster_size);
@@ -217,8 +218,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
if (in_same_refcount_block(s, new_block, cluster_index << s->cluster_bits)) {
/* Zero the new refcount block before updating it */
- ret = qcow2_cache_get_empty(bs, s->refcount_block_cache, new_block,
- (void**) refcount_block);
+ ret = block_cache_get_empty(bs, s->refcount_block_cache, new_block,
+ (void **) refcount_block);
if (ret < 0) {
goto fail_block;
}
@@ -241,8 +242,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
/* Initialize the new refcount block only after updating its refcount,
* update_refcount uses the refcount cache itself */
- ret = qcow2_cache_get_empty(bs, s->refcount_block_cache, new_block,
- (void**) refcount_block);
+ ret = block_cache_get_empty(bs, s->refcount_block_cache, new_block,
+ (void **) refcount_block);
if (ret < 0) {
goto fail_block;
}
@@ -252,8 +253,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
/* Now the new refcount block needs to be written to disk */
BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_ALLOC_WRITE);
- qcow2_cache_entry_mark_dirty(s->refcount_block_cache, *refcount_block);
- ret = qcow2_cache_flush(bs, s->refcount_block_cache);
+ block_cache_entry_mark_dirty(s->refcount_block_cache, *refcount_block);
+ ret = block_cache_flush(bs, s->refcount_block_cache);
if (ret < 0) {
goto fail_block;
}
@@ -273,7 +274,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
return 0;
}
- ret = qcow2_cache_put(bs, s->refcount_block_cache, (void**) refcount_block);
+ ret = block_cache_put(bs, s->refcount_block_cache,
+ (void **) refcount_block);
if (ret < 0) {
goto fail_block;
}
@@ -406,7 +408,8 @@ fail_table:
g_free(new_table);
fail_block:
if (*refcount_block != NULL) {
- qcow2_cache_put(bs, s->refcount_block_cache, (void**) refcount_block);
+ block_cache_put(bs, s->refcount_block_cache,
+ (void **) refcount_block);
}
return ret;
}
@@ -432,8 +435,8 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
}
if (addend < 0) {
- qcow2_cache_set_dependency(bs, s->refcount_block_cache,
- s->l2_table_cache);
+ block_cache_set_dependency(bs, s->refcount_block_cache,
+ s->l2_table_cache);
}
start = offset & ~(s->cluster_size - 1);
@@ -449,8 +452,8 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
/* Load the refcount block and allocate it if needed */
if (table_index != old_table_index) {
if (refcount_block) {
- ret = qcow2_cache_put(bs, s->refcount_block_cache,
- (void**) &refcount_block);
+ ret = block_cache_put(bs, s->refcount_block_cache,
+ (void **) &refcount_block);
if (ret < 0) {
goto fail;
}
@@ -463,7 +466,7 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
}
old_table_index = table_index;
- qcow2_cache_entry_mark_dirty(s->refcount_block_cache, refcount_block);
+ block_cache_entry_mark_dirty(s->refcount_block_cache, refcount_block);
/* we can update the count and save it */
block_index = cluster_index &
@@ -486,8 +489,8 @@ fail:
/* Write last changed block to disk */
if (refcount_block) {
int wret;
- wret = qcow2_cache_put(bs, s->refcount_block_cache,
- (void**) &refcount_block);
+ wret = block_cache_put(bs, s->refcount_block_cache,
+ (void **) &refcount_block);
if (wret < 0) {
return ret < 0 ? ret : wret;
}
@@ -763,8 +766,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
old_l2_offset = l2_offset;
l2_offset &= L1E_OFFSET_MASK;
- ret = qcow2_cache_get(bs, s->l2_table_cache, l2_offset,
- (void**) &l2_table);
+ ret = block_cache_get(bs, s->l2_table_cache, l2_offset,
+ (void **) &l2_table);
if (ret < 0) {
goto fail;
}
@@ -811,16 +814,18 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
}
if (offset != old_offset) {
if (addend > 0) {
- qcow2_cache_set_dependency(bs, s->l2_table_cache,
+ block_cache_set_dependency(bs, s->l2_table_cache,
s->refcount_block_cache);
}
l2_table[j] = cpu_to_be64(offset);
- qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+ block_cache_entry_mark_dirty(s->l2_table_cache,
+ l2_table);
}
}
}
- ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+ ret = block_cache_put(bs, s->l2_table_cache,
+ (void **) &l2_table);
if (ret < 0) {
goto fail;
}
@@ -847,7 +852,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
ret = 0;
fail:
if (l2_table) {
- qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+ block_cache_put(bs, s->l2_table_cache,
+ (void **) &l2_table);
}
/* Update L1 only if it isn't deleted anyway (addend = -1) */
@@ -1130,8 +1136,9 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
0, s->cluster_size);
/* current L1 table */
- ret = check_refcounts_l1(bs, res, refcount_table, nb_clusters,
- s->l1_table_offset, s->l1_size, 1);
+ ret = check_refcounts_l1(bs, res, refcount_table,
+ nb_clusters, s->l1_table_offset,
+ s->l1_size, 1);
if (ret < 0) {
goto fail;
}
diff --git a/block/qcow2.c b/block/qcow2.c
index 025dce2..35950fb 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -430,8 +430,11 @@ static int qcow2_open(BlockDriverState *bs, int flags)
}
/* alloc L2 table/refcount block cache */
- s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE);
- s->refcount_block_cache = qcow2_cache_create(bs, REFCOUNT_CACHE_SIZE);
+ s->l2_table_cache = block_cache_create(bs, L2_CACHE_SIZE,
+ s->cluster_size, BLOCK_TABLE_L2);
+ s->refcount_block_cache = block_cache_create(bs, REFCOUNT_CACHE_SIZE,
+ s->cluster_size,
+ BLOCK_TABLE_REF);
s->cluster_cache = g_malloc(s->cluster_size);
/* one more sector for decompressed data alignment */
@@ -510,7 +513,7 @@ static int qcow2_open(BlockDriverState *bs, int flags)
qcow2_refcount_close(bs);
g_free(s->l1_table);
if (s->l2_table_cache) {
- qcow2_cache_destroy(bs, s->l2_table_cache);
+ block_cache_destroy(bs, s->l2_table_cache);
}
g_free(s->cluster_cache);
qemu_vfree(s->cluster_data);
@@ -878,13 +881,13 @@ static void qcow2_close(BlockDriverState *bs)
BDRVQcowState *s = bs->opaque;
g_free(s->l1_table);
- qcow2_cache_flush(bs, s->l2_table_cache);
- qcow2_cache_flush(bs, s->refcount_block_cache);
+ block_cache_flush(bs, s->l2_table_cache);
+ block_cache_flush(bs, s->refcount_block_cache);
qcow2_mark_clean(bs);
- qcow2_cache_destroy(bs, s->l2_table_cache);
- qcow2_cache_destroy(bs, s->refcount_block_cache);
+ block_cache_destroy(bs, s->l2_table_cache);
+ block_cache_destroy(bs, s->refcount_block_cache);
g_free(s->unknown_header_fields);
cleanup_unknown_header_ext(bs);
@@ -1553,14 +1556,14 @@ static coroutine_fn int qcow2_co_flush_to_os(BlockDriverState *bs)
int ret;
qemu_co_mutex_lock(&s->lock);
- ret = qcow2_cache_flush(bs, s->l2_table_cache);
+ ret = block_cache_flush(bs, s->l2_table_cache);
if (ret < 0) {
qemu_co_mutex_unlock(&s->lock);
return ret;
}
if (qcow2_need_accurate_refcounts(s)) {
- ret = qcow2_cache_flush(bs, s->refcount_block_cache);
+ ret = block_cache_flush(bs, s->refcount_block_cache);
if (ret < 0) {
qemu_co_mutex_unlock(&s->lock);
return ret;
diff --git a/block/qcow2.h b/block/qcow2.h
index b4eb654..cb6fd7a 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -27,6 +27,7 @@
#include "aes.h"
#include "qemu-coroutine.h"
+#include "block-cache.h"
//#define DEBUG_ALLOC
//#define DEBUG_ALLOC2
@@ -94,8 +95,6 @@ typedef struct QCowSnapshot {
uint64_t vm_clock_nsec;
} QCowSnapshot;
-struct Qcow2Cache;
-typedef struct Qcow2Cache Qcow2Cache;
typedef struct Qcow2UnknownHeaderExtension {
uint32_t magic;
@@ -146,8 +145,8 @@ typedef struct BDRVQcowState {
uint64_t l1_table_offset;
uint64_t *l1_table;
- Qcow2Cache* l2_table_cache;
- Qcow2Cache* refcount_block_cache;
+ BlockCache *l2_table_cache;
+ BlockCache *refcount_block_cache;
uint8_t *cluster_cache;
uint8_t *cluster_data;
@@ -316,21 +315,4 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs, const char *snapshot_name);
void qcow2_free_snapshots(BlockDriverState *bs);
int qcow2_read_snapshots(BlockDriverState *bs);
-
-/* qcow2-cache.c functions */
-Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables);
-int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c);
-
-void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table);
-int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c);
-int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
- Qcow2Cache *dependency);
-void qcow2_cache_depends_on_flush(Qcow2Cache *c);
-
-int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
- void **table);
-int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
- void **table);
-int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
-
#endif
diff --git a/trace-events b/trace-events
index 42b66f1..df1f12f 100644
--- a/trace-events
+++ b/trace-events
@@ -454,12 +454,13 @@ qcow2_l2_allocate_write_l2(void *bs, int l1_index) "bs %p l1_index %d"
qcow2_l2_allocate_write_l1(void *bs, int l1_index) "bs %p l1_index %d"
qcow2_l2_allocate_done(void *bs, int l1_index, int ret) "bs %p l1_index %d ret %d"
-qcow2_cache_get(void *co, int c, uint64_t offset, bool read_from_disk) "co %p is_l2_cache %d offset %" PRIx64 " read_from_disk %d"
-qcow2_cache_get_replace_entry(void *co, int c, int i) "co %p is_l2_cache %d index %d"
-qcow2_cache_get_read(void *co, int c, int i) "co %p is_l2_cache %d index %d"
-qcow2_cache_get_done(void *co, int c, int i) "co %p is_l2_cache %d index %d"
-qcow2_cache_flush(void *co, int c) "co %p is_l2_cache %d"
-qcow2_cache_entry_flush(void *co, int c, int i) "co %p is_l2_cache %d index %d"
+# block/block-cache.c
+block_cache_get(void *co, int c, uint64_t offset, bool read_from_disk) "co %p is_l2_cache %d offset %" PRIx64 " read_from_disk %d"
+block_cache_get_replace_entry(void *co, int c, int i) "co %p is_l2_cache %d index %d"
+block_cache_get_read(void *co, int c, int i) "co %p is_l2_cache %d index %d"
+block_cache_get_done(void *co, int c, int i) "co %p is_l2_cache %d index %d"
+block_cache_flush(void *co, int c) "co %p is_l2_cache %d"
+block_cache_entry_flush(void *co, int c, int i) "co %p is_l2_cache %d index %d"
# block/qed-l2-cache.c
qed_alloc_l2_cache_entry(void *l2_cache, void *entry) "l2_cache %p entry %p"
--
1.7.1
* [Qemu-devel] [PATCH V13 5/6] add-cow file format core code.
2012-10-18 9:51 [Qemu-devel] [PATCH V13 0/6] add-cow file format Dong Xu Wang
` (3 preceding siblings ...)
2012-10-18 9:51 ` [Qemu-devel] [PATCH V13 4/6] rename qcow2-cache.c to block-cache.c Dong Xu Wang
@ 2012-10-18 9:51 ` Dong Xu Wang
2012-10-22 9:29 ` Stefan Hajnoczi
2012-10-18 9:51 ` [Qemu-devel] [PATCH V13 6/6] qemu-iotests: add add-cow iotests support Dong Xu Wang
5 siblings, 1 reply; 12+ messages in thread
From: Dong Xu Wang @ 2012-10-18 9:51 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, Dong Xu Wang
Add the add-cow file format core code. It uses block-cache.c for its cache code.
snapshot_blkdev is not supported yet.
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
---
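Note for reviewers (not part of the commit message): the sketch below
illustrates the two calculations that add_cow_create() and is_allocated()
rely on: validating that cluster_size is a power of two within a bit range,
and testing one bit per cluster in the COW bitmap. It is a standalone
approximation, not the driver code. All sketch_* names and the value of
SKETCH_SECTORS_PER_CLUSTER are assumptions made for illustration; the real
constants live in block/add-cow.h, and the real lookup goes through
block_cache_get()/block_cache_put() rather than a flat in-memory array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical value for illustration only; the real SECTORS_PER_CLUSTER
 * is defined in block/add-cow.h. */
#define SKETCH_SECTORS_PER_CLUSTER 128

/* Return log2(cluster_size) if cluster_size is a power of two with
 * min_bits <= log2 <= max_bits, or -1 otherwise. */
static inline int sketch_cluster_bits(uint64_t cluster_size,
                                      int min_bits, int max_bits)
{
    int bits;

    for (bits = min_bits; bits <= max_bits; bits++) {
        if (cluster_size == (UINT64_C(1) << bits)) {
            return bits;
        }
    }
    return -1;
}

/* Test the allocation bit of the cluster that contains sector_num.
 * One bit per cluster, least significant bit first within each byte. */
static inline bool sketch_is_allocated(const uint8_t *bitmap,
                                       int64_t sector_num)
{
    int64_t cluster_num = sector_num / SKETCH_SECTORS_PER_CLUSTER;

    return (bitmap[cluster_num / 8] >> (cluster_num % 8)) & 1;
}

/* Set the allocation bit of the cluster that contains sector_num. */
static inline void sketch_set_allocated(uint8_t *bitmap, int64_t sector_num)
{
    int64_t cluster_num = sector_num / SKETCH_SECTORS_PER_CLUSTER;

    bitmap[cluster_num / 8] |= (uint8_t)(1u << (cluster_num % 8));
}
```

Unlike is_allocated() in this patch, the sketch has no I/O error path; the
driver has to decide separately how a failed block_cache_get() is reported.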
block/Makefile.objs | 1 +
block/add-cow.c | 693 +++++++++++++++++++++++++++++++++++++++++++++++++++
block/add-cow.h | 85 +++++++
block/block-cache.c | 4 +
block/block-cache.h | 1 +
block_int.h | 2 +
6 files changed, 786 insertions(+), 0 deletions(-)
create mode 100644 block/add-cow.c
create mode 100644 block/add-cow.h
diff --git a/block/Makefile.objs b/block/Makefile.objs
index f128b78..ed9222d 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -1,5 +1,6 @@
block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
+block-obj-y += add-cow.o
block-obj-y += block-cache.o
block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
block-obj-y += qed-check.o
diff --git a/block/add-cow.c b/block/add-cow.c
new file mode 100644
index 0000000..15c86ab
--- /dev/null
+++ b/block/add-cow.c
@@ -0,0 +1,693 @@
+/*
+ * QEMU ADD-COW Disk Format
+ *
+ * Copyright IBM, Corp. 2012
+ *
+ * Authors:
+ * Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include "block_int.h"
+#include "module.h"
+#include "add-cow.h"
+
+static void add_cow_header_le_to_cpu(const AddCowHeader *le, AddCowHeader *cpu)
+{
+ cpu->magic = le64_to_cpu(le->magic);
+ cpu->version = le32_to_cpu(le->version);
+
+ cpu->backing_filename_offset = le32_to_cpu(le->backing_filename_offset);
+ cpu->backing_filename_size = le32_to_cpu(le->backing_filename_size);
+
+ cpu->image_filename_offset = le32_to_cpu(le->image_filename_offset);
+ cpu->image_filename_size = le32_to_cpu(le->image_filename_size);
+
+ cpu->cluster_bits = le32_to_cpu(le->cluster_bits);
+ cpu->features = le64_to_cpu(le->features);
+ cpu->optional_features = le64_to_cpu(le->optional_features);
+ cpu->header_pages_size = le32_to_cpu(le->header_pages_size);
+
+ memcpy(cpu->backing_fmt, le->backing_fmt, sizeof(cpu->backing_fmt));
+ memcpy(cpu->image_fmt, le->image_fmt, sizeof(cpu->image_fmt));
+}
+
+static void add_cow_header_cpu_to_le(const AddCowHeader *cpu, AddCowHeader *le)
+{
+ le->magic = cpu_to_le64(cpu->magic);
+ le->version = cpu_to_le32(cpu->version);
+
+ le->backing_filename_offset = cpu_to_le32(cpu->backing_filename_offset);
+ le->backing_filename_size = cpu_to_le32(cpu->backing_filename_size);
+
+ le->image_filename_offset = cpu_to_le32(cpu->image_filename_offset);
+ le->image_filename_size = cpu_to_le32(cpu->image_filename_size);
+
+ le->cluster_bits = cpu_to_le32(cpu->cluster_bits);
+ le->features = cpu_to_le64(cpu->features);
+ le->optional_features = cpu_to_le64(cpu->optional_features);
+ le->header_pages_size = cpu_to_le32(cpu->header_pages_size);
+ memcpy(le->backing_fmt, cpu->backing_fmt, sizeof(cpu->backing_fmt));
+ memcpy(le->image_fmt, cpu->image_fmt, sizeof(cpu->image_fmt));
+}
+
+static int add_cow_probe(const uint8_t *buf, int buf_size, const char *filename)
+{
+ const AddCowHeader *header = (const AddCowHeader *)buf;
+
+ if (le64_to_cpu(header->magic) == ADD_COW_MAGIC &&
+ le32_to_cpu(header->version) == ADD_COW_VERSION) {
+ return 100;
+ } else {
+ return 0;
+ }
+}
+
+static int add_cow_create(const char *filename, QemuOpts *opts)
+{
+ AddCowHeader header = {
+ .magic = ADD_COW_MAGIC,
+ .version = ADD_COW_VERSION,
+ .features = 0,
+ .optional_features = 0,
+ .header_pages_size = ADD_COW_DEFAULT_PAGE_SIZE,
+ };
+ AddCowHeader le_header;
+ int64_t image_len = 0;
+ const char *backing_filename = NULL;
+ const char *backing_fmt = NULL;
+ const char *image_filename = NULL;
+ const char *image_format = NULL;
+ BlockDriverState *bs, *image_bs = NULL, *backing_bs = NULL;
+ BlockDriver *drv = bdrv_find_format("add-cow");
+ BDRVAddCowState s;
+ size_t cluster_size;
+ int ret;
+
+ image_len = qemu_opt_get_number(opts, BLOCK_OPT_SIZE, 0);
+ backing_filename = qemu_opt_get(opts, BLOCK_OPT_BACKING_FILE);
+ backing_fmt = qemu_opt_get(opts, BLOCK_OPT_BACKING_FMT);
+ image_filename = qemu_opt_get(opts, BLOCK_OPT_IMAGE_FILE);
+ image_format = qemu_opt_get(opts, BLOCK_OPT_IMAGE_FMT);
+ cluster_size = qemu_opt_get_size(opts, BLOCK_OPT_CLUSTER_SIZE,
+ ADD_COW_CLUSTER_SIZE);
+
+ header.cluster_bits = ffs(cluster_size) - 1;
+ if (header.cluster_bits < MIN_CLUSTER_BITS ||
+ header.cluster_bits > MAX_CLUSTER_BITS ||
+ (1 << header.cluster_bits) != cluster_size) {
+ error_report(
+ "Cluster size must be a power of two between %d and %dk",
+ 1 << MIN_CLUSTER_BITS, 1 << (MAX_CLUSTER_BITS - 10));
+ return -EINVAL;
+ }
+
+ if (backing_filename) {
+ header.backing_filename_offset = sizeof(header)
+ + sizeof(s.backing_file_format) + sizeof(s.image_file_format);
+ header.backing_filename_size = strlen(backing_filename);
+
+ if (!backing_fmt) {
+ backing_bs = bdrv_new("image");
+ ret = bdrv_open(backing_bs, backing_filename,
+ BDRV_O_RDWR | BDRV_O_CACHE_WB, NULL);
+ if (ret < 0) {
+ return ret;
+ }
+ backing_fmt = bdrv_get_format_name(backing_bs);
+ bdrv_delete(backing_bs);
+ }
+ } else {
+ header.features |= ADD_COW_F_ALL_ALLOCATED;
+ }
+
+ if (image_filename) {
+ header.image_filename_offset =
+ sizeof(header) + sizeof(s.backing_file_format)
+ + sizeof(s.image_file_format) + header.backing_filename_size;
+ header.image_filename_size = strlen(image_filename);
+ } else {
+ error_report("Error: image_file should be given.");
+ return -EINVAL;
+ }
+
+ if (backing_filename && !strcmp(backing_filename, image_filename)) {
+ error_report("Error: Trying to create an image with the "
+ "same backing file name as the image file name");
+ return -EINVAL;
+ }
+
+ if (!strcmp(filename, image_filename)) {
+ error_report("Error: Trying to create an image with the "
+ "same filename as the image file name");
+ return -EINVAL;
+ }
+
+ if (header.image_filename_offset + header.image_filename_size
+ > ADD_COW_PAGE_SIZE * ADD_COW_DEFAULT_PAGE_SIZE) {
+ error_report("image_file name or backing_file name too long.");
+ return -ENOSPC;
+ }
+
+ ret = bdrv_file_open(&image_bs, image_filename, BDRV_O_RDWR);
+ if (ret < 0) {
+ return ret;
+ }
+ bdrv_delete(image_bs);
+
+ ret = bdrv_create_file(filename, NULL);
+ if (ret < 0) {
+ return ret;
+ }
+
+ ret = bdrv_file_open(&bs, filename, BDRV_O_RDWR);
+ if (ret < 0) {
+ return ret;
+ }
+ snprintf(header.backing_fmt, sizeof(header.backing_fmt),
+ "%s", backing_fmt ? backing_fmt : "");
+ snprintf(header.image_fmt, sizeof(header.image_fmt),
+ "%s", image_format ? image_format : "raw");
+ add_cow_header_cpu_to_le(&header, &le_header);
+ ret = bdrv_pwrite(bs, 0, &le_header, sizeof(le_header));
+ if (ret < 0) {
+ bdrv_delete(bs);
+ return ret;
+ }
+
+ if (ret < 0) {
+ bdrv_delete(bs);
+ return ret;
+ }
+
+ if (backing_filename) {
+ ret = bdrv_pwrite(bs, header.backing_filename_offset,
+ backing_filename, header.backing_filename_size);
+ if (ret < 0) {
+ bdrv_delete(bs);
+ return ret;
+ }
+ }
+
+ ret = bdrv_pwrite(bs, header.image_filename_offset,
+ image_filename, header.image_filename_size);
+ if (ret < 0) {
+ bdrv_delete(bs);
+ return ret;
+ }
+
+ ret = bdrv_open(bs, filename, BDRV_O_RDWR | BDRV_O_NO_FLUSH, drv);
+ if (ret < 0) {
+ bdrv_delete(bs);
+ return ret;
+ }
+
+ ret = bdrv_truncate(bs, image_len);
+ bdrv_delete(bs);
+ return ret;
+}
+
+static int add_cow_open(BlockDriverState *bs, int flags)
+{
+ char image_filename[ADD_COW_FILE_LEN];
+ char tmp_name[ADD_COW_FILE_LEN];
+ int ret;
+ int sector_per_byte;
+ BDRVAddCowState *s = bs->opaque;
+ AddCowHeader le_header;
+
+ ret = bdrv_pread(bs->file, 0, &le_header, sizeof(le_header));
+ if (ret < 0) {
+ goto fail;
+ }
+
+ add_cow_header_le_to_cpu(&le_header, &s->header);
+
+ if (s->header.magic != ADD_COW_MAGIC) {
+ ret = -EINVAL;
+ goto fail;
+ }
+
+ if (s->header.version != ADD_COW_VERSION) {
+ char version[64];
+ snprintf(version, sizeof(version), "ADD-COW version %d",
+ s->header.version);
+ qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
+ bs->device_name, "add-cow", version);
+ ret = -ENOTSUP;
+ goto fail;
+ }
+
+ if (s->header.features & ~ADD_COW_FEATURE_MASK) {
+ char buf[64];
+ snprintf(buf, sizeof(buf), "Feature Flags: %" PRIx64,
+ s->header.features & ~ADD_COW_FEATURE_MASK);
+ qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
+ bs->device_name, "add-cow", buf);
+ return -ENOTSUP;
+ }
+
+ if ((s->header.features & ADD_COW_F_ALL_ALLOCATED) == 0) {
+ ret = bdrv_read_string(bs->file, sizeof(s->header),
+ sizeof(bs->backing_format) - 1,
+ bs->backing_format,
+ sizeof(bs->backing_format));
+ if (ret < 0) {
+ goto fail;
+ }
+ }
+
+ if (s->header.cluster_bits < MIN_CLUSTER_BITS ||
+ s->header.cluster_bits > MAX_CLUSTER_BITS) {
+ ret = -EINVAL;
+ goto fail;
+ }
+
+ if ((s->header.features & ADD_COW_F_ALL_ALLOCATED) == 0) {
+ ret = bdrv_read_string(bs->file, s->header.backing_filename_offset,
+ s->header.backing_filename_size,
+ bs->backing_file,
+ sizeof(bs->backing_file));
+ if (ret < 0) {
+ goto fail;
+ }
+ }
+
+ ret = bdrv_read_string(bs->file, s->header.image_filename_offset,
+ s->header.image_filename_size, tmp_name,
+ sizeof(tmp_name));
+ if (ret < 0) {
+ goto fail;
+ }
+
+ s->image_hd = bdrv_new("");
+ if (path_has_protocol(tmp_name)) {
+ pstrcpy(image_filename, sizeof(image_filename), tmp_name);
+ } else {
+ path_combine(image_filename, sizeof(image_filename),
+ bs->filename, tmp_name);
+ }
+
+ ret = bdrv_open(s->image_hd, image_filename, flags, NULL);
+ if (ret < 0) {
+ bdrv_delete(s->image_hd);
+ goto fail;
+ }
+
+ bs->total_sectors = bdrv_getlength(s->image_hd) >> 9;
+ s->cluster_size = 1 << s->header.cluster_bits;
+ sector_per_byte = SECTORS_PER_CLUSTER * 8;
+ s->bitmap_size =
+ (bs->total_sectors + sector_per_byte - 1) / sector_per_byte;
+ s->bitmap_cache =
+ block_cache_create(bs, ADD_COW_CACHE_SIZE, ADD_COW_CACHE_ENTRY_SIZE,
+ BLOCK_TABLE_BITMAP);
+
+ qemu_co_mutex_init(&s->lock);
+ return 0;
+fail:
+ if (s->bitmap_cache) {
+ block_cache_destroy(bs, s->bitmap_cache);
+ }
+ return ret;
+}
+
+static void add_cow_close(BlockDriverState *bs)
+{
+ BDRVAddCowState *s = bs->opaque;
+ block_cache_destroy(bs, s->bitmap_cache);
+ bdrv_delete(s->image_hd);
+}
+
+static bool is_allocated(BlockDriverState *bs, int64_t sector_num)
+{
+ BDRVAddCowState *s = bs->opaque;
+ BlockCache *c = s->bitmap_cache;
+ int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
+ uint8_t *table = NULL;
+ bool val = false;
+ int ret;
+
+ uint64_t offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
+ + (offset_in_bitmap(sector_num) & (~(c->cluster_size - 1)));
+ ret = block_cache_get(bs, s->bitmap_cache, offset, (void **)&table);
+ if (ret < 0) {
+ return ret;
+ }
+
+ val = table[cluster_num / 8 % ADD_COW_CACHE_ENTRY_SIZE]
+ & (1 << (cluster_num % 8));
+ ret = block_cache_put(bs, s->bitmap_cache, (void **)&table);
+ if (ret < 0) {
+ return ret;
+ }
+ return val;
+}
+
+static coroutine_fn int add_cow_is_allocated(BlockDriverState *bs,
+ int64_t sector_num, int nb_sectors, int *num_same)
+{
+ BDRVAddCowState *s = bs->opaque;
+ int changed;
+
+ if (nb_sectors == 0) {
+ *num_same = 0;
+ return 0;
+ }
+
+ if (s->header.features & ADD_COW_F_ALL_ALLOCATED) {
+ *num_same = nb_sectors;
+ return 1;
+ }
+ changed = is_allocated(bs, sector_num);
+
+ for (*num_same = 1; *num_same < nb_sectors; (*num_same)++) {
+ if (is_allocated(bs, sector_num + *num_same) != changed) {
+ break;
+ }
+ }
+ return changed;
+}
+
+static int add_cow_backing_read(BlockDriverState *bs, QEMUIOVector *qiov,
+ int64_t sector_num, int nb_sectors)
+{
+ int n1;
+ if ((sector_num + nb_sectors) <= bs->total_sectors) {
+ return nb_sectors;
+ }
+ if (sector_num >= bs->total_sectors) {
+ n1 = 0;
+ } else {
+ n1 = bs->total_sectors - sector_num;
+ }
+
+ qemu_iovec_memset(qiov, BDRV_SECTOR_SIZE * n1,
+ 0, BDRV_SECTOR_SIZE * (nb_sectors - n1));
+
+ return n1;
+}
+
+static coroutine_fn int add_cow_co_readv(BlockDriverState *bs,
+ int64_t sector_num,
+ int remaining_sectors,
+ QEMUIOVector *qiov)
+{
+ BDRVAddCowState *s = bs->opaque;
+ int cur_nr_sectors;
+ uint64_t bytes_done = 0;
+ QEMUIOVector hd_qiov;
+ int n1, ret = 0;
+
+ qemu_iovec_init(&hd_qiov, qiov->niov);
+ qemu_co_mutex_lock(&s->lock);
+ while (remaining_sectors != 0) {
+ cur_nr_sectors = remaining_sectors;
+ if (add_cow_is_allocated(bs, sector_num, cur_nr_sectors,
+ &cur_nr_sectors)) {
+ qemu_iovec_reset(&hd_qiov);
+ qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
+ cur_nr_sectors * BDRV_SECTOR_SIZE);
+ qemu_co_mutex_unlock(&s->lock);
+ ret = bdrv_co_readv(s->image_hd, sector_num,
+ cur_nr_sectors, &hd_qiov);
+ qemu_co_mutex_lock(&s->lock);
+ if (ret < 0) {
+ goto fail;
+ }
+ } else {
+ if (bs->backing_hd) {
+ qemu_iovec_reset(&hd_qiov);
+ qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
+ cur_nr_sectors * BDRV_SECTOR_SIZE);
+ n1 = add_cow_backing_read(bs->backing_hd, &hd_qiov,
+ sector_num, cur_nr_sectors);
+ if (n1 > 0) {
+ qemu_co_mutex_unlock(&s->lock);
+ ret = bdrv_co_readv(bs->backing_hd, sector_num,
+ cur_nr_sectors, &hd_qiov);
+ qemu_co_mutex_lock(&s->lock);
+ if (ret < 0) {
+ goto fail;
+ }
+ }
+ } else {
+ qemu_iovec_memset(&hd_qiov, 0, 0,
+ BDRV_SECTOR_SIZE * cur_nr_sectors);
+ }
+ }
+ remaining_sectors -= cur_nr_sectors;
+ sector_num += cur_nr_sectors;
+ bytes_done += cur_nr_sectors * BDRV_SECTOR_SIZE;
+ }
+fail:
+ qemu_co_mutex_unlock(&s->lock);
+ qemu_iovec_destroy(&hd_qiov);
+ return ret;
+}
+
+static int coroutine_fn copy_sectors(BlockDriverState *bs,
+ int n_start, int n_end)
+{
+ BDRVAddCowState *s = bs->opaque;
+ QEMUIOVector qiov;
+ struct iovec iov;
+ int n, ret;
+
+ n = n_end - n_start;
+ if (n <= 0) {
+ return 0;
+ }
+
+ iov.iov_len = n * BDRV_SECTOR_SIZE;
+ iov.iov_base = qemu_blockalign(bs, iov.iov_len);
+
+ qemu_iovec_init_external(&qiov, &iov, 1);
+
+ ret = bdrv_co_readv(bs->backing_hd, n_start, n, &qiov);
+ if (ret < 0) {
+ goto out;
+ }
+ ret = bdrv_co_writev(s->image_hd, n_start, n, &qiov);
+ if (ret < 0) {
+ goto out;
+ }
+
+ ret = 0;
+out:
+ qemu_vfree(iov.iov_base);
+ return ret;
+}
+
+static coroutine_fn int add_cow_co_writev(BlockDriverState *bs,
+ int64_t sector_num,
+ int remaining_sectors,
+ QEMUIOVector *qiov)
+{
+ BDRVAddCowState *s = bs->opaque;
+ BlockCache *c = s->bitmap_cache;
+ int ret = 0, i;
+ QEMUIOVector hd_qiov;
+ uint8_t *table;
+ uint64_t offset;
+ int mask = SECTORS_PER_CLUSTER - 1;
+ int table_mask = c->cluster_size - 1;
+
+ qemu_co_mutex_lock(&s->lock);
+ qemu_iovec_init(&hd_qiov, qiov->niov);
+ ret = bdrv_co_writev(s->image_hd, sector_num,
+ remaining_sectors, qiov);
+
+ if (ret < 0) {
+ goto fail;
+ }
+ if ((s->header.features & ADD_COW_F_ALL_ALLOCATED) == 0) {
+ /* Copy content of unmodified sectors */
+ if (!is_cluster_head(sector_num) && !is_allocated(bs, sector_num)) {
+ ret = copy_sectors(bs, sector_num & ~mask, sector_num);
+ if (ret < 0) {
+ goto fail;
+ }
+ }
+
+ if (!is_cluster_tail(sector_num + remaining_sectors - 1)
+ && !is_allocated(bs, sector_num + remaining_sectors - 1)) {
+ ret = copy_sectors(bs, sector_num + remaining_sectors,
+ ((sector_num + remaining_sectors) | mask) + 1);
+ if (ret < 0) {
+ goto fail;
+ }
+ }
+
+ for (i = sector_num / SECTORS_PER_CLUSTER;
+ i <= (sector_num + remaining_sectors - 1) / SECTORS_PER_CLUSTER;
+ i++) {
+ offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
+ + (offset_in_bitmap(i * SECTORS_PER_CLUSTER) & (~table_mask));
+ ret = block_cache_get(bs, s->bitmap_cache, offset, (void **)&table);
+ if (ret < 0) {
+ goto fail;
+ }
+ if ((table[i / 8 % ADD_COW_CACHE_ENTRY_SIZE] & (1 << (i % 8))) == 0) {
+ table[i / 8 % ADD_COW_CACHE_ENTRY_SIZE] |= (1 << (i % 8));
+ block_cache_entry_mark_dirty(s->bitmap_cache, table);
+ }
+
+ ret = block_cache_put(bs, s->bitmap_cache, (void **) &table);
+ if (ret < 0) {
+ goto fail;
+ }
+ }
+ }
+ ret = 0;
+fail:
+ qemu_co_mutex_unlock(&s->lock);
+ qemu_iovec_destroy(&hd_qiov);
+ return ret;
+}
+
+static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size)
+{
+ BDRVAddCowState *s = bs->opaque;
+ int sector_per_byte = SECTORS_PER_CLUSTER * 8;
+ int ret;
+ uint32_t bitmap_pos = s->header.header_pages_size * ADD_COW_PAGE_SIZE;
+ int64_t bitmap_size =
+ (size / BDRV_SECTOR_SIZE + sector_per_byte - 1) / sector_per_byte;
+ bitmap_size = (bitmap_size + ADD_COW_CACHE_ENTRY_SIZE - 1)
+ & (~(ADD_COW_CACHE_ENTRY_SIZE - 1));
+
+ ret = bdrv_truncate(bs->file, bitmap_pos + bitmap_size);
+ if (ret < 0) {
+ return ret;
+ }
+
+ ret = bdrv_truncate(s->image_hd, size);
+ if (ret < 0) {
+ return ret;
+ }
+ return 0;
+}
+
+static int add_cow_reopen_prepare(BDRVReopenState *state,
+ BlockReopenQueue *queue, Error **errp)
+{
+ BDRVAddCowState *s;
+ int ret = -1;
+
+ assert(state != NULL);
+ assert(state->bs != NULL);
+
+ if (queue == NULL) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR,
+ "No reopen queue for add-cow");
+ goto exit;
+ }
+
+ s = state->bs->opaque;
+
+ assert(s != NULL);
+
+
+ bdrv_reopen_queue(queue, s->image_hd, state->flags);
+ ret = 0;
+
+exit:
+ return ret;
+}
+
+
+static coroutine_fn int add_cow_co_flush(BlockDriverState *bs)
+{
+ BDRVAddCowState *s = bs->opaque;
+ int ret;
+
+ qemu_co_mutex_lock(&s->lock);
+ ret = block_cache_flush(bs, s->bitmap_cache);
+ if (ret == 0) {
+ ret = bdrv_flush(s->image_hd);
+ }
+ /* Always drop the lock, including when the cache flush failed. */
+ qemu_co_mutex_unlock(&s->lock);
+ return ret;
+}
+
+static int add_cow_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
+{
+ BDRVAddCowState *s = bs->opaque;
+ bdi->cluster_size = s->cluster_size;
+ return 0;
+}
+
+static QemuOptsList add_cow_create_opts = {
+ .name = "add-cow-create-opts",
+ .head = QTAILQ_HEAD_INITIALIZER(add_cow_create_opts.head),
+ .desc = {
+ {
+ .name = BLOCK_OPT_SIZE,
+ .type = QEMU_OPT_NUMBER,
+ .help = "Virtual disk size"
+ },
+ {
+ .name = BLOCK_OPT_BACKING_FILE,
+ .type = QEMU_OPT_STRING,
+ .help = "File name of a base image"
+ },
+ {
+ .name = BLOCK_OPT_BACKING_FMT,
+ .type = QEMU_OPT_STRING,
+ .help = "Image format of the base image"
+ },
+ {
+ .name = BLOCK_OPT_IMAGE_FILE,
+ .type = QEMU_OPT_STRING,
+ .help = "File name of an image file"
+ },
+ {
+ .name = BLOCK_OPT_IMAGE_FMT,
+ .type = QEMU_OPT_STRING,
+ .help = "Image format of the image file"
+ },
+ {
+ .name = BLOCK_OPT_CLUSTER_SIZE,
+ .type = QEMU_OPT_SIZE,
+ .help = "add-cow cluster size",
+ .def_value = ADD_COW_CLUSTER_SIZE
+ },
+ { /* end of list */ }
+ }
+};
+
+static QemuOptsList *add_cow_create_options(void)
+{
+ return &add_cow_create_opts;
+}
+
+static BlockDriver bdrv_add_cow = {
+ .format_name = "add-cow",
+ .instance_size = sizeof(BDRVAddCowState),
+ .bdrv_probe = add_cow_probe,
+ .bdrv_open = add_cow_open,
+ .bdrv_close = add_cow_close,
+ .bdrv_create = add_cow_create,
+ .bdrv_co_readv = add_cow_co_readv,
+ .bdrv_co_writev = add_cow_co_writev,
+ .bdrv_truncate = bdrv_add_cow_truncate,
+ .bdrv_co_is_allocated = add_cow_is_allocated,
+ .bdrv_reopen_prepare = add_cow_reopen_prepare,
+ .bdrv_get_info = add_cow_get_info,
+
+ .bdrv_create_options = add_cow_create_options,
+ .bdrv_co_flush_to_os = add_cow_co_flush,
+};
+
+static void bdrv_add_cow_init(void)
+{
+ bdrv_register(&bdrv_add_cow);
+}
+
+block_init(bdrv_add_cow_init);
diff --git a/block/add-cow.h b/block/add-cow.h
new file mode 100644
index 0000000..ba9a61e
--- /dev/null
+++ b/block/add-cow.h
@@ -0,0 +1,85 @@
+/*
+ * QEMU ADD-COW Disk Format
+ *
+ * Copyright IBM, Corp. 2012
+ *
+ * Authors:
+ * Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#ifndef BLOCK_ADD_COW_H
+#define BLOCK_ADD_COW_H
+#include "block-cache.h"
+
+enum {
+ ADD_COW_F_ALL_ALLOCATED = 0x01,
+ ADD_COW_FEATURE_MASK = ADD_COW_F_ALL_ALLOCATED,
+
+ ADD_COW_MAGIC = (((uint64_t)'A' << 56) | ((uint64_t)'D' << 48) | \
+ ((uint64_t)'D' << 40) | ((uint64_t)'_' << 32) | \
+ ((uint64_t)'C' << 24) | ((uint64_t)'O' << 16) | \
+ ((uint64_t)'W' << 8) | 0xFF),
+ ADD_COW_VERSION = 1,
+ ADD_COW_FILE_LEN = 1024,
+ ADD_COW_CACHE_SIZE = 16,
+ ADD_COW_CACHE_ENTRY_SIZE = 65536,
+ ADD_COW_CLUSTER_SIZE = 65536,
+ SECTORS_PER_CLUSTER = (ADD_COW_CLUSTER_SIZE / BDRV_SECTOR_SIZE),
+ ADD_COW_PAGE_SIZE = 4096,
+ ADD_COW_DEFAULT_PAGE_SIZE = 1,
+ MIN_CLUSTER_BITS = 9,
+ MAX_CLUSTER_BITS = 21,
+};
+
+typedef struct AddCowHeader {
+ uint64_t magic;
+ uint32_t version;
+
+ uint32_t backing_filename_offset;
+ uint32_t backing_filename_size;
+
+ uint32_t image_filename_offset;
+ uint32_t image_filename_size;
+
+ uint32_t cluster_bits;
+
+ uint64_t features;
+ uint64_t optional_features;
+ uint32_t header_pages_size;
+
+ char backing_fmt[16];
+ char image_fmt[16];
+} QEMU_PACKED AddCowHeader;
+
+typedef struct BDRVAddCowState {
+ BlockDriverState *image_hd;
+ CoMutex lock;
+ int cluster_size;
+ BlockCache *bitmap_cache;
+ uint64_t bitmap_size;
+ AddCowHeader header;
+ char backing_file_format[16];
+ char image_file_format[16];
+} BDRVAddCowState;
+
+/* Convert sector_num to offset in bitmap */
+static inline int64_t offset_in_bitmap(int64_t sector_num)
+{
+ int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
+ return cluster_num / 8;
+}
+
+static inline bool is_cluster_head(int64_t sector_num)
+{
+ return sector_num % SECTORS_PER_CLUSTER == 0;
+}
+
+static inline bool is_cluster_tail(int64_t sector_num)
+{
+ return (sector_num + 1) % SECTORS_PER_CLUSTER == 0;
+}
+#endif
diff --git a/block/block-cache.c b/block/block-cache.c
index bf5c57c..1a30462 100644
--- a/block/block-cache.c
+++ b/block/block-cache.c
@@ -112,6 +112,8 @@ static int block_cache_entry_flush(BlockDriverState *bs, BlockCache *c, int i)
BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_UPDATE_PART);
} else if (c->table_type == BLOCK_TABLE_L2) {
BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
+ } else if (c->table_type == BLOCK_TABLE_BITMAP) {
+ BLKDBG_EVENT(bs->file, BLKDBG_COW_WRITE);
}
ret = bdrv_pwrite(bs->file, c->entries[i].offset,
@@ -245,6 +247,8 @@ static int block_cache_do_get(BlockDriverState *bs, BlockCache *c,
if (read_from_disk) {
if (c->table_type == BLOCK_TABLE_L2) {
BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
+ } else if (c->table_type == BLOCK_TABLE_BITMAP) {
+ BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
}
ret = bdrv_pread(bs->file, offset, c->entries[i].table,
diff --git a/block/block-cache.h b/block/block-cache.h
index 4efa06e..a3c4a1c 100644
--- a/block/block-cache.h
+++ b/block/block-cache.h
@@ -37,6 +37,7 @@
typedef enum {
BLOCK_TABLE_REF,
BLOCK_TABLE_L2,
+ BLOCK_TABLE_BITMAP,
} BlockTableType;
typedef struct BlockCachedTable {
diff --git a/block_int.h b/block_int.h
index a104e70..8a79045 100644
--- a/block_int.h
+++ b/block_int.h
@@ -55,6 +55,8 @@
#define BLOCK_OPT_SUBFMT "subformat"
#define BLOCK_OPT_COMPAT_LEVEL "compat"
#define BLOCK_OPT_LAZY_REFCOUNTS "lazy_refcounts"
+#define BLOCK_OPT_IMAGE_FILE "image_file"
+#define BLOCK_OPT_IMAGE_FMT "image_format"
typedef struct BdrvTrackedRequest BdrvTrackedRequest;
--
1.7.1
* [Qemu-devel] [PATCH V13 6/6] qemu-iotests: add add-cow iotests support.
2012-10-18 9:51 [Qemu-devel] [PATCH V13 0/6] add-cow file format Dong Xu Wang
` (4 preceding siblings ...)
2012-10-18 9:51 ` [Qemu-devel] [PATCH V13 5/6] add-cow file format core code Dong Xu Wang
@ 2012-10-18 9:51 ` Dong Xu Wang
5 siblings, 0 replies; 12+ messages in thread
From: Dong Xu Wang @ 2012-10-18 9:51 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, Dong Xu Wang
This patch adds qemu-iotests support for testing the add-cow file format.
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
---
tests/qemu-iotests/017 | 2 +-
tests/qemu-iotests/020 | 2 +-
tests/qemu-iotests/common | 6 ++++++
tests/qemu-iotests/common.rc | 15 ++++++++++++++-
4 files changed, 22 insertions(+), 3 deletions(-)
diff --git a/tests/qemu-iotests/017 b/tests/qemu-iotests/017
index 66951eb..d31432f 100755
--- a/tests/qemu-iotests/017
+++ b/tests/qemu-iotests/017
@@ -40,7 +40,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
. ./common.pattern
# Any format supporting backing files
-_supported_fmt qcow qcow2 vmdk qed
+_supported_fmt qcow qcow2 vmdk qed add-cow
_supported_proto generic
_supported_os Linux
diff --git a/tests/qemu-iotests/020 b/tests/qemu-iotests/020
index 2fb0ff8..3dbb495 100755
--- a/tests/qemu-iotests/020
+++ b/tests/qemu-iotests/020
@@ -42,7 +42,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
. ./common.pattern
# Any format supporting backing files
-_supported_fmt qcow qcow2 vmdk qed
+_supported_fmt qcow qcow2 vmdk qed add-cow
_supported_proto generic
_supported_os Linux
diff --git a/tests/qemu-iotests/common b/tests/qemu-iotests/common
index 1f6fdf5..4c06895 100644
--- a/tests/qemu-iotests/common
+++ b/tests/qemu-iotests/common
@@ -128,6 +128,7 @@ common options
check options
-raw test raw (default)
-cow test cow
+ -add-cow test add-cow
-qcow test qcow
-qcow2 test qcow2
-qed test qed
@@ -163,6 +164,11 @@ testlist options
xpand=false
;;
+ -add-cow)
+ IMGFMT=add-cow
+ xpand=false
+ ;;
+
-qcow)
IMGFMT=qcow
xpand=false
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index d534e94..f48b02a 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -97,6 +97,16 @@ _make_test_img()
fi
if [ \( "$IMGFMT" = "qcow2" -o "$IMGFMT" = "qed" \) -a -n "$CLUSTER_SIZE" ]; then
optstr=$(_optstr_add "$optstr" "cluster_size=$CLUSTER_SIZE")
+ elif [ "$IMGFMT" = "add-cow" ]; then
+ local IMG="$TEST_IMG"".raw"
+ if [ "$1" = "-b" ]; then
+ IMG="$IMG"".b"
+ $QEMU_IMG create -f raw $IMG $image_size>/dev/null
+ extra_img_options="-o image_file=$IMG $extra_img_options"
+ else
+ $QEMU_IMG create -f raw $IMG $image_size>/dev/null
+ extra_img_options="-o image_file=$IMG"
+ fi
fi
if [ -n "$optstr" ]; then
@@ -114,7 +124,8 @@ _make_test_img()
-e "s# compat='[^']*'##g" \
-e "s# compat6=\\(on\\|off\\)##g" \
-e "s# static=\\(on\\|off\\)##g" \
- -e "s# lazy_refcounts=\\(on\\|off\\)##g"
+ -e "s# lazy_refcounts=\\(on\\|off\\)##g" \
+ -e "s# image_file='[^']*'##g"
}
_cleanup_test_img()
@@ -125,6 +136,8 @@ _cleanup_test_img()
rm -f $TEST_DIR/t.$IMGFMT
rm -f $TEST_DIR/t.$IMGFMT.orig
rm -f $TEST_DIR/t.$IMGFMT.base
+ rm -f $TEST_DIR/t.$IMGFMT.raw
+ rm -f $TEST_DIR/t.$IMGFMT.raw.b
;;
rbd)
--
1.7.1
* Re: [Qemu-devel] [PATCH V13 1/6] docs: document for add-cow file format
2012-10-18 9:51 ` [Qemu-devel] [PATCH V13 1/6] docs: document for " Dong Xu Wang
@ 2012-10-18 16:10 ` Eric Blake
2012-10-19 2:14 ` Dong Xu Wang
0 siblings, 1 reply; 12+ messages in thread
From: Eric Blake @ 2012-10-18 16:10 UTC (permalink / raw)
To: Dong Xu Wang; +Cc: kwolf, qemu-devel
On 10/18/2012 03:51 AM, Dong Xu Wang wrote:
> Document for add-cow format, the usage and spec of add-cow are introduced.
>
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> ---
> docs/specs/add-cow.txt | 139 ++++++++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 139 insertions(+), 0 deletions(-)
> create mode 100644 docs/specs/add-cow.txt
>
> diff --git a/docs/specs/add-cow.txt b/docs/specs/add-cow.txt
> new file mode 100644
> index 0000000..dc1e107
> --- /dev/null
> +++ b/docs/specs/add-cow.txt
> @@ -0,0 +1,139 @@
> +== General ==
> +
> +The raw file format does not support backing files or copy on write feature.
> +The add-cow image format makes it possible to use backing files with raw
s/with raw/with a raw/
> +image by keeping a separate .add-cow metadata file. Once all sectors
> +have been written into the raw image it is safe to discard the .add-cow
> +and backing files, then we can use the raw image directly.
> +
> +An example usage of add-cow would look like::
> +(ubuntu.img is a disk image which has been installed OS.)
s/has been installed/has an installed/
> + 1) Create a raw image with the same size of ubuntu.img
> + qemu-img create -f raw test.raw 8G
> + 2) Create an add-cow image which will store dirty bitmap
> + qemu-img create -f add-cow test.add-cow \
> + -o backing_file=ubuntu.img,image_file=test.raw
> + 3) Run qemu with add-cow image
> + qemu -drive if=virtio,file=test.add-cow
> +
> +test.raw may be larger than ubuntu.img, in that case, the size of test.add-cow
> +will be calculated from the size of test.raw.
> +
> +=Specification=
> +
> +The file format looks like this:
> +
> + +---------------+-------------+-----------------+
> + | Header | Reserved | COW bitmap |
> + +---------------+-------------+-----------------+
Since you call out what all 4096 bytes must be in the Header section
(all bytes not occupied by a backing file name or format must be NUL),
do you really need Reserved in this section, or can you just claim that
the 4096-byte header is directly followed by the COW bitmap?
> +
> +All numbers in add-cow are stored in Little Endian byte order.
> +
> +== Header ==
> +
> +The Header is included in the first bytes:
> +(#define HEADER_SIZE (4096 * header_size))
> + Byte 0 - 7: magic
> + add-cow magic string ("ADD_COW\xff").
> +
> + 8 - 11: version
> + Version number (only valid value is 1 now).
> +
> + 12 - 15: backing file name offset
> + Offset in the add-cow file at which the backing file
> + name is stored (NB: The string is not nul-terminated).
> + If backing file name does NOT exist, this field will be
> + 0. Must be between 80 and [HEADER_SIZE - 2](a file name
> + must be at least 1 byte).
> +
> + 16 - 19: backing file name size
> + Length of the backing file name in bytes. It will be 0
> + if the backing file name offset is 0. If backing file
> + name offset is non-zero, then it must be non-zero. Must
> + be less than [HEADER_SIZE - 80] to fit in the reserved
> + part of the header.
More specifically, it must be small enough so that offset+size <=
HEADER_SIZE.
> +
> + 20 - 23: image file name offset
> + Offset in the add-cow file at which the image file name
> + is stored (NB: The string is not null terminated). It
> + must be between 80 and [HEADER_SIZE - 2].
> +
> + 24 - 27: image file name size
> + Length of the image file name in bytes.
> + Must be less than [HEADER_SIZE - 80] to fit in the reserved
> + part of the header.
More specifically, it must be small enough so that offset+size <=
HEADER_SIZE.
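As a minimal sketch of that bound (not code from the patch): the helper
name and the SKETCH_* constants below are hypothetical, and the value 84
assumes the fixed header fields end at byte 84, as the field list later
in the spec suggests. The sum is done in 64 bits so that a hostile
offset/size pair cannot wrap around.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants for this sketch only. */
#define SKETCH_PAGE_SIZE        4096u
#define SKETCH_FIXED_FIELDS_END 84u   /* assumed end of the fixed fields */

/*
 * Returns true when a name of `size` bytes stored at `offset` lies entirely
 * inside the variable part of a header that spans `header_pages` pages.
 */
static bool name_fits_in_header(uint32_t offset, uint32_t size,
                                uint32_t header_pages)
{
    uint64_t header_size = (uint64_t)SKETCH_PAGE_SIZE * header_pages;

    if (size == 0 || offset < SKETCH_FIXED_FIELDS_END) {
        return false;
    }
    /* 64-bit arithmetic: offset + size cannot overflow here. */
    return (uint64_t)offset + size <= header_size;
}
```

The same check would apply to both the backing file name and the image
file name.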
> +
> + 28 - 31: cluster bits
> + Number of bits that are used for addressing an offset
> + within a cluster (1 << cluster_bits is the cluster size).
> + Must not be less than 9 (i.e. 512 byte clusters).
> +
> + Note: qemu as of today has an implementation limit of 2 MB
> + as the maximum cluster size and won't be able to open images
> + with larger cluster sizes.
> +
> + 32 - 39: features
> + Bitmask of features. An implementation can safely ignore
> + any unknown bits that are set.
Really? That sounds more like optional features, if an implementation
can ignore a set bit. You really want to require that implementations
reject operations on a file with a feature bit set that they don't
recognize.
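A rough sketch of what such a rejection could look like in the open
path. The function name and the choice of -ENOTSUP are assumptions for
illustration; the two constants mirror ADD_COW_F_ALL_ALLOCATED and
ADD_COW_FEATURE_MASK from patch 5/6.

```c
#include <errno.h>
#include <stdint.h>

/* Mirror of the feature constants in block/add-cow.h (sketch only). */
#define SKETCH_F_ALL_ALLOCATED 0x01ull
#define SKETCH_FEATURE_MASK    SKETCH_F_ALL_ALLOCATED

/*
 * Returns 0 when every set bit in `features` is known to this
 * implementation, and a negative errno value otherwise.
 */
static int check_features(uint64_t features)
{
    if (features & ~SKETCH_FEATURE_MASK) {
        return -ENOTSUP;   /* unknown mandatory feature: refuse to open */
    }
    return 0;
}
```

Bits in the optional features field would skip this check, which is what
makes them safe to ignore.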
> +
> + Bit 0: All allocated bit. If this bit is set then
> + backing file and COW bitmap will not be used,
> + and can read from or write to image file directly.
And this particular bit sounds like an optional feature - setting the
bit is an optimization in speed, but leaving the bit clear or ignoring
the bit when it is set does not change correctness.
> +
> + Bits 1-63: Reserved (set to 0)
> +
> + 40 - 47: optional features
> + Not used now. Reserved for future use. It must be set to 0.
> + And must be ignored while reading.
s/. And must be ignored/, and ignored/
> +
> + 48 - 51: header size
> + The header field is variable-sized. This field indicates
> + how many 4096 bytes will be used to store add-cow header.
> + In add-cow v1, it is fixed to 1, so the header size will
> + be 4096 * 1 = 4096 bytes.
So is the value '1' or '4096' in this field? The wording isn't quite
clear. But reading elsewhere, it looks like this should always be '1'
in version one add-cow.
> +
> + 52 - 67: backing file format
> + Format of backing file. It will be filled with 0 if
> + backing file name offset is 0. If backing file name
> + offset is non-empty, it must be non-empty.
Are you going to enforce this? Normally, if backing file format is
omitted, then qemu knows how to probe backing file format (with the
caveat that it is a security risk if the probe returns non-raw but the
backing file really was raw).
> It is coded
> + in free-form ASCII, and is not NUL-terminated. Zero
> + padded on the right.
> +
> + 68 - 83: image file format
> + Format of image file. It must be non-empty. It is coded
> + in free-form ASCII, and is not NUL-terminated. Zero
> + padded on the right.
Why do we need this field? Isn't the image file ALWAYS raw?
> +
> + 84 - [HEADER_SIZE - 1]:
Elsewhere in your spec, you use 80 rather than 84 for the starting point
of valid offsets. Which is it?
> + It is used to make sure COW bitmap field starts at the
> + HEADER_SIZE byte, backing file name and image file name
> + will be stored here. The bytes that is not pointing to
> + backing file and image file names must be set to 0.
Not just file names, but also backing file format.
> +
> +== COW bitmap ==
> +
> +The "COW bitmap" field starts at offset HEADER_SIZE, stores a bitmap related to
Shouldn't that be HEADER_SIZE * number of header pages, since you
dedicated a field in the header for that purpose?
> +backing file and image file. The bitmap will track whether the sector in
> +backing file is dirty or not.
> +
> +Each bit in the bitmap tracks one cluster's status. For example, if cluster
> +bit is 16, then each bit tracks one cluster, (1 >> 16) = 65536 bytes. The size
s/>>/<</
> +of bitmap is calculated according to virtual size of image file, and it must
> +be multiple of 65536, the bits not used will be set to 0. Within each byte,
What must be a multiple of 65536, the image file, or the size of the
bitmap in the add-cow file? I think what you want is:
The image size is rounded up to cluster size (where any bytes in the
last cluster that do not fit in the image are ignored), then if the
number of clusters is not a multiple of 8, then remaining bits in the
bitmap will be set to 0.
Or do you really want to require that the bitmap is a multiple of 64k
bytes (at 8 bits per byte, that means the bitmap covers a multiple of
512k clusters, and at 512 bytes as the minimum cluster size, that the
add-cow file format manages a minimum of 256M)? That is, are you
requiring that the bitmap end on an aligned boundary, to make the bitmap
easier to use without having to special case a short-read on the last
page of the bitmap?
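To make the two readings concrete, here is a small sketch of both size
computations. The function names and SKETCH_BITMAP_ALIGN are
hypothetical; the 9..21 range for cluster_bits is taken from
MIN_CLUSTER_BITS and MAX_CLUSTER_BITS in patch 5/6.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_BITMAP_ALIGN 65536u   /* the 64k granularity in question */

/* Minimal bitmap size: one bit per cluster, rounded up to whole bytes. */
static uint64_t bitmap_bytes(uint64_t image_size, uint32_t cluster_bits)
{
    assert(cluster_bits >= 9 && cluster_bits <= 21);
    uint64_t cluster_size = 1ull << cluster_bits;
    uint64_t clusters = (image_size + cluster_size - 1) / cluster_size;

    return (clusters + 7) / 8;
}

/* The same size, additionally rounded up to a multiple of 64k bytes. */
static uint64_t bitmap_bytes_aligned(uint64_t image_size,
                                     uint32_t cluster_bits)
{
    uint64_t bytes = bitmap_bytes(image_size, cluster_bits);

    return (bytes + SKETCH_BITMAP_ALIGN - 1)
           & ~(uint64_t)(SKETCH_BITMAP_ALIGN - 1);
}
```

With 64k clusters, an 8G image needs 16384 bitmap bytes under the first
reading and 65536 under the second.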
> +the least significant bit covers the first cluster. Bit orders in one
> +byte look like:
> + +----+----+----+----+----+----+----+----+
> + | b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0 |
> + +----+----+----+----+----+----+----+----+
> +
> +If the bit is 0, indicates the sector has not been allocated in image file, data
> +should be loaded from backing file while reading; if the bit is 1, indicates the
s/indicates/it indicates/ (twice)
If there is no backing file, or if the image file is larger than the
backing file and the offset is beyond the end of the backing file, then
the data should be read as all zero bytes instead.
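Roughly, the read-side decision I have in mind looks like the sketch
below. All names are hypothetical, and it decides per byte offset, so a
request that straddles the end of the backing file would still need to
be split by the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Where the data for a given offset should come from (sketch only). */
enum read_source { READ_IMAGE, READ_BACKING, READ_ZEROES };

/* The least significant bit of each byte covers the lowest cluster. */
static bool cluster_bit_is_set(const uint8_t *bitmap, uint64_t cluster)
{
    return (bitmap[cluster / 8] >> (cluster % 8)) & 1;
}

static enum read_source pick_read_source(const uint8_t *bitmap,
                                         uint64_t offset,
                                         uint32_t cluster_bits,
                                         bool has_backing,
                                         uint64_t backing_size)
{
    if (cluster_bit_is_set(bitmap, offset >> cluster_bits)) {
        return READ_IMAGE;          /* bit 1: cluster lives in the image */
    }
    if (!has_backing || offset >= backing_size) {
        return READ_ZEROES;         /* nothing behind it: all zero bytes */
    }
    return READ_BACKING;            /* bit 0: fall through to backing file */
}
```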
> +related sector has been dirty, should be loaded from image file while reading.
> +Writing to a sector causes the corresponding bit to be set to 1.
> +
> +If raw image is not an even multiple of cluster bytes, bits that correspond to
> +bytes beyond the raw file size in add-cow must be written as 0 and must be
> +ignored when reading.
> +
> +Image file name and backing file name must NOT be the same, we prevent this
> +while creating add-cow files.
You prevent it when creating add-cow files via qemu-img, but that
doesn't stop malicious users from creating a file with those properties
and then trying to get you to parse it as add-cow. I think this needs
to instead be a requirement on the consumer of a potentially bad file,
and not a requirement on the producer to avoid bad files, since you
can't control all producers, as in:
If image file name and backing file resolve to the same file, the
add-cow image must be treated as invalid.
--
Eric Blake eblake@redhat.com +1-919-301-3266
Libvirt virtualization library http://libvirt.org
* Re: [Qemu-devel] [PATCH V13 1/6] docs: document for add-cow file format
2012-10-18 16:10 ` Eric Blake
@ 2012-10-19 2:14 ` Dong Xu Wang
0 siblings, 0 replies; 12+ messages in thread
From: Dong Xu Wang @ 2012-10-19 2:14 UTC (permalink / raw)
To: Eric Blake; +Cc: kwolf, qemu-devel
On 10/19/2012 12:10 AM, Eric Blake wrote:
> On 10/18/2012 03:51 AM, Dong Xu Wang wrote:
>> Document for add-cow format, the usage and spec of add-cow are introduced.
>>
>> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> ---
>> docs/specs/add-cow.txt | 139 ++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 files changed, 139 insertions(+), 0 deletions(-)
>> create mode 100644 docs/specs/add-cow.txt
>>
>> diff --git a/docs/specs/add-cow.txt b/docs/specs/add-cow.txt
>> new file mode 100644
>> index 0000000..dc1e107
>> --- /dev/null
>> +++ b/docs/specs/add-cow.txt
>> @@ -0,0 +1,139 @@
>> +== General ==
>> +
>> +The raw file format does not support backing files or copy on write feature.
>> +The add-cow image format makes it possible to use backing files with raw
>
> s/with raw/with a raw/
>
Okay.
>> +image by keeping a separate .add-cow metadata file. Once all sectors
>> +have been written into the raw image it is safe to discard the .add-cow
>> +and backing files, then we can use the raw image directly.
>> +
>> +An example usage of add-cow would look like::
>> +(ubuntu.img is a disk image which has been installed OS.)
>
> s/has been installed/has an installed/
Okay.
>
>> + 1) Create a raw image with the same size of ubuntu.img
>> + qemu-img create -f raw test.raw 8G
>> + 2) Create an add-cow image which will store dirty bitmap
>> + qemu-img create -f add-cow test.add-cow \
>> + -o backing_file=ubuntu.img,image_file=test.raw
>> + 3) Run qemu with add-cow image
>> + qemu -drive if=virtio,file=test.add-cow
>> +
>> +test.raw may be larger than ubuntu.img, in that case, the size of test.add-cow
>> +will be calculated from the size of test.raw.
>> +
>> +=Specification=
>> +
>> +The file format looks like this:
>> +
>> + +---------------+-------------+-----------------+
>> + | Header | Reserved | COW bitmap |
>> + +---------------+-------------+-----------------+
>
> Since you call out what all 4096 bytes must be in the Header section
> (all bytes not occupied by a backing file name or format must be NUL),
> do your really need Reserved in this section, or can you just claim that
> the 4096-byte header is directly followed by the COW bitmap?
>
Okay, I think Header + COW would be enough.
>> +
>> +All numbers in add-cow are stored in Little Endian byte order.
>> +
>> +== Header ==
>> +
>> +The Header is included in the first bytes:
>> +(#define HEADER_SIZE (4096 * header_size))
>> + Byte 0 - 7: magic
>> + add-cow magic string ("ADD_COW\xff").
>> +
>> + 8 - 11: version
>> + Version number (only valid value is 1 now).
>> +
>> + 12 - 15: backing file name offset
>> + Offset in the add-cow file at which the backing file
>> + name is stored (NB: The string is not nul-terminated).
>> + If backing file name does NOT exist, this field will be
>> + 0. Must be between 80 and [HEADER_SIZE - 2](a file name
>> + must be at least 1 byte).
>> +
>> + 16 - 19: backing file name size
>> + Length of the backing file name in bytes. It will be 0
>> + if the backing file name offset is 0. If backing file
>> + name offset is non-zero, then it must be non-zero. Must
>> + be less than [HEADER_SIZE - 80] to fit in the reserved
>> + part of the header.
>
> More specifically, it must be small enough so that offset+size <=
> HEADER_SIZE.
Okay.
>
>> +
>> + 20 - 23: image file name offset
>> + Offset in the add-cow file at which the image file name
>> + is stored (NB: The string is not null terminated). It
>> + must be between 80 and [HEADER_SIZE - 2].
>> +
>> + 24 - 27: image file name size
>> + Length of the image file name in bytes.
>> + Must be less than [HEADER_SIZE - 80] to fit in the reserved
>> + part of the header.
>
> More specifically, it must be small enough so that offset+size <=
> HEADER_SIZE.
>
Okay.
>> +
>> + 28 - 31: cluster bits
>> + Number of bits that are used for addressing an offset
>> + within a cluster (1 << cluster_bits is the cluster size).
>> + Must not be less than 9 (i.e. 512 byte clusters).
>> +
>> + Note: qemu as of today has an implementation limit of 2 MB
>> + as the maximum cluster size and won't be able to open images
>> + with larger cluster sizes.
>> +
>> + 32 - 39: features
>> + Bitmask of features. An implementation can safely ignore
>> + any unknown bits that are set.
>
> Really? That sounds more like optional features, if an implementation
> can ignore a set bit. You really want to require that implementations
> reject operations on a file with a feature bit set that they don't
> recognize.
>
Yep, I should reject the file if un-recognized bits are set.
>> +
>> + Bit 0: All allocated bit. If this bit is set then
>> + backing file and COW bitmap will not be used,
>> + and can read from or write to image file directly.
>
> And this particular bit sounds like an optional feature - setting the
> bit is an optimization in speed, but leaving the bit clear or ignoring
> the bit when it is set does not change correctness.
>
Okay, but if this bit is categorized as an optional feature, there will
be no feature bits left then...
>> +
>> + Bits 1-63: Reserved (set to 0)
>> +
>> + 40 - 47: optional features
>> + Not used now. Reserved for future use. It must be set to 0.
>> + And must be ignored while reading.
>
> s/. And must be ignored/, and ignored/
>
Okay.
>> +
>> + 48 - 51: header size
>> + The header field is variable-sized. This field indicates
>> + how many 4096 bytes will be used to store add-cow header.
>> + In add-cow v1, it is fixed to 1, so the header size will
>> + be 4096 * 1 = 4096 bytes.
>
> So is the value '1' or '4096' in this field? The wording isn't quite
> clear. But reading elsewhere, it looks like this should always be '1'
> in version one add-cow.
>
I mean '1' in this field; I will describe it more clearly in a future patch.
>> +
>> + 52 - 67: backing file format
>> + Format of backing file. It will be filled with 0 if
>> + backing file name offset is 0. If backing file name
>> + offset is non-empty, it must be non-empty.
>
> Are you going to enforce this? Normally, if backing file format is
> omitted, then qemu knows how to probe backing file format (with the
> caveat that it is a security risk if the probe returns non-raw but the
> backing file really was raw).
>
To avoid the security risk I added this field to the header. Can I do
anything more to enforce this?
>> It is coded
>> + in free-form ASCII, and is not NUL-terminated. Zero
>> + padded on the right.
>> +
>> + 68 - 83: image file format
>> + Format of image file. It must be non-empty. It is coded
>> + in free-form ASCII, and is not NUL-terminated. Zero
>> + padded on the right.
>
> Why do we need this field? Isn't the image file ALWAYS raw?
>
In v1, it only supports raw as an image_file, but in the future it might
support other formats which do not support COW, so I added the image file
format here.
>> +
>> + 84 - [HEADER_SIZE - 1]:
>
> Elsewhere in your spec, you use 80 rather than 84 for the starting point
> of valid offsets. Which is it?
Sorry, I added "cluster bits", which is 4 bytes, and forgot to change 80
to 84. Will correct.
>
>> + It is used to make sure COW bitmap field starts at the
>> + HEADER_SIZE byte, backing file name and image file name
>> + will be stored here. The bytes that is not pointing to
>> + backing file and image file names must be set to 0.
>
> Not just file names, but also backing file format.
>
Okay.
>> +
>> +== COW bitmap ==
>> +
>> +The "COW bitmap" field starts at offset HEADER_SIZE, stores a bitmap related to
>
> Shouldn't that be HEADER_SIZE * number of header pages, since you
> dedicated a field in the header for that purpose?
>
There is one line in this doc:
(#define HEADER_SIZE (4096 * header_size))
HEADER_SIZE and header_size are not good names, will correct.
>> +backing file and image file. The bitmap will track whether the sector in
>> +backing file is dirty or not.
>> +
>> +Each bit in the bitmap tracks one cluster's status. For example, if cluster
>> +bit is 16, then each bit tracks one cluster, (1 >> 16) = 65536 bytes. The size
>
> s/>>/<</
>
Okay.
>> +of bitmap is calculated according to virtual size of image file, and it must
>> +be multiple of 65536, the bits not used will be set to 0. Within each byte,
>
> What must be a multiple of 65536, the image file, or the size of the
> bitmap in the add-cow file? I think what you want is:
>
I mean the bitmap size must be a multiple of 65536. Will describe it more clearly.
> The image size is rounded up to cluster size (where any bytes in the
> last cluster that do not fit in the image are ignored), then if the
> number of clusters is not a multiple of 8, then remaining bits in the
> bitmap will be set to 0.
>
> Or do you really want to require that the bitmap is a multiple of 64k
> bytes (at 8 bits per byte, that means the bitmap covers a multiple of
> 512k clusters, and at 512 bytes as the minimum cluster size, that the
> add-cow file format manages a minimum of 256M)? That is, are you
> requiring that the bitmap end on an aligned boundary, to make the bitmap
> easier to use without having to special case a short-read on the last
> page of the bitmap?
>
I think my description was wrong. add-cow is using block-cache.c, which
comes from qcow2-cache.c. I did not want to change qcow2-cache.c to a great
extent, so I use it directly, and it uses cluster_size as the write unit:
ret = bdrv_pwrite(bs->file, c->entries[i].offset, c->entries[i].table,
s->cluster_size);
So the size of the bitmap would be a multiple of the add-cow cluster_size.
Do you think that is acceptable?
>> +the least significant bit covers the first cluster. Bit orders in one
>> +byte look like:
>> + +----+----+----+----+----+----+----+----+
>> + | b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0 |
>> + +----+----+----+----+----+----+----+----+
>> +
>> +If the bit is 0, indicates the sector has not been allocated in image file, data
>> +should be loaded from backing file while reading; if the bit is 1, indicates the
>
> s/indicates/it indicates/ (twice)
>
Okay.
> If there is no backing file, or if the image file is larger than the
> backing file and the offset is beyond the end of the backing file, then
> the data should be read as all zero bytes instead.
>
Okay.
>> +related sector has been dirty, should be loaded from image file while reading.
>> +Writing to a sector causes the corresponding bit to be set to 1.
>> +
>> +If raw image is not an even multiple of cluster bytes, bits that correspond to
>> +bytes beyond the raw file size in add-cow must be written as 0 and must be
>> +ignored when reading.
>> +
>> +Image file name and backing file name must NOT be the same, we prevent this
>> +while creating add-cow files.
>
> You prevent it when creating add-cow files via qemu-img, but that
> doesn't stop malicious users from creating a file with those properties
> and then trying to get you to parse it as add-cow. I think this needs
> to instead be a requirement on the consumer of a potentially bad file,
> and not a requirement on the producer to avoid bad files, since you
> can't control all producers, as in:
> If image file name and backing file resolve to the same file, the
> add-cow image must be treated as invalid.
Okay, I will check it while opening add-cow file.
>
Thank you Eric.
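To make the bitmap rules discussed above concrete, here is a rough sketch. All function names are hypothetical and not from the patch. It assumes one bit per cluster, the least significant bit of each byte covering the first cluster, and a bitmap size rounded up to a multiple of cluster_size as proposed in this reply.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bitmap size in bytes for an image, rounded up to a multiple of the
 * cluster size (the write unit of the shared cache code). */
static uint64_t add_cow_bitmap_bytes(uint64_t image_size, unsigned cluster_bits)
{
    uint64_t cluster_size = 1ULL << cluster_bits;
    uint64_t clusters = (image_size + cluster_size - 1) >> cluster_bits;
    uint64_t bytes = (clusters + 7) / 8;

    return (bytes + cluster_size - 1) & ~(cluster_size - 1);
}

/* True if the cluster containing byte 'offset' is allocated in the image
 * file; false means the data comes from the backing file (or reads as zero). */
static bool add_cow_cluster_allocated(const uint8_t *bitmap, uint64_t offset,
                                      unsigned cluster_bits)
{
    uint64_t cluster = offset >> cluster_bits;

    return (bitmap[cluster / 8] >> (cluster % 8)) & 1;
}

/* Mark the cluster containing byte 'offset' as allocated after a write. */
static void add_cow_mark_allocated(uint8_t *bitmap, uint64_t offset,
                                   unsigned cluster_bits)
{
    uint64_t cluster = offset >> cluster_bits;

    bitmap[cluster / 8] |= (uint8_t)(1u << (cluster % 8));
}
```

With cluster_bits = 16, a 1 GiB image has 16384 clusters and needs 2048 bitmap bytes, which this rounding pads to one 65536-byte cluster.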
* Re: [Qemu-devel] [PATCH V13 4/6] rename qcow2-cache.c to block-cache.c
2012-10-18 9:51 ` [Qemu-devel] [PATCH V13 4/6] rename qcow2-cache.c to block-cache.c Dong Xu Wang
@ 2012-10-22 8:22 ` Stefan Hajnoczi
2012-10-22 8:24 ` Dong Xu Wang
0 siblings, 1 reply; 12+ messages in thread
From: Stefan Hajnoczi @ 2012-10-22 8:22 UTC (permalink / raw)
To: Dong Xu Wang; +Cc: kwolf, qemu-devel
On Thu, Oct 18, 2012 at 05:51:33PM +0800, Dong Xu Wang wrote:
> diff --git a/block/qcow2.h b/block/qcow2.h
> index b4eb654..cb6fd7a 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -27,6 +27,7 @@
>
> #include "aes.h"
> #include "qemu-coroutine.h"
> +#include "block-cache.h"
>
Since block-cache.h is being included from qcow2.h you can drop the
block-cache.h includes you added to qcow2-cluster.c and
qcow2-refcount.c.
Stefan
* Re: [Qemu-devel] [PATCH V13 4/6] rename qcow2-cache.c to block-cache.c
2012-10-22 8:22 ` Stefan Hajnoczi
@ 2012-10-22 8:24 ` Dong Xu Wang
0 siblings, 0 replies; 12+ messages in thread
From: Dong Xu Wang @ 2012-10-22 8:24 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kwolf, qemu-devel
On Mon, Oct 22, 2012 at 4:22 PM, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> On Thu, Oct 18, 2012 at 05:51:33PM +0800, Dong Xu Wang wrote:
>> diff --git a/block/qcow2.h b/block/qcow2.h
>> index b4eb654..cb6fd7a 100644
>> --- a/block/qcow2.h
>> +++ b/block/qcow2.h
>> @@ -27,6 +27,7 @@
>>
>> #include "aes.h"
>> #include "qemu-coroutine.h"
>> +#include "block-cache.h"
>>
>
> Since block-cache.h is being included from qcow2.h you can drop the
> block-cache.h includes you added to qcow2-cluster.c and
> qcow2-refcount.c.
>
Okay, thank you Stefan.
> Stefan
>
* Re: [Qemu-devel] [PATCH V13 5/6] add-cow file format core code.
2012-10-18 9:51 ` [Qemu-devel] [PATCH V13 5/6] add-cow file format core code Dong Xu Wang
@ 2012-10-22 9:29 ` Stefan Hajnoczi
0 siblings, 0 replies; 12+ messages in thread
From: Stefan Hajnoczi @ 2012-10-22 9:29 UTC (permalink / raw)
To: Dong Xu Wang; +Cc: kwolf, qemu-devel
On Thu, Oct 18, 2012 at 05:51:34PM +0800, Dong Xu Wang wrote:
> +static void add_cow_header_cpu_to_le(const AddCowHeader *cpu, AddCowHeader *le)
> +{
> + le->magic = cpu_to_le64(cpu->magic);
> + le->version = cpu_to_le32(cpu->version);
> +
> + le->backing_filename_offset = cpu_to_le32(cpu->backing_filename_offset);
> + le->backing_filename_size = cpu_to_le32(cpu->backing_filename_size);
> +
> + le->image_filename_offset = cpu_to_le32(cpu->image_filename_offset);
> + le->image_filename_size = cpu_to_le32(cpu->image_filename_size);
> +
> + le->cluster_bits = cpu_to_le32(cpu->cluster_bits);
> + le->features = cpu_to_le64(cpu->features);
> + le->optional_features = cpu_to_le64(cpu->optional_features);
> + le->header_pages_size = cpu_to_le32(cpu->header_pages_size);
> + memcpy(le->backing_fmt, cpu->backing_fmt, sizeof(cpu->backing_fmt));
> + memcpy(le->image_fmt, cpu->image_fmt, sizeof(cpu->image_fmt));
Minor style issue: sizeof(le->backing_fmt) is safer than
sizeof(cpu->backing_fmt) in case the types change or this code is
copy-pasted elsewhere. Always use the size of the destination buffer.
The same applies to the image_fmt copy.
> +}
> +
> +static int add_cow_probe(const uint8_t *buf, int buf_size, const char *filename)
> +{
> + const AddCowHeader *header = (const AddCowHeader *)buf;
> +
In case .bdrv_probe() is exposed in a future stand-alone block library
like libqblock.so where we cannot make assumptions about buf_size:
if (buf_size < sizeof(*header)) {
return 0;
}
> + ret = bdrv_file_open(&bs, filename, BDRV_O_RDWR);
> + if (ret < 0) {
> + return ret;
> + }
> + snprintf(header.backing_fmt, sizeof(header.backing_fmt),
> + "%s", backing_fmt ? backing_fmt : "");
> + snprintf(header.image_fmt, sizeof(header.image_fmt),
> + "%s", image_format ? image_format : "raw");
> + add_cow_header_cpu_to_le(&header, &le_header);
> + ret = bdrv_pwrite(bs, 0, &le_header, sizeof(le_header));
> + if (ret < 0) {
> + bdrv_delete(bs);
> + return ret;
> + }
Once...
> + if (ret < 0) {
> + bdrv_delete(bs);
> + return ret;
> + }
...twice. This can be dropped.
> +
> + if (backing_filename) {
> + ret = bdrv_pwrite(bs, header.backing_filename_offset,
> + backing_filename, header.backing_filename_size);
> + if (ret < 0) {
> + bdrv_delete(bs);
> + return ret;
> + }
> + }
> +
> + ret = bdrv_pwrite(bs, header.image_filename_offset,
> + image_filename, header.image_filename_size);
> + if (ret < 0) {
> + bdrv_delete(bs);
> + return ret;
> + }
I suggest writing the image filename before the backing filename so it's
easier to implement .bdrv_change_backing_file() in the future.
> +
> + ret = bdrv_open(bs, filename, BDRV_O_RDWR | BDRV_O_NO_FLUSH, drv);
Forgot to bdrv_close(bs) before opening as add-cow.
> + if ((s->header.features & ADD_COW_F_ALL_ALLOCATED) == 0) {
> + ret = bdrv_read_string(bs->file, sizeof(s->header),
> + sizeof(bs->backing_format) - 1,
> + bs->backing_format,
> + sizeof(bs->backing_format));
This looks wrong:
1. The header contains the backing format field, we've already read it.
Now we just need to put a NUL-terminated string into
bs->backing_format. No need for bdrv_read_string().
2. offset = sizeof(s->header) does not make sense because the
backing_format field is part of the header.
3. n = sizeof(bs->backing_format) - 1 should be the size of the header
backing_format field, not the destination buffer.
I'm wondering if I missed something or why add-cow files open
successfully in your testing, because I think this line of code would
cause it to use a junk bs->backing_format.
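A sketch of what points 1-3 amount to: copy the fixed-size, zero-padded header field into a NUL-terminated buffer, using the header field's size as the source length. The helper name add_cow_field_to_str is hypothetical, and the field sizes in the usage note are assumptions based on the spec (16-byte format fields).

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy a fixed-size header field that is zero padded on the right but not
 * NUL-terminated into a NUL-terminated destination buffer. The result is
 * truncated if dst is too small.
 */
static void add_cow_field_to_str(char *dst, size_t dst_size,
                                 const char *field, size_t field_size)
{
    size_t n;

    if (dst_size == 0) {
        return;
    }
    n = field_size < dst_size - 1 ? field_size : dst_size - 1;
    memcpy(dst, field, n);
    dst[n] = '\0';
}
```

In add_cow_open() this would presumably be called as add_cow_field_to_str(bs->backing_format, sizeof(bs->backing_format), s->header.backing_fmt, sizeof(s->header.backing_fmt)), with no extra read from bs->file.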
> + s->image_hd = bdrv_new("");
> + if (path_has_protocol(image_filename)) {
image_filename[] is uninitialized. Did you mean tmp_name?
> + pstrcpy(image_filename, sizeof(image_filename), tmp_name);
> + } else {
> + path_combine(image_filename, sizeof(image_filename),
> + bs->filename, tmp_name);
> + }
> +
> + ret = bdrv_open(s->image_hd, image_filename, flags, NULL);
What about header->image_format?
> + if (ret < 0) {
> + bdrv_delete(s->image_hd);
> + goto fail;
> + }
> +
> + bs->total_sectors = bdrv_getlength(s->image_hd) >> 9;
/ BDRV_SECTOR_SIZE
> + s->cluster_size = 1 << s->header.cluster_bits;
> + sector_per_byte = SECTORS_PER_CLUSTER * 8;
SECTORS_PER_CLUSTER does not take s->cluster_size into account.
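One way the constant could be replaced, as a sketch only. The helper names are hypothetical; BDRV_SECTOR_BITS is redefined here just to keep the fragment self-contained, and cluster_bits is assumed to be at least 9 as the spec requires.

```c
#include <stdint.h>

#define BDRV_SECTOR_BITS 9   /* 512-byte sectors */

/* Sectors covered by one cluster, derived from the header's cluster_bits. */
static uint64_t add_cow_sectors_per_cluster(uint32_t cluster_bits)
{
    return 1ULL << (cluster_bits - BDRV_SECTOR_BITS);
}

/* Sectors tracked by one bitmap byte: 8 clusters per byte. */
static uint64_t add_cow_sectors_per_bitmap_byte(uint32_t cluster_bits)
{
    return add_cow_sectors_per_cluster(cluster_bits) * 8;
}
```

For the default 64 KB clusters (cluster_bits = 16) this gives 128 sectors per cluster and 1024 sectors per bitmap byte.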
The add_cow_open() issues should have been visible during
development/testing (backing_format, uninitialized image_filename[],
unused header->image_format, SECTORS_PER_CLUSTER). It looks like not
much testing of image creation options has been done. I'll review more
of this series in the next version, please test more.
Stefan
Thread overview: 12+ messages
2012-10-18 9:51 [Qemu-devel] [PATCH V13 0/6] add-cow file format Dong Xu Wang
2012-10-18 9:51 ` [Qemu-devel] [PATCH V13 1/6] docs: document for " Dong Xu Wang
2012-10-18 16:10 ` Eric Blake
2012-10-19 2:14 ` Dong Xu Wang
2012-10-18 9:51 ` [Qemu-devel] [PATCH V13 2/6] make path_has_protocol non static Dong Xu Wang
2012-10-18 9:51 ` [Qemu-devel] [PATCH V13 3/6] qed_read_string to bdrv_read_string Dong Xu Wang
2012-10-18 9:51 ` [Qemu-devel] [PATCH V13 4/6] rename qcow2-cache.c to block-cache.c Dong Xu Wang
2012-10-22 8:22 ` Stefan Hajnoczi
2012-10-22 8:24 ` Dong Xu Wang
2012-10-18 9:51 ` [Qemu-devel] [PATCH V13 5/6] add-cow file format core code Dong Xu Wang
2012-10-22 9:29 ` Stefan Hajnoczi
2012-10-18 9:51 ` [Qemu-devel] [PATCH V13 6/6] qemu-iotests: add add-cow iotests support Dong Xu Wang