* [Qemu-devel] [PATCH V12 0/6] add-cow file format
@ 2012-08-10 15:39 Dong Xu Wang
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 1/6] docs: document for " Dong Xu Wang
` (6 more replies)
0 siblings, 7 replies; 25+ messages in thread
From: Dong Xu Wang @ 2012-08-10 15:39 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, Dong Xu Wang
This will introduce a new file format: add-cow.
add-cow can benefit from other available functions, such as path_has_protocol and
qed_read_string, so we will make them public.
Now add-cow is still using QEMUOptionParameter, not QemuOpts, I will send a
separate patch series to convert.
snapshot_blkdev are not supported now for add-cow, after converting QEMUOptionParameter
to QemuOpts, will add related code.
v11->v12:
1) Removed un-used feature bit.
2) Share cache code with qcow2.c.
3) Remove snapshot_blkdev support, will add it in another patch.
5) COW Bitmap field in add-cow file will be multiple of 65536.
6) fix grammer and typo.
Dong Xu Wang (6):
docs: document for add cow file format
make path_has_protocol non-static
qed_read_string to bdrv_read_string
rename qcow2-cache.c to block-cache.c
add-cow file format
qemu-iotests
block.c | 29 ++-
block.h | 6 +
block/Makefile.objs | 4 +-
block/add-cow.c | 613 ++++++++++++++++++++++++++++++++++++++++++
block/add-cow.h | 85 ++++++
block/qcow2-cache.c | 323 ----------------------
block/qcow2-cluster.c | 66 +++--
block/qcow2-refcount.c | 66 +++--
block/qcow2.c | 36 ++--
block/qcow2.h | 24 +--
block/qed.c | 29 +--
block_int.h | 2 +
docs/specs/add-cow.txt | 123 +++++++++
tests/qemu-iotests/017 | 2 +-
tests/qemu-iotests/020 | 2 +-
tests/qemu-iotests/check | 4 +-
tests/qemu-iotests/common | 6 +
tests/qemu-iotests/common.rc | 19 ++
trace-events | 13 +-
19 files changed, 994 insertions(+), 458 deletions(-)
create mode 100644 block/add-cow.c
create mode 100644 block/add-cow.h
delete mode 100644 block/qcow2-cache.c
create mode 100644 docs/specs/add-cow.txt
^ permalink raw reply [flat|nested] 25+ messages in thread
* [Qemu-devel] [PATCH V12 1/6] docs: document for add-cow file format
2012-08-10 15:39 [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
@ 2012-08-10 15:39 ` Dong Xu Wang
2012-09-06 17:27 ` Michael Roth
2012-09-10 15:23 ` Kevin Wolf
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 2/6] make path_has_protocol non-static Dong Xu Wang
` (5 subsequent siblings)
6 siblings, 2 replies; 25+ messages in thread
From: Dong Xu Wang @ 2012-08-10 15:39 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, Dong Xu Wang
Document for add-cow format, the usage and spec of add-cow are introduced.
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
---
docs/specs/add-cow.txt | 123 ++++++++++++++++++++++++++++++++++++++++++++++++
1 files changed, 123 insertions(+), 0 deletions(-)
create mode 100644 docs/specs/add-cow.txt
diff --git a/docs/specs/add-cow.txt b/docs/specs/add-cow.txt
new file mode 100644
index 0000000..d5a7a68
--- /dev/null
+++ b/docs/specs/add-cow.txt
@@ -0,0 +1,123 @@
+== General ==
+
+The raw file format does not support backing files or copy on write feature.
+The add-cow image format makes it possible to use backing files with raw
+image by keeping a separate .add-cow metadata file. Once all sectors
+have been written into the raw image it is safe to discard the .add-cow
+and backing files, then we can use the raw image directly.
+
+An example usage of add-cow would look like::
+(ubuntu.img is a disk image which has been installed OS.)
+ 1) Create a raw image with the same size of ubuntu.img
+ qemu-img create -f raw test.raw 8G
+ 2) Create an add-cow image which will store dirty bitmap
+ qemu-img create -f add-cow test.add-cow \
+ -o backing_file=ubuntu.img,image_file=test.raw
+ 3) Run qemu with add-cow image
+ qemu -drive if=virtio,file=test.add-cow
+
+test.raw may be larger than ubuntu.img, in that case, the size of test.add-cow
+will be calculated from the size of test.raw.
+
+=Specification=
+
+The file format looks like this:
+
+ +---------------+-------------+-----------------+
+ | Header | Reserved | COW bitmap |
+ +---------------+-------------+-----------------+
+
+All numbers in add-cow are stored in Little Endian byte order.
+
+== Header ==
+
+The Header is included in the first bytes:
+(#define HEADER_SIZE (4096 * header_pages_size))
+ Byte 0 - 7: magic
+ add-cow magic string ("ADD_COW\xff").
+
+ 8 - 11: version
+ Version number (only valid value is 1 now).
+
+ 12 - 15: backing file name offset
+ Offset in the add-cow file at which the backing file
+ name is stored (NB: The string is not nul-terminated).
+ If backing file name does NOT exist, this field will be
+ 0. Must be between 80 and [HEADER_SIZE - 2](a file name
+ must be at least 1 byte).
+
+ 16 - 19: backing file name size
+ Length of the backing file name in bytes. It will be 0
+ if the backing file name offset is 0. If backing file
+ name offset is non-zero, then it must be non-zero. Must
+ be less than [HEADER_SIZE - 80] to fit in the reserved
+ part of the header.
+
+ 20 - 23: image file name offset
+ Offset in the add-cow file at which the image file name
+ is stored (NB: The string is not null terminated). It
+ must be between 80 and [HEADER_SIZE - 2].
+
+ 24 - 27: image file name size
+ Length of the image file name in bytes.
+ Must be less than [HEADER_SIZE - 80] to fit in the reserved
+ part of the header.
+
+ 28 - 35: features
+ Currently only 1 feature bit is used:
+ Feature bits:
+ * ADD_COW_F_All_ALLOCATED = 0x01.
+
+ 36 - 43: optional features
+ Not used now. Reserved for future use. It must be set to 0.
+
+ 44 - 47: header pages size
+ The header field is variable-sized. This field indicates
+ how many pages(4k) will be used to store add-cow header.
+ In add-cow v1, it is fixed to 1, so the header size will
+ be 4k * 1 = 4096 bytes.
+
+ 48 - 63: backing file format
+ format of backing file. It will be filled with 0 if
+ backing file name offset is 0. If backing file name
+ offset is non-zero, it must be non-zero. It is coded
+ in free-form ASCII, and is not NUL-terminated.
+
+ 64 - 79: image file format
+ format of image file. It must be non-zero. It is coded
+ in free-form ASCII, and is not NUL-terminated.
+
+ 80 - [HEADER_SIZE - 1]:
+ It is used to make sure COW bitmap field starts at the
+ HEADER_SIZE byte, backing file name and image file name
+ will be stored here. The bytes that is not pointing to
+ backing file and image file names will bet set to 0.
+
+== COW bitmap ==
+
+The "COW bitmap" field starts at offset HEADER_SIZE, stores a bitmap related to
+backing file and image file. The bitmap will track whether the sector in
+backing file is dirty or not.
+
+Each bit in the bitmap indicates one cluster's status. One cluster includes 128
+sectors, then each bit indicates 512 * 128 = 64k bytes. the size of bitmap is
+calculated according to virtual size of image file, and it also should be multipe
+of 65536, the bits not used will be set to 0. Within each byte, the least
+significant bit covers the first cluster. Bit orders in one byte look like:
+ +----+----+----+----+----+----+----+----+
+ | b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0 |
+ +----+----+----+----+----+----+----+----+
+
+If the bit is 0, indicates the sector has not been allocated in image file, data
+should be loaded from backing file while reading; if the bit is 1, indicates the
+related sector has been dirty, should be loaded from image file while reading.
+Writing to a sector causes the corresponding bit to be set to 1.
+
+If raw image is not an even multiple of cluster bytes, bits that correspond to
+bytes beyond the raw file size in add-cow will be 0.
+
+Image file name and backing file name must NOT be the same, we prevent this
+while creating add-cow files.
+
+Image file and backing file are interpreted relative to the qcow2 file, not
+to the current working directory of the process that opened the qcow2 file.
--
1.7.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [Qemu-devel] [PATCH V12 2/6] make path_has_protocol non-static
2012-08-10 15:39 [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 1/6] docs: document for " Dong Xu Wang
@ 2012-08-10 15:39 ` Dong Xu Wang
2012-09-06 17:27 ` Michael Roth
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 3/6] qed_read_string to bdrv_read_string Dong Xu Wang
` (4 subsequent siblings)
6 siblings, 1 reply; 25+ messages in thread
From: Dong Xu Wang @ 2012-08-10 15:39 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, Dong Xu Wang
We will use path_has_protocol outside block.c, so just make it public.
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
---
block.c | 2 +-
block.h | 1 +
2 files changed, 2 insertions(+), 1 deletions(-)
diff --git a/block.c b/block.c
index 24323c1..c13d803 100644
--- a/block.c
+++ b/block.c
@@ -196,7 +196,7 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
}
/* check if the path starts with "<protocol>:" */
-static int path_has_protocol(const char *path)
+int path_has_protocol(const char *path)
{
const char *p;
diff --git a/block.h b/block.h
index 650d872..54e61c9 100644
--- a/block.h
+++ b/block.h
@@ -307,6 +307,7 @@ char *bdrv_snapshot_dump(char *buf, int buf_size, QEMUSnapshotInfo *sn);
char *get_human_readable_size(char *buf, int buf_size, int64_t size);
int path_is_absolute(const char *path);
+int path_has_protocol(const char *path);
void path_combine(char *dest, int dest_size,
const char *base_path,
const char *filename);
--
1.7.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [Qemu-devel] [PATCH V12 3/6] qed_read_string to bdrv_read_string
2012-08-10 15:39 [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 1/6] docs: document for " Dong Xu Wang
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 2/6] make path_has_protocol non-static Dong Xu Wang
@ 2012-08-10 15:39 ` Dong Xu Wang
2012-09-06 17:32 ` Michael Roth
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 4/6] rename qcow2-cache.c to block-cache.c Dong Xu Wang
` (3 subsequent siblings)
6 siblings, 1 reply; 25+ messages in thread
From: Dong Xu Wang @ 2012-08-10 15:39 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, Dong Xu Wang
Make qed_read_string function to a common interface, so move it to block.c.
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
---
block.c | 27 +++++++++++++++++++++++++++
block.h | 2 ++
block/qed.c | 29 +----------------------------
3 files changed, 30 insertions(+), 28 deletions(-)
diff --git a/block.c b/block.c
index c13d803..d906b35 100644
--- a/block.c
+++ b/block.c
@@ -213,6 +213,33 @@ int path_has_protocol(const char *path)
return *p == ':';
}
+/**
+ * Read a string of known length from the image file
+ *
+ * @bs: Image file
+ * @offset: File offset to start of string, in bytes
+ * @n: String length in bytes
+ * @buf: Destination buffer
+ * @buflen: Destination buffer length in bytes
+ * @ret: 0 on success, -errno on failure
+ *
+ * The string is NUL-terminated.
+ */
+int bdrv_read_string(BlockDriverState *bs, uint64_t offset, size_t n,
+ char *buf, size_t buflen)
+{
+ int ret;
+ if (n >= buflen) {
+ return -EINVAL;
+ }
+ ret = bdrv_pread(bs, offset, buf, n);
+ if (ret < 0) {
+ return ret;
+ }
+ buf[n] = '\0';
+ return 0;
+}
+
int path_is_absolute(const char *path)
{
#ifdef _WIN32
diff --git a/block.h b/block.h
index 54e61c9..e5dfcd7 100644
--- a/block.h
+++ b/block.h
@@ -154,6 +154,8 @@ int bdrv_pwrite_sync(BlockDriverState *bs, int64_t offset,
const void *buf, int count);
int coroutine_fn bdrv_co_readv(BlockDriverState *bs, int64_t sector_num,
int nb_sectors, QEMUIOVector *qiov);
+int bdrv_read_string(BlockDriverState *bs, uint64_t offset, size_t n,
+ char *buf, size_t buflen);
int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
int coroutine_fn bdrv_co_writev(BlockDriverState *bs, int64_t sector_num,
diff --git a/block/qed.c b/block/qed.c
index 5f3eefa..311c589 100644
--- a/block/qed.c
+++ b/block/qed.c
@@ -217,33 +217,6 @@ static bool qed_is_image_size_valid(uint64_t image_size, uint32_t cluster_size,
}
/**
- * Read a string of known length from the image file
- *
- * @file: Image file
- * @offset: File offset to start of string, in bytes
- * @n: String length in bytes
- * @buf: Destination buffer
- * @buflen: Destination buffer length in bytes
- * @ret: 0 on success, -errno on failure
- *
- * The string is NUL-terminated.
- */
-static int qed_read_string(BlockDriverState *file, uint64_t offset, size_t n,
- char *buf, size_t buflen)
-{
- int ret;
- if (n >= buflen) {
- return -EINVAL;
- }
- ret = bdrv_pread(file, offset, buf, n);
- if (ret < 0) {
- return ret;
- }
- buf[n] = '\0';
- return 0;
-}
-
-/**
* Allocate new clusters
*
* @s: QED state
@@ -437,7 +410,7 @@ static int bdrv_qed_open(BlockDriverState *bs, int flags)
return -EINVAL;
}
- ret = qed_read_string(bs->file, s->header.backing_filename_offset,
+ ret = bdrv_read_string(bs->file, s->header.backing_filename_offset,
s->header.backing_filename_size, bs->backing_file,
sizeof(bs->backing_file));
if (ret < 0) {
--
1.7.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [Qemu-devel] [PATCH V12 4/6] rename qcow2-cache.c to block-cache.c
2012-08-10 15:39 [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
` (2 preceding siblings ...)
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 3/6] qed_read_string to bdrv_read_string Dong Xu Wang
@ 2012-08-10 15:39 ` Dong Xu Wang
2012-09-06 17:52 ` Michael Roth
2012-09-11 8:41 ` Kevin Wolf
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 5/6] add-cow file format Dong Xu Wang
` (2 subsequent siblings)
6 siblings, 2 replies; 25+ messages in thread
From: Dong Xu Wang @ 2012-08-10 15:39 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, Dong Xu Wang
add-cow and qcow2 file format will share the same cache code, so rename
block-cache.c to block-cache.c. And related structure and qcow2 code also
are changed.
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
---
block.h | 3 +
block/Makefile.objs | 3 +-
block/qcow2-cache.c | 323 ------------------------------------------------
block/qcow2-cluster.c | 66 ++++++----
block/qcow2-refcount.c | 66 ++++++-----
block/qcow2.c | 36 +++---
block/qcow2.h | 24 +---
trace-events | 13 +-
8 files changed, 109 insertions(+), 425 deletions(-)
delete mode 100644 block/qcow2-cache.c
diff --git a/block.h b/block.h
index e5dfcd7..c325661 100644
--- a/block.h
+++ b/block.h
@@ -401,6 +401,9 @@ typedef enum {
BLKDBG_CLUSTER_ALLOC_BYTES,
BLKDBG_CLUSTER_FREE,
+ BLKDBG_ADD_COW_UPDATE,
+ BLKDBG_ADD_COW_LOAD,
+
BLKDBG_EVENT_MAX,
} BlkDebugEvent;
diff --git a/block/Makefile.objs b/block/Makefile.objs
index b5754d3..23bdfc8 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -1,7 +1,8 @@
block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
-block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
+block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
block-obj-y += qed-check.o
+block-obj-y += block-cache.o
block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
block-obj-y += stream.o
block-obj-$(CONFIG_WIN32) += raw-win32.o
diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
deleted file mode 100644
index 2d4322a..0000000
--- a/block/qcow2-cache.c
+++ /dev/null
@@ -1,323 +0,0 @@
-/*
- * L2/refcount table cache for the QCOW2 format
- *
- * Copyright (c) 2010 Kevin Wolf <kwolf@redhat.com>
- *
- * Permission is hereby granted, free of charge, to any person obtaining a copy
- * of this software and associated documentation files (the "Software"), to deal
- * in the Software without restriction, including without limitation the rights
- * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- * copies of the Software, and to permit persons to whom the Software is
- * furnished to do so, subject to the following conditions:
- *
- * The above copyright notice and this permission notice shall be included in
- * all copies or substantial portions of the Software.
- *
- * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
- * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
- * THE SOFTWARE.
- */
-
-#include "block_int.h"
-#include "qemu-common.h"
-#include "qcow2.h"
-#include "trace.h"
-
-typedef struct Qcow2CachedTable {
- void* table;
- int64_t offset;
- bool dirty;
- int cache_hits;
- int ref;
-} Qcow2CachedTable;
-
-struct Qcow2Cache {
- Qcow2CachedTable* entries;
- struct Qcow2Cache* depends;
- int size;
- bool depends_on_flush;
-};
-
-Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
-{
- BDRVQcowState *s = bs->opaque;
- Qcow2Cache *c;
- int i;
-
- c = g_malloc0(sizeof(*c));
- c->size = num_tables;
- c->entries = g_malloc0(sizeof(*c->entries) * num_tables);
-
- for (i = 0; i < c->size; i++) {
- c->entries[i].table = qemu_blockalign(bs, s->cluster_size);
- }
-
- return c;
-}
-
-int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c)
-{
- int i;
-
- for (i = 0; i < c->size; i++) {
- assert(c->entries[i].ref == 0);
- qemu_vfree(c->entries[i].table);
- }
-
- g_free(c->entries);
- g_free(c);
-
- return 0;
-}
-
-static int qcow2_cache_flush_dependency(BlockDriverState *bs, Qcow2Cache *c)
-{
- int ret;
-
- ret = qcow2_cache_flush(bs, c->depends);
- if (ret < 0) {
- return ret;
- }
-
- c->depends = NULL;
- c->depends_on_flush = false;
-
- return 0;
-}
-
-static int qcow2_cache_entry_flush(BlockDriverState *bs, Qcow2Cache *c, int i)
-{
- BDRVQcowState *s = bs->opaque;
- int ret = 0;
-
- if (!c->entries[i].dirty || !c->entries[i].offset) {
- return 0;
- }
-
- trace_qcow2_cache_entry_flush(qemu_coroutine_self(),
- c == s->l2_table_cache, i);
-
- if (c->depends) {
- ret = qcow2_cache_flush_dependency(bs, c);
- } else if (c->depends_on_flush) {
- ret = bdrv_flush(bs->file);
- if (ret >= 0) {
- c->depends_on_flush = false;
- }
- }
-
- if (ret < 0) {
- return ret;
- }
-
- if (c == s->refcount_block_cache) {
- BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_UPDATE_PART);
- } else if (c == s->l2_table_cache) {
- BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
- }
-
- ret = bdrv_pwrite(bs->file, c->entries[i].offset, c->entries[i].table,
- s->cluster_size);
- if (ret < 0) {
- return ret;
- }
-
- c->entries[i].dirty = false;
-
- return 0;
-}
-
-int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c)
-{
- BDRVQcowState *s = bs->opaque;
- int result = 0;
- int ret;
- int i;
-
- trace_qcow2_cache_flush(qemu_coroutine_self(), c == s->l2_table_cache);
-
- for (i = 0; i < c->size; i++) {
- ret = qcow2_cache_entry_flush(bs, c, i);
- if (ret < 0 && result != -ENOSPC) {
- result = ret;
- }
- }
-
- if (result == 0) {
- ret = bdrv_flush(bs->file);
- if (ret < 0) {
- result = ret;
- }
- }
-
- return result;
-}
-
-int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
- Qcow2Cache *dependency)
-{
- int ret;
-
- if (dependency->depends) {
- ret = qcow2_cache_flush_dependency(bs, dependency);
- if (ret < 0) {
- return ret;
- }
- }
-
- if (c->depends && (c->depends != dependency)) {
- ret = qcow2_cache_flush_dependency(bs, c);
- if (ret < 0) {
- return ret;
- }
- }
-
- c->depends = dependency;
- return 0;
-}
-
-void qcow2_cache_depends_on_flush(Qcow2Cache *c)
-{
- c->depends_on_flush = true;
-}
-
-static int qcow2_cache_find_entry_to_replace(Qcow2Cache *c)
-{
- int i;
- int min_count = INT_MAX;
- int min_index = -1;
-
-
- for (i = 0; i < c->size; i++) {
- if (c->entries[i].ref) {
- continue;
- }
-
- if (c->entries[i].cache_hits < min_count) {
- min_index = i;
- min_count = c->entries[i].cache_hits;
- }
-
- /* Give newer hits priority */
- /* TODO Check how to optimize the replacement strategy */
- c->entries[i].cache_hits /= 2;
- }
-
- if (min_index == -1) {
- /* This can't happen in current synchronous code, but leave the check
- * here as a reminder for whoever starts using AIO with the cache */
- abort();
- }
- return min_index;
-}
-
-static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
- uint64_t offset, void **table, bool read_from_disk)
-{
- BDRVQcowState *s = bs->opaque;
- int i;
- int ret;
-
- trace_qcow2_cache_get(qemu_coroutine_self(), c == s->l2_table_cache,
- offset, read_from_disk);
-
- /* Check if the table is already cached */
- for (i = 0; i < c->size; i++) {
- if (c->entries[i].offset == offset) {
- goto found;
- }
- }
-
- /* If not, write a table back and replace it */
- i = qcow2_cache_find_entry_to_replace(c);
- trace_qcow2_cache_get_replace_entry(qemu_coroutine_self(),
- c == s->l2_table_cache, i);
- if (i < 0) {
- return i;
- }
-
- ret = qcow2_cache_entry_flush(bs, c, i);
- if (ret < 0) {
- return ret;
- }
-
- trace_qcow2_cache_get_read(qemu_coroutine_self(),
- c == s->l2_table_cache, i);
- c->entries[i].offset = 0;
- if (read_from_disk) {
- if (c == s->l2_table_cache) {
- BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
- }
-
- ret = bdrv_pread(bs->file, offset, c->entries[i].table, s->cluster_size);
- if (ret < 0) {
- return ret;
- }
- }
-
- /* Give the table some hits for the start so that it won't be replaced
- * immediately. The number 32 is completely arbitrary. */
- c->entries[i].cache_hits = 32;
- c->entries[i].offset = offset;
-
- /* And return the right table */
-found:
- c->entries[i].cache_hits++;
- c->entries[i].ref++;
- *table = c->entries[i].table;
-
- trace_qcow2_cache_get_done(qemu_coroutine_self(),
- c == s->l2_table_cache, i);
-
- return 0;
-}
-
-int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
- void **table)
-{
- return qcow2_cache_do_get(bs, c, offset, table, true);
-}
-
-int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
- void **table)
-{
- return qcow2_cache_do_get(bs, c, offset, table, false);
-}
-
-int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table)
-{
- int i;
-
- for (i = 0; i < c->size; i++) {
- if (c->entries[i].table == *table) {
- goto found;
- }
- }
- return -ENOENT;
-
-found:
- c->entries[i].ref--;
- *table = NULL;
-
- assert(c->entries[i].ref >= 0);
- return 0;
-}
-
-void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table)
-{
- int i;
-
- for (i = 0; i < c->size; i++) {
- if (c->entries[i].table == table) {
- goto found;
- }
- }
- abort();
-
-found:
- c->entries[i].dirty = true;
-}
diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index e179211..335dc7a 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -28,6 +28,7 @@
#include "block_int.h"
#include "block/qcow2.h"
#include "trace.h"
+#include "block-cache.h"
int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
{
@@ -69,7 +70,8 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
return new_l1_table_offset;
}
- ret = qcow2_cache_flush(bs, s->refcount_block_cache);
+ ret = block_cache_flush(bs, s->refcount_block_cache,
+ BLOCK_TABLE_REF, s->cluster_size);
if (ret < 0) {
goto fail;
}
@@ -119,7 +121,8 @@ static int l2_load(BlockDriverState *bs, uint64_t l2_offset,
BDRVQcowState *s = bs->opaque;
int ret;
- ret = qcow2_cache_get(bs, s->l2_table_cache, l2_offset, (void**) l2_table);
+ ret = block_cache_get(bs, s->l2_table_cache, l2_offset,
+ (void **) l2_table, BLOCK_TABLE_L2, s->cluster_size);
return ret;
}
@@ -180,7 +183,8 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
return l2_offset;
}
- ret = qcow2_cache_flush(bs, s->refcount_block_cache);
+ ret = block_cache_flush(bs, s->refcount_block_cache,
+ BLOCK_TABLE_REF, s->cluster_size);
if (ret < 0) {
goto fail;
}
@@ -188,7 +192,8 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
/* allocate a new entry in the l2 cache */
trace_qcow2_l2_allocate_get_empty(bs, l1_index);
- ret = qcow2_cache_get_empty(bs, s->l2_table_cache, l2_offset, (void**) table);
+ ret = block_cache_get_empty(bs, s->l2_table_cache, l2_offset,
+ (void **) table, BLOCK_TABLE_L2, s->cluster_size);
if (ret < 0) {
return ret;
}
@@ -203,16 +208,17 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
/* if there was an old l2 table, read it from the disk */
BLKDBG_EVENT(bs->file, BLKDBG_L2_ALLOC_COW_READ);
- ret = qcow2_cache_get(bs, s->l2_table_cache,
+ ret = block_cache_get(bs, s->l2_table_cache,
old_l2_offset & L1E_OFFSET_MASK,
- (void**) &old_table);
+ (void **) &old_table, BLOCK_TABLE_L2, s->cluster_size);
if (ret < 0) {
goto fail;
}
memcpy(l2_table, old_table, s->cluster_size);
- ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &old_table);
+ ret = block_cache_put(bs, s->l2_table_cache,
+ (void **) &old_table, BLOCK_TABLE_L2);
if (ret < 0) {
goto fail;
}
@@ -222,8 +228,9 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
BLKDBG_EVENT(bs->file, BLKDBG_L2_ALLOC_WRITE);
trace_qcow2_l2_allocate_write_l2(bs, l1_index);
- qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
- ret = qcow2_cache_flush(bs, s->l2_table_cache);
+ block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+ ret = block_cache_flush(bs, s->l2_table_cache,
+ BLOCK_TABLE_L2, s->cluster_size);
if (ret < 0) {
goto fail;
}
@@ -242,7 +249,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
fail:
trace_qcow2_l2_allocate_done(bs, l1_index, ret);
- qcow2_cache_put(bs, s->l2_table_cache, (void**) table);
+ block_cache_put(bs, s->l2_table_cache, (void **) table, BLOCK_TABLE_L2);
s->l1_table[l1_index] = old_l2_offset;
return ret;
}
@@ -475,7 +482,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
abort();
}
- qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+ block_cache_put(bs, s->l2_table_cache, (void **) &l2_table, BLOCK_TABLE_L2);
nb_available = (c * s->cluster_sectors);
@@ -584,13 +591,15 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
* allocated. */
cluster_offset = be64_to_cpu(l2_table[l2_index]);
if (cluster_offset & L2E_OFFSET_MASK) {
- qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+ block_cache_put(bs, s->l2_table_cache,
+ (void **) &l2_table, BLOCK_TABLE_L2);
return 0;
}
cluster_offset = qcow2_alloc_bytes(bs, compressed_size);
if (cluster_offset < 0) {
- qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+ block_cache_put(bs, s->l2_table_cache,
+ (void **) &l2_table, BLOCK_TABLE_L2);
return 0;
}
@@ -605,9 +614,10 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
/* compressed clusters never have the copied flag */
BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
- qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+ block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
l2_table[l2_index] = cpu_to_be64(cluster_offset);
- ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+ ret = block_cache_put(bs, s->l2_table_cache,
+ (void **) &l2_table, BLOCK_TABLE_L2);
if (ret < 0) {
return 0;
}
@@ -659,18 +669,16 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
* handled.
*/
if (cow) {
- qcow2_cache_depends_on_flush(s->l2_table_cache);
+ block_cache_depends_on_flush(s->l2_table_cache);
}
- if (qcow2_need_accurate_refcounts(s)) {
- qcow2_cache_set_dependency(bs, s->l2_table_cache,
- s->refcount_block_cache);
- }
+ block_cache_set_dependency(bs, s->l2_table_cache, BLOCK_TABLE_L2,
+ s->refcount_block_cache, s->cluster_size);
ret = get_cluster_table(bs, m->offset, &l2_table, &l2_index);
if (ret < 0) {
goto err;
}
- qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+ block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
for (i = 0; i < m->nb_clusters; i++) {
/* if two concurrent writes happen to the same unallocated cluster
@@ -687,7 +695,8 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
}
- ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+ ret = block_cache_put(bs, s->l2_table_cache,
+ (void **) &l2_table, BLOCK_TABLE_L2);
if (ret < 0) {
goto err;
}
@@ -913,7 +922,8 @@ again:
* request to complete. If we still had the reference, we could use up the
* whole cache with sleeping requests.
*/
- ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+ ret = block_cache_put(bs, s->l2_table_cache,
+ (void **) &l2_table, BLOCK_TABLE_L2);
if (ret < 0) {
return ret;
}
@@ -1077,14 +1087,15 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
}
/* First remove L2 entries */
- qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+ block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
l2_table[l2_index + i] = cpu_to_be64(0);
/* Then decrease the refcount */
qcow2_free_any_clusters(bs, old_offset, 1);
}
- ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+ ret = block_cache_put(bs, s->l2_table_cache,
+ (void **) &l2_table, BLOCK_TABLE_L2);
if (ret < 0) {
return ret;
}
@@ -1154,7 +1165,7 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
old_offset = be64_to_cpu(l2_table[l2_index + i]);
/* Update L2 entries */
- qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+ block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
if (old_offset & QCOW_OFLAG_COMPRESSED) {
l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
qcow2_free_any_clusters(bs, old_offset, 1);
@@ -1163,7 +1174,8 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
}
}
- ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+ ret = block_cache_put(bs, s->l2_table_cache,
+ (void **) &l2_table, BLOCK_TABLE_L2);
if (ret < 0) {
return ret;
}
diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 5e3f915..728bfc1 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -25,6 +25,7 @@
#include "qemu-common.h"
#include "block_int.h"
#include "block/qcow2.h"
+#include "block-cache.h"
static int64_t alloc_clusters_noref(BlockDriverState *bs, int64_t size);
static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
@@ -71,8 +72,8 @@ static int load_refcount_block(BlockDriverState *bs,
int ret;
BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_LOAD);
- ret = qcow2_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
- refcount_block);
+ ret = block_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
+ refcount_block, BLOCK_TABLE_REF, s->cluster_size);
return ret;
}
@@ -98,8 +99,8 @@ static int get_refcount(BlockDriverState *bs, int64_t cluster_index)
if (!refcount_block_offset)
return 0;
- ret = qcow2_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
- (void**) &refcount_block);
+ ret = block_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
+ (void **) &refcount_block, BLOCK_TABLE_REF, s->cluster_size);
if (ret < 0) {
return ret;
}
@@ -108,8 +109,8 @@ static int get_refcount(BlockDriverState *bs, int64_t cluster_index)
((1 << (s->cluster_bits - REFCOUNT_SHIFT)) - 1);
refcount = be16_to_cpu(refcount_block[block_index]);
- ret = qcow2_cache_put(bs, s->refcount_block_cache,
- (void**) &refcount_block);
+ ret = block_cache_put(bs, s->refcount_block_cache,
+ (void **) &refcount_block, BLOCK_TABLE_REF);
if (ret < 0) {
return ret;
}
@@ -201,7 +202,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
*refcount_block = NULL;
/* We write to the refcount table, so we might depend on L2 tables */
- qcow2_cache_flush(bs, s->l2_table_cache);
+ block_cache_flush(bs, s->l2_table_cache,
+ BLOCK_TABLE_L2, s->cluster_size);
/* Allocate the refcount block itself and mark it as used */
int64_t new_block = alloc_clusters_noref(bs, s->cluster_size);
@@ -217,8 +219,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
if (in_same_refcount_block(s, new_block, cluster_index << s->cluster_bits)) {
/* Zero the new refcount block before updating it */
- ret = qcow2_cache_get_empty(bs, s->refcount_block_cache, new_block,
- (void**) refcount_block);
+ ret = block_cache_get_empty(bs, s->refcount_block_cache, new_block,
+ (void **) refcount_block, BLOCK_TABLE_REF, s->cluster_size);
if (ret < 0) {
goto fail_block;
}
@@ -241,8 +243,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
/* Initialize the new refcount block only after updating its refcount,
* update_refcount uses the refcount cache itself */
- ret = qcow2_cache_get_empty(bs, s->refcount_block_cache, new_block,
- (void**) refcount_block);
+ ret = block_cache_get_empty(bs, s->refcount_block_cache, new_block,
+ (void **) refcount_block, BLOCK_TABLE_REF, s->cluster_size);
if (ret < 0) {
goto fail_block;
}
@@ -252,8 +254,9 @@ static int alloc_refcount_block(BlockDriverState *bs,
/* Now the new refcount block needs to be written to disk */
BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_ALLOC_WRITE);
- qcow2_cache_entry_mark_dirty(s->refcount_block_cache, *refcount_block);
- ret = qcow2_cache_flush(bs, s->refcount_block_cache);
+ block_cache_entry_mark_dirty(s->refcount_block_cache, *refcount_block);
+ ret = block_cache_flush(bs, s->refcount_block_cache,
+ BLOCK_TABLE_REF, s->cluster_size);
if (ret < 0) {
goto fail_block;
}
@@ -273,7 +276,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
return 0;
}
- ret = qcow2_cache_put(bs, s->refcount_block_cache, (void**) refcount_block);
+ ret = block_cache_put(bs, s->refcount_block_cache,
+ (void **) refcount_block, BLOCK_TABLE_REF);
if (ret < 0) {
goto fail_block;
}
@@ -406,7 +410,8 @@ fail_table:
g_free(new_table);
fail_block:
if (*refcount_block != NULL) {
- qcow2_cache_put(bs, s->refcount_block_cache, (void**) refcount_block);
+ block_cache_put(bs, s->refcount_block_cache,
+ (void **) refcount_block, BLOCK_TABLE_REF);
}
return ret;
}
@@ -432,8 +437,8 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
}
if (addend < 0) {
- qcow2_cache_set_dependency(bs, s->refcount_block_cache,
- s->l2_table_cache);
+ block_cache_set_dependency(bs, s->refcount_block_cache, BLOCK_TABLE_REF,
+ s->l2_table_cache, s->cluster_size);
}
start = offset & ~(s->cluster_size - 1);
@@ -449,8 +454,8 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
/* Load the refcount block and allocate it if needed */
if (table_index != old_table_index) {
if (refcount_block) {
- ret = qcow2_cache_put(bs, s->refcount_block_cache,
- (void**) &refcount_block);
+ ret = block_cache_put(bs, s->refcount_block_cache,
+ (void **) &refcount_block, BLOCK_TABLE_REF);
if (ret < 0) {
goto fail;
}
@@ -463,7 +468,7 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
}
old_table_index = table_index;
- qcow2_cache_entry_mark_dirty(s->refcount_block_cache, refcount_block);
+ block_cache_entry_mark_dirty(s->refcount_block_cache, refcount_block);
/* we can update the count and save it */
block_index = cluster_index &
@@ -486,8 +491,8 @@ fail:
/* Write last changed block to disk */
if (refcount_block) {
int wret;
- wret = qcow2_cache_put(bs, s->refcount_block_cache,
- (void**) &refcount_block);
+ wret = block_cache_put(bs, s->refcount_block_cache,
+ (void **) &refcount_block, BLOCK_TABLE_REF);
if (wret < 0) {
return ret < 0 ? ret : wret;
}
@@ -763,8 +768,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
old_l2_offset = l2_offset;
l2_offset &= L1E_OFFSET_MASK;
- ret = qcow2_cache_get(bs, s->l2_table_cache, l2_offset,
- (void**) &l2_table);
+ ret = block_cache_get(bs, s->l2_table_cache, l2_offset,
+ (void **) &l2_table, BLOCK_TABLE_L2, s->cluster_size);
if (ret < 0) {
goto fail;
}
@@ -811,16 +816,18 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
}
if (offset != old_offset) {
if (addend > 0) {
- qcow2_cache_set_dependency(bs, s->l2_table_cache,
- s->refcount_block_cache);
+ block_cache_set_dependency(bs, s->l2_table_cache,
+ BLOCK_TABLE_L2, s->refcount_block_cache,
+ s->cluster_size);
}
l2_table[j] = cpu_to_be64(offset);
- qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
+ block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
}
}
}
- ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+ ret = block_cache_put(bs, s->l2_table_cache,
+ (void **) &l2_table, BLOCK_TABLE_L2);
if (ret < 0) {
goto fail;
}
@@ -847,7 +854,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
ret = 0;
fail:
if (l2_table) {
- qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
+ block_cache_put(bs, s->l2_table_cache,
+ (void **) &l2_table, BLOCK_TABLE_L2);
}
/* Update L1 only if it isn't deleted anyway (addend = -1) */
diff --git a/block/qcow2.c b/block/qcow2.c
index fd5e214..b89d312 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -30,6 +30,7 @@
#include "qemu-error.h"
#include "qerror.h"
#include "trace.h"
+#include "block-cache.h"
/*
Differences with QCOW:
@@ -415,8 +416,9 @@ static int qcow2_open(BlockDriverState *bs, int flags)
}
/* alloc L2 table/refcount block cache */
- s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE);
- s->refcount_block_cache = qcow2_cache_create(bs, REFCOUNT_CACHE_SIZE);
+ s->l2_table_cache = block_cache_create(bs, L2_CACHE_SIZE, s->cluster_size);
+ s->refcount_block_cache =
+ block_cache_create(bs, REFCOUNT_CACHE_SIZE, s->cluster_size);
s->cluster_cache = g_malloc(s->cluster_size);
/* one more sector for decompressed data alignment */
@@ -500,7 +502,7 @@ static int qcow2_open(BlockDriverState *bs, int flags)
qcow2_refcount_close(bs);
g_free(s->l1_table);
if (s->l2_table_cache) {
- qcow2_cache_destroy(bs, s->l2_table_cache);
+ block_cache_destroy(bs, s->l2_table_cache, BLOCK_TABLE_L2);
}
g_free(s->cluster_cache);
qemu_vfree(s->cluster_data);
@@ -860,13 +862,13 @@ static void qcow2_close(BlockDriverState *bs)
BDRVQcowState *s = bs->opaque;
g_free(s->l1_table);
- qcow2_cache_flush(bs, s->l2_table_cache);
- qcow2_cache_flush(bs, s->refcount_block_cache);
-
+ block_cache_flush(bs, s->l2_table_cache,
+ BLOCK_TABLE_L2, s->cluster_size);
+ block_cache_flush(bs, s->refcount_block_cache,
+ BLOCK_TABLE_REF, s->cluster_size);
qcow2_mark_clean(bs);
-
- qcow2_cache_destroy(bs, s->l2_table_cache);
- qcow2_cache_destroy(bs, s->refcount_block_cache);
+ block_cache_destroy(bs, s->l2_table_cache, BLOCK_TABLE_L2);
+ block_cache_destroy(bs, s->refcount_block_cache, BLOCK_TABLE_REF);
g_free(s->unknown_header_fields);
cleanup_unknown_header_ext(bs);
@@ -1339,8 +1341,6 @@ static int qcow2_create(const char *filename, QEMUOptionParameter *options)
options->value.s);
return -EINVAL;
}
- } else if (!strcmp(options->name, BLOCK_OPT_LAZY_REFCOUNTS)) {
- flags |= options->value.n ? BLOCK_FLAG_LAZY_REFCOUNTS : 0;
}
options++;
}
@@ -1537,18 +1537,18 @@ static coroutine_fn int qcow2_co_flush_to_os(BlockDriverState *bs)
int ret;
qemu_co_mutex_lock(&s->lock);
- ret = qcow2_cache_flush(bs, s->l2_table_cache);
+ ret = block_cache_flush(bs, s->l2_table_cache,
+ BLOCK_TABLE_L2, s->cluster_size);
if (ret < 0) {
qemu_co_mutex_unlock(&s->lock);
return ret;
}
- if (qcow2_need_accurate_refcounts(s)) {
- ret = qcow2_cache_flush(bs, s->refcount_block_cache);
- if (ret < 0) {
- qemu_co_mutex_unlock(&s->lock);
- return ret;
- }
+ ret = block_cache_flush(bs, s->refcount_block_cache,
+ BLOCK_TABLE_REF, s->cluster_size);
+ if (ret < 0) {
+ qemu_co_mutex_unlock(&s->lock);
+ return ret;
}
qemu_co_mutex_unlock(&s->lock);
diff --git a/block/qcow2.h b/block/qcow2.h
index b4eb654..cb6fd7a 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -27,6 +27,7 @@
#include "aes.h"
#include "qemu-coroutine.h"
+#include "block-cache.h"
//#define DEBUG_ALLOC
//#define DEBUG_ALLOC2
@@ -94,8 +95,6 @@ typedef struct QCowSnapshot {
uint64_t vm_clock_nsec;
} QCowSnapshot;
-struct Qcow2Cache;
-typedef struct Qcow2Cache Qcow2Cache;
typedef struct Qcow2UnknownHeaderExtension {
uint32_t magic;
@@ -146,8 +145,8 @@ typedef struct BDRVQcowState {
uint64_t l1_table_offset;
uint64_t *l1_table;
- Qcow2Cache* l2_table_cache;
- Qcow2Cache* refcount_block_cache;
+ BlockCache *l2_table_cache;
+ BlockCache *refcount_block_cache;
uint8_t *cluster_cache;
uint8_t *cluster_data;
@@ -316,21 +315,4 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs, const char *snapshot_name);
void qcow2_free_snapshots(BlockDriverState *bs);
int qcow2_read_snapshots(BlockDriverState *bs);
-
-/* qcow2-cache.c functions */
-Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables);
-int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c);
-
-void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table);
-int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c);
-int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
- Qcow2Cache *dependency);
-void qcow2_cache_depends_on_flush(Qcow2Cache *c);
-
-int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
- void **table);
-int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
- void **table);
-int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
-
#endif
diff --git a/trace-events b/trace-events
index 6b12f83..52b6438 100644
--- a/trace-events
+++ b/trace-events
@@ -439,12 +439,13 @@ qcow2_l2_allocate_write_l2(void *bs, int l1_index) "bs %p l1_index %d"
qcow2_l2_allocate_write_l1(void *bs, int l1_index) "bs %p l1_index %d"
qcow2_l2_allocate_done(void *bs, int l1_index, int ret) "bs %p l1_index %d ret %d"
-qcow2_cache_get(void *co, int c, uint64_t offset, bool read_from_disk) "co %p is_l2_cache %d offset %" PRIx64 " read_from_disk %d"
-qcow2_cache_get_replace_entry(void *co, int c, int i) "co %p is_l2_cache %d index %d"
-qcow2_cache_get_read(void *co, int c, int i) "co %p is_l2_cache %d index %d"
-qcow2_cache_get_done(void *co, int c, int i) "co %p is_l2_cache %d index %d"
-qcow2_cache_flush(void *co, int c) "co %p is_l2_cache %d"
-qcow2_cache_entry_flush(void *co, int c, int i) "co %p is_l2_cache %d index %d"
+# block/block-cache.c
+block_cache_get(void *co, int c, uint64_t offset, bool read_from_disk) "co %p is_l2_cache %d offset %" PRIx64 " read_from_disk %d"
+block_cache_get_replace_entry(void *co, int c, int i) "co %p is_l2_cache %d index %d"
+block_cache_get_read(void *co, int c, int i) "co %p is_l2_cache %d index %d"
+block_cache_get_done(void *co, int c, int i) "co %p is_l2_cache %d index %d"
+block_cache_flush(void *co, int c) "co %p is_l2_cache %d"
+block_cache_entry_flush(void *co, int c, int i) "co %p is_l2_cache %d index %d"
# block/qed-l2-cache.c
qed_alloc_l2_cache_entry(void *l2_cache, void *entry) "l2_cache %p entry %p"
--
1.7.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [Qemu-devel] [PATCH V12 5/6] add-cow file format
2012-08-10 15:39 [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
` (3 preceding siblings ...)
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 4/6] rename qcow2-cache.c to block-cache.c Dong Xu Wang
@ 2012-08-10 15:39 ` Dong Xu Wang
2012-09-06 20:19 ` Michael Roth
2012-09-11 9:40 ` Kevin Wolf
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 6/6] add-cow: add qemu-iotests support Dong Xu Wang
2012-08-23 5:34 ` [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
6 siblings, 2 replies; 25+ messages in thread
From: Dong Xu Wang @ 2012-08-10 15:39 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, Dong Xu Wang
add-cow file format core code. It use block-cache.c as cache code.
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
---
block/Makefile.objs | 1 +
block/add-cow.c | 613 +++++++++++++++++++++++++++++++++++++++++++++++++++
block/add-cow.h | 85 +++++++
block_int.h | 2 +
4 files changed, 701 insertions(+), 0 deletions(-)
create mode 100644 block/add-cow.c
create mode 100644 block/add-cow.h
diff --git a/block/Makefile.objs b/block/Makefile.objs
index 23bdfc8..7ed5051 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -2,6 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
block-obj-y += qed-check.o
+block-obj-y += add-cow.o
block-obj-y += block-cache.o
block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
block-obj-y += stream.o
diff --git a/block/add-cow.c b/block/add-cow.c
new file mode 100644
index 0000000..d4711d5
--- /dev/null
+++ b/block/add-cow.c
@@ -0,0 +1,613 @@
+/*
+ * QEMU ADD-COW Disk Format
+ *
+ * Copyright IBM, Corp. 2012
+ *
+ * Authors:
+ * Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include "block_int.h"
+#include "module.h"
+#include "add-cow.h"
+
+static void add_cow_header_le_to_cpu(const AddCowHeader *le, AddCowHeader *cpu)
+{
+ cpu->magic = le64_to_cpu(le->magic);
+ cpu->version = le32_to_cpu(le->version);
+
+ cpu->backing_filename_offset = le32_to_cpu(le->backing_filename_offset);
+ cpu->backing_filename_size = le32_to_cpu(le->backing_filename_size);
+
+ cpu->image_filename_offset = le32_to_cpu(le->image_filename_offset);
+ cpu->image_filename_size = le32_to_cpu(le->image_filename_size);
+
+ cpu->features = le64_to_cpu(le->features);
+ cpu->optional_features = le64_to_cpu(le->optional_features);
+ cpu->header_pages_size = le32_to_cpu(le->header_pages_size);
+}
+
+static void add_cow_header_cpu_to_le(const AddCowHeader *cpu, AddCowHeader *le)
+{
+ le->magic = cpu_to_le64(cpu->magic);
+ le->version = cpu_to_le32(cpu->version);
+
+ le->backing_filename_offset = cpu_to_le32(cpu->backing_filename_offset);
+ le->backing_filename_size = cpu_to_le32(cpu->backing_filename_size);
+
+ le->image_filename_offset = cpu_to_le32(cpu->image_filename_offset);
+ le->image_filename_size = cpu_to_le32(cpu->image_filename_size);
+
+ le->features = cpu_to_le64(cpu->features);
+ le->optional_features = cpu_to_le64(cpu->optional_features);
+ le->header_pages_size = cpu_to_le32(cpu->header_pages_size);
+}
+
+static int add_cow_probe(const uint8_t *buf, int buf_size, const char *filename)
+{
+ const AddCowHeader *header = (const AddCowHeader *)buf;
+
+ if (le64_to_cpu(header->magic) == ADD_COW_MAGIC &&
+ le32_to_cpu(header->version) == ADD_COW_VERSION) {
+ return 100;
+ } else {
+ return 0;
+ }
+}
+
+static int add_cow_create(const char *filename, QEMUOptionParameter *options)
+{
+ AddCowHeader header = {
+ .magic = ADD_COW_MAGIC,
+ .version = ADD_COW_VERSION,
+ .features = 0,
+ .optional_features = 0,
+ .header_pages_size = ADD_COW_DEFAULT_PAGE_SIZE,
+ };
+ AddCowHeader le_header;
+ int64_t image_len = 0;
+ const char *backing_filename = NULL;
+ const char *backing_fmt = NULL;
+ const char *image_filename = NULL;
+ const char *image_format = NULL;
+ BlockDriverState *bs, *image_bs = NULL, *backing_bs = NULL;
+ BlockDriver *drv = bdrv_find_format("add-cow");
+ BDRVAddCowState s;
+ int ret;
+
+ while (options && options->name) {
+ if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
+ image_len = options->value.n;
+ } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FILE)) {
+ backing_filename = options->value.s;
+ } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FMT)) {
+ backing_fmt = options->value.s;
+ } else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FILE)) {
+ image_filename = options->value.s;
+ } else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FORMAT)) {
+ image_format = options->value.s;
+ }
+ options++;
+ }
+
+ if (backing_filename) {
+ header.backing_filename_offset = sizeof(header)
+ + sizeof(s.backing_file_format) + sizeof(s.image_file_format);
+ header.backing_filename_size = strlen(backing_filename);
+
+ if (!backing_fmt) {
+ backing_bs = bdrv_new("image");
+ ret = bdrv_open(backing_bs, backing_filename, BDRV_O_RDWR
+ | BDRV_O_CACHE_WB, NULL);
+ if (ret < 0) {
+ return ret;
+ }
+ backing_fmt = bdrv_get_format_name(backing_bs);
+ bdrv_delete(backing_bs);
+ }
+ } else {
+ header.features |= ADD_COW_F_All_ALLOCATED;
+ }
+
+ if (image_filename) {
+ header.image_filename_offset =
+ sizeof(header) + sizeof(s.backing_file_format)
+ + sizeof(s.image_file_format) + header.backing_filename_size;
+ header.image_filename_size = strlen(image_filename);
+ } else {
+ error_report("Error: image_file should be given.");
+ return -EINVAL;
+ }
+
+ if (backing_filename && !strcmp(backing_filename, image_filename)) {
+ error_report("Error: Trying to create an image with the "
+ "same backing file name as the image file name");
+ return -EINVAL;
+ }
+
+ if (!strcmp(filename, image_filename)) {
+ error_report("Error: Trying to create an image with the "
+ "same filename as the image file name");
+ return -EINVAL;
+ }
+
+ if (header.image_filename_offset + header.image_filename_size
+ > ADD_COW_PAGE_SIZE * ADD_COW_DEFAULT_PAGE_SIZE) {
+ error_report("image_file name or backing_file name too long.");
+ return -ENOSPC;
+ }
+
+ ret = bdrv_file_open(&image_bs, image_filename, BDRV_O_RDWR);
+ if (ret < 0) {
+ return ret;
+ }
+ bdrv_delete(image_bs);
+
+ ret = bdrv_create_file(filename, NULL);
+ if (ret < 0) {
+ return ret;
+ }
+
+ ret = bdrv_file_open(&bs, filename, BDRV_O_RDWR);
+ if (ret < 0) {
+ return ret;
+ }
+ add_cow_header_cpu_to_le(&header, &le_header);
+ ret = bdrv_pwrite(bs, 0, &le_header, sizeof(le_header));
+ if (ret < 0) {
+ bdrv_delete(bs);
+ return ret;
+ }
+
+ ret = bdrv_pwrite(bs, sizeof(le_header), backing_fmt ? backing_fmt : "",
+ backing_fmt ? strlen(backing_fmt) : 0);
+ if (ret < 0) {
+ bdrv_delete(bs);
+ return ret;
+ }
+
+ ret = bdrv_pwrite(bs, sizeof(le_header) + sizeof(s.backing_file_format),
+ image_format ? image_format : "raw",
+ image_format ? strlen(image_format) : sizeof("raw"));
+ if (ret < 0) {
+ bdrv_delete(bs);
+ return ret;
+ }
+
+ if (backing_filename) {
+ ret = bdrv_pwrite(bs, header.backing_filename_offset,
+ backing_filename, header.backing_filename_size);
+ if (ret < 0) {
+ bdrv_delete(bs);
+ return ret;
+ }
+ }
+
+ ret = bdrv_pwrite(bs, header.image_filename_offset,
+ image_filename, header.image_filename_size);
+ if (ret < 0) {
+ bdrv_delete(bs);
+ return ret;
+ }
+
+ ret = bdrv_open(bs, filename, BDRV_O_RDWR | BDRV_O_NO_FLUSH, drv);
+ if (ret < 0) {
+ bdrv_delete(bs);
+ return ret;
+ }
+
+ ret = bdrv_truncate(bs, image_len);
+ bdrv_delete(bs);
+ return ret;
+}
+
+static int add_cow_open(BlockDriverState *bs, int flags)
+{
+ char image_filename[ADD_COW_FILE_LEN];
+ char tmp_name[ADD_COW_FILE_LEN];
+ BlockDriver *image_drv = NULL;
+ int ret;
+ int sector_per_byte;
+ BDRVAddCowState *s = bs->opaque;
+ AddCowHeader le_header;
+
+ ret = bdrv_pread(bs->file, 0, &le_header, sizeof(le_header));
+ if (ret != sizeof(s->header)) {
+ goto fail;
+ }
+
+ add_cow_header_le_to_cpu(&le_header, &s->header);
+
+ if (le64_to_cpu(s->header.magic) != ADD_COW_MAGIC) {
+ ret = -EINVAL;
+ goto fail;
+ }
+
+ if (s->header.version != ADD_COW_VERSION) {
+ char version[64];
+ snprintf(version, sizeof(version), "ADD-COW version %d",
+ s->header.version);
+ qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
+ bs->device_name, "add-cow", version);
+ ret = -ENOTSUP;
+ goto fail;
+ }
+
+ if (s->header.features & ~ADD_COW_FEATURE_MASK) {
+ char buf[64];
+ snprintf(buf, sizeof(buf), "%" PRIx64,
+ s->header.features & ~ADD_COW_FEATURE_MASK);
+ qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
+ bs->device_name, "add-cow", buf);
+ return -ENOTSUP;
+ }
+
+ if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
+ ret = bdrv_read_string(bs->file, sizeof(s->header),
+ sizeof(s->backing_file_format) - 1, s->backing_file_format,
+ sizeof(s->backing_file_format));
+ if (ret < 0) {
+ goto fail;
+ }
+ }
+
+ ret = bdrv_read_string(bs->file,
+ sizeof(s->header) + sizeof(s->image_file_format),
+ sizeof(s->image_file_format) - 1, s->image_file_format,
+ sizeof(s->image_file_format));
+ if (ret < 0) {
+ goto fail;
+ }
+
+ if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
+ ret = bdrv_read_string(bs->file, s->header.backing_filename_offset,
+ s->header.backing_filename_size, bs->backing_file,
+ sizeof(bs->backing_file));
+ if (ret < 0) {
+ goto fail;
+ }
+ }
+
+ ret = bdrv_read_string(bs->file, s->header.image_filename_offset,
+ s->header.image_filename_size, tmp_name,
+ sizeof(tmp_name));
+ if (ret < 0) {
+ goto fail;
+ }
+
+ s->image_hd = bdrv_new("");
+ if (path_has_protocol(image_filename)) {
+ pstrcpy(image_filename, sizeof(image_filename), tmp_name);
+ } else {
+ path_combine(image_filename, sizeof(image_filename),
+ bs->filename, tmp_name);
+ }
+
+ ret = bdrv_open(s->image_hd, image_filename, flags, image_drv);
+ if (ret < 0) {
+ bdrv_delete(s->image_hd);
+ goto fail;
+ }
+
+ bs->total_sectors = bdrv_getlength(s->image_hd) >> 9;
+ s->cluster_size = ADD_COW_CLUSTER_SIZE;
+ sector_per_byte = SECTORS_PER_CLUSTER * 8;
+ s->bitmap_size =
+ (bs->total_sectors + sector_per_byte - 1) / sector_per_byte;
+ s->bitmap_cache =
+ block_cache_create(bs, ADD_COW_CACHE_SIZE, ADD_COW_CACHE_ENTRY_SIZE);
+
+ qemu_co_mutex_init(&s->lock);
+ return 0;
+fail:
+ if (s->bitmap_cache) {
+ block_cache_destroy(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP);
+ }
+ return ret;
+}
+
+static void add_cow_close(BlockDriverState *bs)
+{
+ BDRVAddCowState *s = bs->opaque;
+ block_cache_destroy(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP);
+ bdrv_delete(s->image_hd);
+}
+
+static bool is_allocated(BlockDriverState *bs, int64_t sector_num)
+{
+ BDRVAddCowState *s = bs->opaque;
+ BlockCache *c = s->bitmap_cache;
+ int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
+ uint8_t *table = NULL;
+ uint64_t offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
+ + (offset_in_bitmap(sector_num) & (~(c->entry_size - 1)));
+ int ret = block_cache_get(bs, s->bitmap_cache, offset,
+ (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
+
+ if (ret < 0) {
+ return ret;
+ }
+ return table[cluster_num / 8 % ADD_COW_CACHE_ENTRY_SIZE]
+ & (1 << (cluster_num % 8));
+}
+
+static coroutine_fn int add_cow_is_allocated(BlockDriverState *bs,
+ int64_t sector_num, int nb_sectors, int *num_same)
+{
+ BDRVAddCowState *s = bs->opaque;
+ int changed;
+
+ if (nb_sectors == 0) {
+ *num_same = 0;
+ return 0;
+ }
+
+ if (s->header.features & ADD_COW_F_All_ALLOCATED) {
+ *num_same = nb_sectors - 1;
+ return 1;
+ }
+ changed = is_allocated(bs, sector_num);
+
+ for (*num_same = 1; *num_same < nb_sectors; (*num_same)++) {
+ if (is_allocated(bs, sector_num + *num_same) != changed) {
+ break;
+ }
+ }
+ return changed;
+}
+
+static int add_cow_backing_read(BlockDriverState *bs, QEMUIOVector *qiov,
+ int64_t sector_num, int nb_sectors)
+{
+ int n1;
+ if ((sector_num + nb_sectors) <= bs->total_sectors) {
+ return nb_sectors;
+ }
+ if (sector_num >= bs->total_sectors) {
+ n1 = 0;
+ } else {
+ n1 = bs->total_sectors - sector_num;
+ }
+
+ qemu_iovec_memset(qiov, BDRV_SECTOR_SIZE * n1,
+ 0, BDRV_SECTOR_SIZE * (nb_sectors - n1));
+
+ return n1;
+}
+
+static coroutine_fn int add_cow_co_readv(BlockDriverState *bs,
+ int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
+{
+ BDRVAddCowState *s = bs->opaque;
+ int cur_nr_sectors;
+ uint64_t bytes_done = 0;
+ QEMUIOVector hd_qiov;
+ int n, n1, ret = 0;
+
+ qemu_iovec_init(&hd_qiov, qiov->niov);
+ qemu_co_mutex_lock(&s->lock);
+ while (remaining_sectors != 0) {
+ cur_nr_sectors = remaining_sectors;
+ if (add_cow_is_allocated(bs, sector_num, cur_nr_sectors, &n)) {
+ cur_nr_sectors = n;
+ qemu_iovec_reset(&hd_qiov);
+ qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
+ cur_nr_sectors * BDRV_SECTOR_SIZE);
+ qemu_co_mutex_unlock(&s->lock);
+ ret = bdrv_co_readv(s->image_hd, sector_num, n, &hd_qiov);
+ qemu_co_mutex_lock(&s->lock);
+ if (ret < 0) {
+ goto fail;
+ }
+ } else {
+ cur_nr_sectors = n;
+ if (bs->backing_hd) {
+ qemu_iovec_reset(&hd_qiov);
+ qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
+ cur_nr_sectors * BDRV_SECTOR_SIZE);
+ n1 = add_cow_backing_read(bs->backing_hd, &hd_qiov,
+ sector_num, cur_nr_sectors);
+ if (n1 > 0) {
+ qemu_co_mutex_unlock(&s->lock);
+ ret = bdrv_co_readv(bs->backing_hd, sector_num,
+ n, &hd_qiov);
+ qemu_co_mutex_lock(&s->lock);
+ if (ret < 0) {
+ goto fail;
+ }
+ }
+ } else {
+ qemu_iovec_memset(&hd_qiov, 0, 0,
+ BDRV_SECTOR_SIZE * cur_nr_sectors);
+ }
+ }
+ remaining_sectors -= cur_nr_sectors;
+ sector_num += cur_nr_sectors;
+ bytes_done += cur_nr_sectors * BDRV_SECTOR_SIZE;
+ }
+fail:
+ qemu_co_mutex_unlock(&s->lock);
+ qemu_iovec_destroy(&hd_qiov);
+ return ret;
+}
+
+static int coroutine_fn copy_sectors(BlockDriverState *bs,
+ int n_start, int n_end)
+{
+ BDRVAddCowState *s = bs->opaque;
+ QEMUIOVector qiov;
+ struct iovec iov;
+ int n, ret;
+
+ n = n_end - n_start;
+ if (n <= 0) {
+ return 0;
+ }
+
+ iov.iov_len = n * BDRV_SECTOR_SIZE;
+ iov.iov_base = qemu_blockalign(bs, iov.iov_len);
+
+ qemu_iovec_init_external(&qiov, &iov, 1);
+
+ ret = bdrv_co_readv(bs->backing_hd, n_start, n, &qiov);
+ if (ret < 0) {
+ goto out;
+ }
+ ret = bdrv_co_writev(s->image_hd, n_start, n, &qiov);
+ if (ret < 0) {
+ goto out;
+ }
+
+ ret = 0;
+out:
+ qemu_vfree(iov.iov_base);
+ return ret;
+}
+
+static coroutine_fn int add_cow_co_writev(BlockDriverState *bs,
+ int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
+{
+ BDRVAddCowState *s = bs->opaque;
+ BlockCache *c = s->bitmap_cache;
+ int ret = 0, i;
+ QEMUIOVector hd_qiov;
+ uint8_t *table;
+ uint64_t offset;
+
+ qemu_co_mutex_lock(&s->lock);
+ qemu_iovec_init(&hd_qiov, qiov->niov);
+ ret = bdrv_co_writev(s->image_hd,
+ sector_num,
+ remaining_sectors, qiov);
+
+ if (ret < 0) {
+ goto fail;
+ }
+ if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
+ /* Copy content of unmodified sectors */
+ if (!is_cluster_head(sector_num) && !is_allocated(bs, sector_num)) {
+ ret = copy_sectors(bs, sector_num & ~(SECTORS_PER_CLUSTER - 1),
+ sector_num);
+ if (ret < 0) {
+ goto fail;
+ }
+ }
+
+ if (!is_cluster_tail(sector_num + remaining_sectors - 1)
+ && !is_allocated(bs, sector_num + remaining_sectors - 1)) {
+ ret = copy_sectors(bs, sector_num + remaining_sectors,
+ ((sector_num + remaining_sectors) | (SECTORS_PER_CLUSTER - 1)) + 1);
+ if (ret < 0) {
+ goto fail;
+ }
+ }
+
+ for (i = sector_num / SECTORS_PER_CLUSTER;
+ i <= (sector_num + remaining_sectors - 1) / SECTORS_PER_CLUSTER;
+ i++) {
+ offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
+ + (offset_in_bitmap(i * SECTORS_PER_CLUSTER) & (~(c->entry_size - 1)));
+ ret = block_cache_get(bs, s->bitmap_cache, offset,
+ (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
+ if (ret < 0) {
+ goto fail;
+ }
+ if ((table[i / 8] & (1 << (i % 8))) == 0) {
+ table[i / 8] |= (1 << (i % 8));
+ block_cache_entry_mark_dirty(s->bitmap_cache, table);
+ }
+ }
+ }
+ ret = 0;
+fail:
+ qemu_co_mutex_unlock(&s->lock);
+ qemu_iovec_destroy(&hd_qiov);
+ return ret;
+}
+
+static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size)
+{
+ BDRVAddCowState *s = bs->opaque;
+ int sector_per_byte = SECTORS_PER_CLUSTER * 8;
+ int ret;
+ uint32_t bitmap_pos = s->header.header_pages_size * ADD_COW_PAGE_SIZE;
+ int64_t bitmap_size =
+ (size / BDRV_SECTOR_SIZE + sector_per_byte - 1) / sector_per_byte;
+ bitmap_size = (bitmap_size + ADD_COW_CACHE_ENTRY_SIZE - 1)
+ & (~(ADD_COW_CACHE_ENTRY_SIZE - 1));
+
+ ret = bdrv_truncate(bs->file, bitmap_pos + bitmap_size);
+ if (ret < 0) {
+ return ret;
+ }
+ return 0;
+}
+
+static coroutine_fn int add_cow_co_flush(BlockDriverState *bs)
+{
+ BDRVAddCowState *s = bs->opaque;
+ int ret;
+
+ qemu_co_mutex_lock(&s->lock);
+ ret = block_cache_flush(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP,
+ ADD_COW_CACHE_ENTRY_SIZE);
+ qemu_co_mutex_unlock(&s->lock);
+ return ret;
+}
+
+static QEMUOptionParameter add_cow_create_options[] = {
+ {
+ .name = BLOCK_OPT_SIZE,
+ .type = OPT_SIZE,
+ .help = "Virtual disk size"
+ },
+ {
+ .name = BLOCK_OPT_BACKING_FILE,
+ .type = OPT_STRING,
+ .help = "File name of a base image"
+ },
+ {
+ .name = BLOCK_OPT_BACKING_FMT,
+ .type = OPT_STRING,
+ .help = "Image format of the base image"
+ },
+ {
+ .name = BLOCK_OPT_IMAGE_FILE,
+ .type = OPT_STRING,
+ .help = "File name of a image file"
+ },
+ {
+ .name = BLOCK_OPT_IMAGE_FORMAT,
+ .type = OPT_STRING,
+ .help = "Image format of the image file"
+ },
+ { NULL }
+};
+
+static BlockDriver bdrv_add_cow = {
+ .format_name = "add-cow",
+ .instance_size = sizeof(BDRVAddCowState),
+ .bdrv_probe = add_cow_probe,
+ .bdrv_open = add_cow_open,
+ .bdrv_close = add_cow_close,
+ .bdrv_create = add_cow_create,
+ .bdrv_co_readv = add_cow_co_readv,
+ .bdrv_co_writev = add_cow_co_writev,
+ .bdrv_truncate = bdrv_add_cow_truncate,
+ .bdrv_co_is_allocated = add_cow_is_allocated,
+
+ .create_options = add_cow_create_options,
+ .bdrv_co_flush_to_os = add_cow_co_flush,
+};
+
+static void bdrv_add_cow_init(void)
+{
+ bdrv_register(&bdrv_add_cow);
+}
+
+block_init(bdrv_add_cow_init);
diff --git a/block/add-cow.h b/block/add-cow.h
new file mode 100644
index 0000000..f058376
--- /dev/null
+++ b/block/add-cow.h
@@ -0,0 +1,85 @@
+/*
+ * QEMU ADD-COW Disk Format
+ *
+ * Copyright IBM, Corp. 2012
+ *
+ * Authors:
+ * Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#ifndef BLOCK_ADD_COW_H
+#define BLOCK_ADD_COW_H
+#include "block-cache.h"
+
+enum {
+ ADD_COW_F_All_ALLOCATED = 0X01,
+ ADD_COW_FEATURE_MASK = ADD_COW_F_All_ALLOCATED,
+
+ ADD_COW_MAGIC = (((uint64_t)'A' << 56) | ((uint64_t)'D' << 48) | \
+ ((uint64_t)'D' << 40) | ((uint64_t)'_' << 32) | \
+ ((uint64_t)'C' << 24) | ((uint64_t)'O' << 16) | \
+ ((uint64_t)'W' << 8) | 0xFF),
+ ADD_COW_VERSION = 1,
+ ADD_COW_FILE_LEN = 1024,
+ ADD_COW_CACHE_SIZE = 16,
+ ADD_COW_CACHE_ENTRY_SIZE = 65536,
+ ADD_COW_CLUSTER_SIZE = 65536,
+ SECTORS_PER_CLUSTER = (ADD_COW_CLUSTER_SIZE / BDRV_SECTOR_SIZE),
+ ADD_COW_PAGE_SIZE = 4096,
+ ADD_COW_DEFAULT_PAGE_SIZE = 1,
+};
+
+typedef struct AddCowHeader {
+ uint64_t magic;
+ uint32_t version;
+
+ uint32_t backing_filename_offset;
+ uint32_t backing_filename_size;
+
+ uint32_t image_filename_offset;
+ uint32_t image_filename_size;
+
+ uint64_t features;
+ uint64_t optional_features;
+ uint32_t header_pages_size;
+} QEMU_PACKED AddCowHeader;
+
+typedef struct BDRVAddCowState {
+ BlockDriverState *image_hd;
+ CoMutex lock;
+ int cluster_size;
+ BlockCache *bitmap_cache;
+ uint64_t bitmap_size;
+ AddCowHeader header;
+ char backing_file_format[16];
+ char image_file_format[16];
+} BDRVAddCowState;
+
+/* Convert sector_num to offset in bitmap */
+static inline int64_t offset_in_bitmap(int64_t sector_num)
+{
+ int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
+ return cluster_num / 8;
+}
+
+static inline bool is_cluster_head(int64_t sector_num)
+{
+ return sector_num % SECTORS_PER_CLUSTER == 0;
+}
+
+static inline bool is_cluster_tail(int64_t sector_num)
+{
+ return (sector_num + 1) % SECTORS_PER_CLUSTER == 0;
+}
+
+BlockCache *add_cow_cache_create(BlockDriverState *bs, int num_tables);
+int add_cow_cache_destroy(BlockDriverState *bs, BlockCache *c);
+void add_cow_cache_entry_mark_dirty(BlockCache *c, void *table);
+int add_cow_cache_get(BlockDriverState *bs, BlockCache *c, uint64_t offset,
+ void **table);
+int add_cow_cache_flush(BlockDriverState *bs, BlockCache *c);
+#endif
diff --git a/block_int.h b/block_int.h
index 6c1d9ca..67954ec 100644
--- a/block_int.h
+++ b/block_int.h
@@ -53,6 +53,8 @@
#define BLOCK_OPT_SUBFMT "subformat"
#define BLOCK_OPT_COMPAT_LEVEL "compat"
#define BLOCK_OPT_LAZY_REFCOUNTS "lazy_refcounts"
+#define BLOCK_OPT_IMAGE_FILE "image_file"
+#define BLOCK_OPT_IMAGE_FORMAT "image_format"
typedef struct BdrvTrackedRequest BdrvTrackedRequest;
--
1.7.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [Qemu-devel] [PATCH V12 6/6] add-cow: add qemu-iotests support
2012-08-10 15:39 [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
` (4 preceding siblings ...)
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 5/6] add-cow file format Dong Xu Wang
@ 2012-08-10 15:39 ` Dong Xu Wang
2012-09-11 9:55 ` Kevin Wolf
2012-08-23 5:34 ` [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
6 siblings, 1 reply; 25+ messages in thread
From: Dong Xu Wang @ 2012-08-10 15:39 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, Dong Xu Wang
Add qemu-iotests support for add-cow.
Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
---
tests/qemu-iotests/017 | 2 +-
tests/qemu-iotests/020 | 2 +-
tests/qemu-iotests/check | 4 ++--
tests/qemu-iotests/common | 6 ++++++
tests/qemu-iotests/common.rc | 19 +++++++++++++++++++
5 files changed, 29 insertions(+), 4 deletions(-)
diff --git a/tests/qemu-iotests/017 b/tests/qemu-iotests/017
index 66951eb..d31432f 100755
--- a/tests/qemu-iotests/017
+++ b/tests/qemu-iotests/017
@@ -40,7 +40,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
. ./common.pattern
# Any format supporting backing files
-_supported_fmt qcow qcow2 vmdk qed
+_supported_fmt qcow qcow2 vmdk qed add-cow
_supported_proto generic
_supported_os Linux
diff --git a/tests/qemu-iotests/020 b/tests/qemu-iotests/020
index 2fb0ff8..3dbb495 100755
--- a/tests/qemu-iotests/020
+++ b/tests/qemu-iotests/020
@@ -42,7 +42,7 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
. ./common.pattern
# Any format supporting backing files
-_supported_fmt qcow qcow2 vmdk qed
+_supported_fmt qcow qcow2 vmdk qed add-cow
_supported_proto generic
_supported_os Linux
diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
index 432732c..122267b 100755
--- a/tests/qemu-iotests/check
+++ b/tests/qemu-iotests/check
@@ -243,7 +243,7 @@ do
echo " - no qualified output"
err=true
else
- if diff -w $seq.out $tmp.out >/dev/null 2>&1
+ if diff -w -I "^Formatting" $seq.out $tmp.out >/dev/null 2>&1
then
echo ""
if $err
@@ -255,7 +255,7 @@ do
else
echo " - output mismatch (see $seq.out.bad)"
mv $tmp.out $seq.out.bad
- $diff -w $seq.out $seq.out.bad
+ $diff -w -I "^Formatting" $seq.out $seq.out.bad
err=true
fi
fi
diff --git a/tests/qemu-iotests/common b/tests/qemu-iotests/common
index 1f6fdf5..1c81b09 100644
--- a/tests/qemu-iotests/common
+++ b/tests/qemu-iotests/common
@@ -128,6 +128,7 @@ common options
check options
-raw test raw (default)
-cow test cow
+ -add-cow test add-cow
-qcow test qcow
-qcow2 test qcow2
-qed test qed
@@ -163,6 +164,11 @@ testlist options
xpand=false
;;
+ -add-cow)
+ IMGFMT=add-cow
+ xpand=false
+ ;;
+
-qcow)
IMGFMT=qcow
xpand=false
diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
index 7782808..ec5afd7 100644
--- a/tests/qemu-iotests/common.rc
+++ b/tests/qemu-iotests/common.rc
@@ -97,6 +97,18 @@ _make_test_img()
fi
if [ \( "$IMGFMT" = "qcow2" -o "$IMGFMT" = "qed" \) -a -n "$CLUSTER_SIZE" ]; then
optstr=$(_optstr_add "$optstr" "cluster_size=$CLUSTER_SIZE")
+ elif [ "$IMGFMT" = "add-cow" ]; then
+ local BACKING="$TEST_IMG"".qcow2"
+ local IMG="$TEST_IMG"".raw"
+ if [ "$1" = "-b" ]; then
+ IMG="$IMG"".b"
+ $QEMU_IMG create -f raw $IMG $image_size>/dev/null
+ extra_img_options="-o image_file=$IMG $extra_img_options"
+ else
+ $QEMU_IMG create -f raw $IMG $image_size>/dev/null
+ $QEMU_IMG create -f qcow2 $BACKING $image_size>/dev/null
+ extra_img_options="-o backing_file=$BACKING,image_file=$IMG"
+ fi
fi
if [ -n "$optstr" ]; then
@@ -125,6 +137,13 @@ _cleanup_test_img()
rm -f $TEST_DIR/t.$IMGFMT
rm -f $TEST_DIR/t.$IMGFMT.orig
rm -f $TEST_DIR/t.$IMGFMT.base
+ if [ "$IMGFMT" = "add-cow" ]; then
+ rm -f $TEST_DIR/t.$IMGFMT.qcow2
+ rm -f $TEST_DIR/t.$IMGFMT.raw
+ rm -f $TEST_DIR/t.$IMGFMT.raw.b
+ rm -f $TEST_DIR/t.$IMGFMT.ct.qcow2
+ rm -f $TEST_DIR/t.$IMGFMT.ct.raw
+ fi
;;
rbd)
--
1.7.1
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH V12 0/6] add-cow file format
2012-08-10 15:39 [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
` (5 preceding siblings ...)
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 6/6] add-cow: add qemu-iotests support Dong Xu Wang
@ 2012-08-23 5:34 ` Dong Xu Wang
6 siblings, 0 replies; 25+ messages in thread
From: Dong Xu Wang @ 2012-08-23 5:34 UTC (permalink / raw)
To: qemu-devel; +Cc: kwolf, Dong Xu Wang
Anyone can give me some comments? That will be very grateful..
On Fri, Aug 10, 2012 at 11:39 PM, Dong Xu Wang
<wdongxu@linux.vnet.ibm.com> wrote:
> This will introduce a new file format: add-cow.
>
> add-cow can benefit from other available functions, such as path_has_protocol and
> qed_read_string, so we will make them public.
>
> Now add-cow is still using QEMUOptionParameter, not QemuOpts, I will send a
> separate patch series to convert.
>
> snapshot_blkdev are not supported now for add-cow, after converting QEMUOptionParameter
> to QemuOpts, will add related code.
>
>
> v11->v12:
> 1) Removed un-used feature bit.
> 2) Share cache code with qcow2.c.
> 3) Remove snapshot_blkdev support, will add it in another patch.
> 5) COW Bitmap field in add-cow file will be multiple of 65536.
> 6) fix grammer and typo.
>
> Dong Xu Wang (6):
> docs: document for add cow file format
> make path_has_protocol non-static
> qed_read_string to bdrv_read_string
> rename qcow2-cache.c to block-cache.c
> add-cow file format
> qemu-iotests
>
> block.c | 29 ++-
> block.h | 6 +
> block/Makefile.objs | 4 +-
> block/add-cow.c | 613 ++++++++++++++++++++++++++++++++++++++++++
> block/add-cow.h | 85 ++++++
> block/qcow2-cache.c | 323 ----------------------
> block/qcow2-cluster.c | 66 +++--
> block/qcow2-refcount.c | 66 +++--
> block/qcow2.c | 36 ++--
> block/qcow2.h | 24 +--
> block/qed.c | 29 +--
> block_int.h | 2 +
> docs/specs/add-cow.txt | 123 +++++++++
> tests/qemu-iotests/017 | 2 +-
> tests/qemu-iotests/020 | 2 +-
> tests/qemu-iotests/check | 4 +-
> tests/qemu-iotests/common | 6 +
> tests/qemu-iotests/common.rc | 19 ++
> trace-events | 13 +-
> 19 files changed, 994 insertions(+), 458 deletions(-)
> create mode 100644 block/add-cow.c
> create mode 100644 block/add-cow.h
> delete mode 100644 block/qcow2-cache.c
> create mode 100644 docs/specs/add-cow.txt
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH V12 1/6] docs: document for add-cow file format
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 1/6] docs: document for " Dong Xu Wang
@ 2012-09-06 17:27 ` Michael Roth
2012-09-10 1:48 ` Dong Xu Wang
2012-09-10 15:23 ` Kevin Wolf
1 sibling, 1 reply; 25+ messages in thread
From: Michael Roth @ 2012-09-06 17:27 UTC (permalink / raw)
To: Dong Xu Wang; +Cc: kwolf, qemu-devel
On Fri, Aug 10, 2012 at 11:39:40PM +0800, Dong Xu Wang wrote:
> Document for add-cow format, the usage and spec of add-cow are introduced.
>
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> ---
> docs/specs/add-cow.txt | 123 ++++++++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 123 insertions(+), 0 deletions(-)
> create mode 100644 docs/specs/add-cow.txt
>
> diff --git a/docs/specs/add-cow.txt b/docs/specs/add-cow.txt
> new file mode 100644
> index 0000000..d5a7a68
> --- /dev/null
> +++ b/docs/specs/add-cow.txt
> @@ -0,0 +1,123 @@
> +== General ==
> +
> +The raw file format does not support backing files or copy on write feature.
> +The add-cow image format makes it possible to use backing files with raw
> +image by keeping a separate .add-cow metadata file. Once all sectors
> +have been written into the raw image it is safe to discard the .add-cow
> +and backing files, then we can use the raw image directly.
> +
> +An example usage of add-cow would look like::
> +(ubuntu.img is a disk image which has been installed OS.)
> + 1) Create a raw image with the same size of ubuntu.img
> + qemu-img create -f raw test.raw 8G
> + 2) Create an add-cow image which will store dirty bitmap
> + qemu-img create -f add-cow test.add-cow \
> + -o backing_file=ubuntu.img,image_file=test.raw
> + 3) Run qemu with add-cow image
> + qemu -drive if=virtio,file=test.add-cow
> +
> +test.raw may be larger than ubuntu.img, in that case, the size of test.add-cow
> +will be calculated from the size of test.raw.
> +
> +=Specification=
> +
> +The file format looks like this:
> +
> + +---------------+-------------+-----------------+
> + | Header | Reserved | COW bitmap |
> + +---------------+-------------+-----------------+
> +
> +All numbers in add-cow are stored in Little Endian byte order.
> +
> +== Header ==
> +
> +The Header is included in the first bytes:
> +(#define HEADER_SIZE (4096 * header_pages_size))
> + Byte 0 - 7: magic
> + add-cow magic string ("ADD_COW\xff").
> +
> + 8 - 11: version
> + Version number (only valid value is 1 now).
> +
> + 12 - 15: backing file name offset
> + Offset in the add-cow file at which the backing file
> + name is stored (NB: The string is not nul-terminated).
> + If backing file name does NOT exist, this field will be
> + 0. Must be between 80 and [HEADER_SIZE - 2](a file name
> + must be at least 1 byte).
> +
> + 16 - 19: backing file name size
> + Length of the backing file name in bytes. It will be 0
> + if the backing file name offset is 0. If backing file
> + name offset is non-zero, then it must be non-zero. Must
> + be less than [HEADER_SIZE - 80] to fit in the reserved
> + part of the header.
> +
> + 20 - 23: image file name offset
> + Offset in the add-cow file at which the image file name
> + is stored (NB: The string is not null terminated). It
> + must be between 80 and [HEADER_SIZE - 2].
> +
> + 24 - 27: image file name size
> + Length of the image file name in bytes.
> + Must be less than [HEADER_SIZE - 80] to fit in the reserved
> + part of the header.
> +
> + 28 - 35: features
> + Currently only 1 feature bit is used:
> + Feature bits:
> + * ADD_COW_F_All_ALLOCATED = 0x01.
> +
> + 36 - 43: optional features
> + Not used now. Reserved for future use. It must be set to 0.
> +
> + 44 - 47: header pages size
> + The header field is variable-sized. This field indicates
> + how many pages(4k) will be used to store add-cow header.
> + In add-cow v1, it is fixed to 1, so the header size will
> + be 4k * 1 = 4096 bytes.
> +
> + 48 - 63: backing file format
> + format of backing file. It will be filled with 0 if
> + backing file name offset is 0. If backing file name
> + offset is non-zero, it must be non-zero. It is coded
> + in free-form ASCII, and is not NUL-terminated.
> +
> + 64 - 79: image file format
> + format of image file. It must be non-zero. It is coded
> + in free-form ASCII, and is not NUL-terminated.
> +
> + 80 - [HEADER_SIZE - 1]:
> + It is used to make sure COW bitmap field starts at the
> + HEADER_SIZE byte, backing file name and image file name
> + will be stored here. The bytes that is not pointing to
> + backing file and image file names will bet set to 0.
> +
> +== COW bitmap ==
> +
> +The "COW bitmap" field starts at offset HEADER_SIZE, stores a bitmap related to
> +backing file and image file. The bitmap will track whether the sector in
> +backing file is dirty or not.
> +
> +Each bit in the bitmap indicates one cluster's status. One cluster includes 128
> +sectors, then each bit indicates 512 * 128 = 64k bytes. the size of bitmap is
> +calculated according to virtual size of image file, and it also should be multipe
> +of 65536, the bits not used will be set to 0. Within each byte, the least
> +significant bit covers the first cluster. Bit orders in one byte look like:
> + +----+----+----+----+----+----+----+----+
> + | b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0 |
> + +----+----+----+----+----+----+----+----+
> +
> +If the bit is 0, indicates the sector has not been allocated in image file, data
> +should be loaded from backing file while reading; if the bit is 1, indicates the
> +related sector has been dirty, should be loaded from image file while reading.
> +Writing to a sector causes the corresponding bit to be set to 1.
> +
> +If raw image is not an even multiple of cluster bytes, bits that correspond to
> +bytes beyond the raw file size in add-cow will be 0.
> +
> +Image file name and backing file name must NOT be the same, we prevent this
> +while creating add-cow files.
> +
> +Image file and backing file are interpreted relative to the qcow2 file, not
Relative to the add-cow file?
> +to the current working directory of the process that opened the qcow2 file.
> --
> 1.7.1
>
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH V12 2/6] make path_has_protocol non-static
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 2/6] make path_has_protocol non-static Dong Xu Wang
@ 2012-09-06 17:27 ` Michael Roth
0 siblings, 0 replies; 25+ messages in thread
From: Michael Roth @ 2012-09-06 17:27 UTC (permalink / raw)
To: Dong Xu Wang; +Cc: kwolf, qemu-devel
On Fri, Aug 10, 2012 at 11:39:41PM +0800, Dong Xu Wang wrote:
> We will use path_has_protocol outside block.c, so just make it public.
>
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
Reviewed-by: Michael Roth <mdroth@linux.vnet.ibm.com>
> ---
> block.c | 2 +-
> block.h | 1 +
> 2 files changed, 2 insertions(+), 1 deletions(-)
>
> diff --git a/block.c b/block.c
> index 24323c1..c13d803 100644
> --- a/block.c
> +++ b/block.c
> @@ -196,7 +196,7 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
> }
>
> /* check if the path starts with "<protocol>:" */
> -static int path_has_protocol(const char *path)
> +int path_has_protocol(const char *path)
> {
> const char *p;
>
> diff --git a/block.h b/block.h
> index 650d872..54e61c9 100644
> --- a/block.h
> +++ b/block.h
> @@ -307,6 +307,7 @@ char *bdrv_snapshot_dump(char *buf, int buf_size, QEMUSnapshotInfo *sn);
>
> char *get_human_readable_size(char *buf, int buf_size, int64_t size);
> int path_is_absolute(const char *path);
> +int path_has_protocol(const char *path);
> void path_combine(char *dest, int dest_size,
> const char *base_path,
> const char *filename);
> --
> 1.7.1
>
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH V12 3/6] qed_read_string to bdrv_read_string
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 3/6] qed_read_string to bdrv_read_string Dong Xu Wang
@ 2012-09-06 17:32 ` Michael Roth
2012-09-10 1:49 ` Dong Xu Wang
0 siblings, 1 reply; 25+ messages in thread
From: Michael Roth @ 2012-09-06 17:32 UTC (permalink / raw)
To: Dong Xu Wang; +Cc: kwolf, qemu-devel
On Fri, Aug 10, 2012 at 11:39:42PM +0800, Dong Xu Wang wrote:
> Make qed_read_string function to a common interface, so move it to block.c.
>
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> ---
> block.c | 27 +++++++++++++++++++++++++++
> block.h | 2 ++
> block/qed.c | 29 +----------------------------
> 3 files changed, 30 insertions(+), 28 deletions(-)
>
> diff --git a/block.c b/block.c
> index c13d803..d906b35 100644
> --- a/block.c
> +++ b/block.c
> @@ -213,6 +213,33 @@ int path_has_protocol(const char *path)
> return *p == ':';
> }
>
> +/**
> + * Read a string of known length from the image file
> + *
> + * @bs: Image file
> + * @offset: File offset to start of string, in bytes
> + * @n: String length in bytes
> + * @buf: Destination buffer
> + * @buflen: Destination buffer length in bytes
> + * @ret: 0 on success, -errno on failure
> + *
> + * The string is NUL-terminated.
> + */
> +int bdrv_read_string(BlockDriverState *bs, uint64_t offset, size_t n,
> + char *buf, size_t buflen)
Small alignment issue ^
> +{
> + int ret;
> + if (n >= buflen) {
> + return -EINVAL;
> + }
> + ret = bdrv_pread(bs, offset, buf, n);
> + if (ret < 0) {
> + return ret;
> + }
> + buf[n] = '\0';
> + return 0;
> +}
> +
> int path_is_absolute(const char *path)
> {
> #ifdef _WIN32
> diff --git a/block.h b/block.h
> index 54e61c9..e5dfcd7 100644
> --- a/block.h
> +++ b/block.h
> @@ -154,6 +154,8 @@ int bdrv_pwrite_sync(BlockDriverState *bs, int64_t offset,
> const void *buf, int count);
> int coroutine_fn bdrv_co_readv(BlockDriverState *bs, int64_t sector_num,
> int nb_sectors, QEMUIOVector *qiov);
> +int bdrv_read_string(BlockDriverState *bs, uint64_t offset, size_t n,
> + char *buf, size_t buflen);
Another one here ^
> int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
> int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
> int coroutine_fn bdrv_co_writev(BlockDriverState *bs, int64_t sector_num,
> diff --git a/block/qed.c b/block/qed.c
> index 5f3eefa..311c589 100644
> --- a/block/qed.c
> +++ b/block/qed.c
> @@ -217,33 +217,6 @@ static bool qed_is_image_size_valid(uint64_t image_size, uint32_t cluster_size,
> }
>
> /**
> - * Read a string of known length from the image file
> - *
> - * @file: Image file
> - * @offset: File offset to start of string, in bytes
> - * @n: String length in bytes
> - * @buf: Destination buffer
> - * @buflen: Destination buffer length in bytes
> - * @ret: 0 on success, -errno on failure
> - *
> - * The string is NUL-terminated.
> - */
> -static int qed_read_string(BlockDriverState *file, uint64_t offset, size_t n,
> - char *buf, size_t buflen)
> -{
> - int ret;
> - if (n >= buflen) {
> - return -EINVAL;
> - }
> - ret = bdrv_pread(file, offset, buf, n);
> - if (ret < 0) {
> - return ret;
> - }
> - buf[n] = '\0';
> - return 0;
> -}
> -
> -/**
> * Allocate new clusters
> *
> * @s: QED state
> @@ -437,7 +410,7 @@ static int bdrv_qed_open(BlockDriverState *bs, int flags)
> return -EINVAL;
> }
>
> - ret = qed_read_string(bs->file, s->header.backing_filename_offset,
> + ret = bdrv_read_string(bs->file, s->header.backing_filename_offset,
> s->header.backing_filename_size, bs->backing_file,
> sizeof(bs->backing_file));
Here too ^
Looks good otherwise.
> if (ret < 0) {
> --
> 1.7.1
>
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH V12 4/6] rename qcow2-cache.c to block-cache.c
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 4/6] rename qcow2-cache.c to block-cache.c Dong Xu Wang
@ 2012-09-06 17:52 ` Michael Roth
2012-09-10 2:14 ` Dong Xu Wang
2012-09-11 8:41 ` Kevin Wolf
1 sibling, 1 reply; 25+ messages in thread
From: Michael Roth @ 2012-09-06 17:52 UTC (permalink / raw)
To: Dong Xu Wang; +Cc: kwolf, qemu-devel
On Fri, Aug 10, 2012 at 11:39:43PM +0800, Dong Xu Wang wrote:
> add-cow and qcow2 file format will share the same cache code, so rename
> block-cache.c to block-cache.c. And related structure and qcow2 code also
"qcow2-cache.c to block-cache.c"
But I've scanned through the rest of your patches and can't seem to find
where block-cache.c gets introduced. Did you forget to git add it?
> are changed.
>
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> ---
> block.h | 3 +
> block/Makefile.objs | 3 +-
> block/qcow2-cache.c | 323 ------------------------------------------------
> block/qcow2-cluster.c | 66 ++++++----
> block/qcow2-refcount.c | 66 ++++++-----
> block/qcow2.c | 36 +++---
> block/qcow2.h | 24 +---
> trace-events | 13 +-
> 8 files changed, 109 insertions(+), 425 deletions(-)
> delete mode 100644 block/qcow2-cache.c
>
> diff --git a/block.h b/block.h
> index e5dfcd7..c325661 100644
> --- a/block.h
> +++ b/block.h
> @@ -401,6 +401,9 @@ typedef enum {
> BLKDBG_CLUSTER_ALLOC_BYTES,
> BLKDBG_CLUSTER_FREE,
>
> + BLKDBG_ADD_COW_UPDATE,
> + BLKDBG_ADD_COW_LOAD,
> +
> BLKDBG_EVENT_MAX,
> } BlkDebugEvent;
>
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index b5754d3..23bdfc8 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -1,7 +1,8 @@
> block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
> -block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
> +block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
> block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
> block-obj-y += qed-check.o
> +block-obj-y += block-cache.o
> block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
> block-obj-y += stream.o
> block-obj-$(CONFIG_WIN32) += raw-win32.o
> diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
> deleted file mode 100644
> index 2d4322a..0000000
> --- a/block/qcow2-cache.c
> +++ /dev/null
> @@ -1,323 +0,0 @@
> -/*
> - * L2/refcount table cache for the QCOW2 format
> - *
> - * Copyright (c) 2010 Kevin Wolf <kwolf@redhat.com>
> - *
> - * Permission is hereby granted, free of charge, to any person obtaining a copy
> - * of this software and associated documentation files (the "Software"), to deal
> - * in the Software without restriction, including without limitation the rights
> - * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> - * copies of the Software, and to permit persons to whom the Software is
> - * furnished to do so, subject to the following conditions:
> - *
> - * The above copyright notice and this permission notice shall be included in
> - * all copies or substantial portions of the Software.
> - *
> - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> - * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> - * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> - * THE SOFTWARE.
> - */
> -
> -#include "block_int.h"
> -#include "qemu-common.h"
> -#include "qcow2.h"
> -#include "trace.h"
> -
> -typedef struct Qcow2CachedTable {
> - void* table;
> - int64_t offset;
> - bool dirty;
> - int cache_hits;
> - int ref;
> -} Qcow2CachedTable;
> -
> -struct Qcow2Cache {
> - Qcow2CachedTable* entries;
> - struct Qcow2Cache* depends;
> - int size;
> - bool depends_on_flush;
> -};
> -
> -Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
> -{
> - BDRVQcowState *s = bs->opaque;
> - Qcow2Cache *c;
> - int i;
> -
> - c = g_malloc0(sizeof(*c));
> - c->size = num_tables;
> - c->entries = g_malloc0(sizeof(*c->entries) * num_tables);
> -
> - for (i = 0; i < c->size; i++) {
> - c->entries[i].table = qemu_blockalign(bs, s->cluster_size);
> - }
> -
> - return c;
> -}
> -
> -int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c)
> -{
> - int i;
> -
> - for (i = 0; i < c->size; i++) {
> - assert(c->entries[i].ref == 0);
> - qemu_vfree(c->entries[i].table);
> - }
> -
> - g_free(c->entries);
> - g_free(c);
> -
> - return 0;
> -}
> -
> -static int qcow2_cache_flush_dependency(BlockDriverState *bs, Qcow2Cache *c)
> -{
> - int ret;
> -
> - ret = qcow2_cache_flush(bs, c->depends);
> - if (ret < 0) {
> - return ret;
> - }
> -
> - c->depends = NULL;
> - c->depends_on_flush = false;
> -
> - return 0;
> -}
> -
> -static int qcow2_cache_entry_flush(BlockDriverState *bs, Qcow2Cache *c, int i)
> -{
> - BDRVQcowState *s = bs->opaque;
> - int ret = 0;
> -
> - if (!c->entries[i].dirty || !c->entries[i].offset) {
> - return 0;
> - }
> -
> - trace_qcow2_cache_entry_flush(qemu_coroutine_self(),
> - c == s->l2_table_cache, i);
> -
> - if (c->depends) {
> - ret = qcow2_cache_flush_dependency(bs, c);
> - } else if (c->depends_on_flush) {
> - ret = bdrv_flush(bs->file);
> - if (ret >= 0) {
> - c->depends_on_flush = false;
> - }
> - }
> -
> - if (ret < 0) {
> - return ret;
> - }
> -
> - if (c == s->refcount_block_cache) {
> - BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_UPDATE_PART);
> - } else if (c == s->l2_table_cache) {
> - BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
> - }
> -
> - ret = bdrv_pwrite(bs->file, c->entries[i].offset, c->entries[i].table,
> - s->cluster_size);
> - if (ret < 0) {
> - return ret;
> - }
> -
> - c->entries[i].dirty = false;
> -
> - return 0;
> -}
> -
> -int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c)
> -{
> - BDRVQcowState *s = bs->opaque;
> - int result = 0;
> - int ret;
> - int i;
> -
> - trace_qcow2_cache_flush(qemu_coroutine_self(), c == s->l2_table_cache);
> -
> - for (i = 0; i < c->size; i++) {
> - ret = qcow2_cache_entry_flush(bs, c, i);
> - if (ret < 0 && result != -ENOSPC) {
> - result = ret;
> - }
> - }
> -
> - if (result == 0) {
> - ret = bdrv_flush(bs->file);
> - if (ret < 0) {
> - result = ret;
> - }
> - }
> -
> - return result;
> -}
> -
> -int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
> - Qcow2Cache *dependency)
> -{
> - int ret;
> -
> - if (dependency->depends) {
> - ret = qcow2_cache_flush_dependency(bs, dependency);
> - if (ret < 0) {
> - return ret;
> - }
> - }
> -
> - if (c->depends && (c->depends != dependency)) {
> - ret = qcow2_cache_flush_dependency(bs, c);
> - if (ret < 0) {
> - return ret;
> - }
> - }
> -
> - c->depends = dependency;
> - return 0;
> -}
> -
> -void qcow2_cache_depends_on_flush(Qcow2Cache *c)
> -{
> - c->depends_on_flush = true;
> -}
> -
> -static int qcow2_cache_find_entry_to_replace(Qcow2Cache *c)
> -{
> - int i;
> - int min_count = INT_MAX;
> - int min_index = -1;
> -
> -
> - for (i = 0; i < c->size; i++) {
> - if (c->entries[i].ref) {
> - continue;
> - }
> -
> - if (c->entries[i].cache_hits < min_count) {
> - min_index = i;
> - min_count = c->entries[i].cache_hits;
> - }
> -
> - /* Give newer hits priority */
> - /* TODO Check how to optimize the replacement strategy */
> - c->entries[i].cache_hits /= 2;
> - }
> -
> - if (min_index == -1) {
> - /* This can't happen in current synchronous code, but leave the check
> - * here as a reminder for whoever starts using AIO with the cache */
> - abort();
> - }
> - return min_index;
> -}
> -
> -static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
> - uint64_t offset, void **table, bool read_from_disk)
> -{
> - BDRVQcowState *s = bs->opaque;
> - int i;
> - int ret;
> -
> - trace_qcow2_cache_get(qemu_coroutine_self(), c == s->l2_table_cache,
> - offset, read_from_disk);
> -
> - /* Check if the table is already cached */
> - for (i = 0; i < c->size; i++) {
> - if (c->entries[i].offset == offset) {
> - goto found;
> - }
> - }
> -
> - /* If not, write a table back and replace it */
> - i = qcow2_cache_find_entry_to_replace(c);
> - trace_qcow2_cache_get_replace_entry(qemu_coroutine_self(),
> - c == s->l2_table_cache, i);
> - if (i < 0) {
> - return i;
> - }
> -
> - ret = qcow2_cache_entry_flush(bs, c, i);
> - if (ret < 0) {
> - return ret;
> - }
> -
> - trace_qcow2_cache_get_read(qemu_coroutine_self(),
> - c == s->l2_table_cache, i);
> - c->entries[i].offset = 0;
> - if (read_from_disk) {
> - if (c == s->l2_table_cache) {
> - BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
> - }
> -
> - ret = bdrv_pread(bs->file, offset, c->entries[i].table, s->cluster_size);
> - if (ret < 0) {
> - return ret;
> - }
> - }
> -
> - /* Give the table some hits for the start so that it won't be replaced
> - * immediately. The number 32 is completely arbitrary. */
> - c->entries[i].cache_hits = 32;
> - c->entries[i].offset = offset;
> -
> - /* And return the right table */
> -found:
> - c->entries[i].cache_hits++;
> - c->entries[i].ref++;
> - *table = c->entries[i].table;
> -
> - trace_qcow2_cache_get_done(qemu_coroutine_self(),
> - c == s->l2_table_cache, i);
> -
> - return 0;
> -}
> -
> -int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
> - void **table)
> -{
> - return qcow2_cache_do_get(bs, c, offset, table, true);
> -}
> -
> -int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
> - void **table)
> -{
> - return qcow2_cache_do_get(bs, c, offset, table, false);
> -}
> -
> -int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table)
> -{
> - int i;
> -
> - for (i = 0; i < c->size; i++) {
> - if (c->entries[i].table == *table) {
> - goto found;
> - }
> - }
> - return -ENOENT;
> -
> -found:
> - c->entries[i].ref--;
> - *table = NULL;
> -
> - assert(c->entries[i].ref >= 0);
> - return 0;
> -}
> -
> -void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table)
> -{
> - int i;
> -
> - for (i = 0; i < c->size; i++) {
> - if (c->entries[i].table == table) {
> - goto found;
> - }
> - }
> - abort();
> -
> -found:
> - c->entries[i].dirty = true;
> -}
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index e179211..335dc7a 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -28,6 +28,7 @@
> #include "block_int.h"
> #include "block/qcow2.h"
> #include "trace.h"
> +#include "block-cache.h"
>
> int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
> {
> @@ -69,7 +70,8 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
> return new_l1_table_offset;
> }
>
> - ret = qcow2_cache_flush(bs, s->refcount_block_cache);
> + ret = block_cache_flush(bs, s->refcount_block_cache,
> + BLOCK_TABLE_REF, s->cluster_size);
> if (ret < 0) {
> goto fail;
> }
> @@ -119,7 +121,8 @@ static int l2_load(BlockDriverState *bs, uint64_t l2_offset,
> BDRVQcowState *s = bs->opaque;
> int ret;
>
> - ret = qcow2_cache_get(bs, s->l2_table_cache, l2_offset, (void**) l2_table);
> + ret = block_cache_get(bs, s->l2_table_cache, l2_offset,
> + (void **) l2_table, BLOCK_TABLE_L2, s->cluster_size);
>
> return ret;
> }
> @@ -180,7 +183,8 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
> return l2_offset;
> }
>
> - ret = qcow2_cache_flush(bs, s->refcount_block_cache);
> + ret = block_cache_flush(bs, s->refcount_block_cache,
> + BLOCK_TABLE_REF, s->cluster_size);
> if (ret < 0) {
> goto fail;
> }
> @@ -188,7 +192,8 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
> /* allocate a new entry in the l2 cache */
>
> trace_qcow2_l2_allocate_get_empty(bs, l1_index);
> - ret = qcow2_cache_get_empty(bs, s->l2_table_cache, l2_offset, (void**) table);
> + ret = block_cache_get_empty(bs, s->l2_table_cache, l2_offset,
> + (void **) table, BLOCK_TABLE_L2, s->cluster_size);
> if (ret < 0) {
> return ret;
> }
> @@ -203,16 +208,17 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
>
> /* if there was an old l2 table, read it from the disk */
> BLKDBG_EVENT(bs->file, BLKDBG_L2_ALLOC_COW_READ);
> - ret = qcow2_cache_get(bs, s->l2_table_cache,
> + ret = block_cache_get(bs, s->l2_table_cache,
> old_l2_offset & L1E_OFFSET_MASK,
> - (void**) &old_table);
> + (void **) &old_table, BLOCK_TABLE_L2, s->cluster_size);
> if (ret < 0) {
> goto fail;
> }
>
> memcpy(l2_table, old_table, s->cluster_size);
>
> - ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &old_table);
> + ret = block_cache_put(bs, s->l2_table_cache,
> + (void **) &old_table, BLOCK_TABLE_L2);
> if (ret < 0) {
> goto fail;
> }
> @@ -222,8 +228,9 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
> BLKDBG_EVENT(bs->file, BLKDBG_L2_ALLOC_WRITE);
>
> trace_qcow2_l2_allocate_write_l2(bs, l1_index);
> - qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
> - ret = qcow2_cache_flush(bs, s->l2_table_cache);
> + block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
> + ret = block_cache_flush(bs, s->l2_table_cache,
> + BLOCK_TABLE_L2, s->cluster_size);
> if (ret < 0) {
> goto fail;
> }
> @@ -242,7 +249,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
>
> fail:
> trace_qcow2_l2_allocate_done(bs, l1_index, ret);
> - qcow2_cache_put(bs, s->l2_table_cache, (void**) table);
> + block_cache_put(bs, s->l2_table_cache, (void **) table, BLOCK_TABLE_L2);
> s->l1_table[l1_index] = old_l2_offset;
> return ret;
> }
> @@ -475,7 +482,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
> abort();
> }
>
> - qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> + block_cache_put(bs, s->l2_table_cache, (void **) &l2_table, BLOCK_TABLE_L2);
>
> nb_available = (c * s->cluster_sectors);
>
> @@ -584,13 +591,15 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
> * allocated. */
> cluster_offset = be64_to_cpu(l2_table[l2_index]);
> if (cluster_offset & L2E_OFFSET_MASK) {
> - qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> + block_cache_put(bs, s->l2_table_cache,
> + (void **) &l2_table, BLOCK_TABLE_L2);
> return 0;
> }
>
> cluster_offset = qcow2_alloc_bytes(bs, compressed_size);
> if (cluster_offset < 0) {
> - qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> + block_cache_put(bs, s->l2_table_cache,
> + (void **) &l2_table, BLOCK_TABLE_L2);
> return 0;
> }
>
> @@ -605,9 +614,10 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
> /* compressed clusters never have the copied flag */
>
> BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
> - qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
> + block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
> l2_table[l2_index] = cpu_to_be64(cluster_offset);
> - ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> + ret = block_cache_put(bs, s->l2_table_cache,
> + (void **) &l2_table, BLOCK_TABLE_L2);
> if (ret < 0) {
> return 0;
> }
> @@ -659,18 +669,16 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
> * handled.
> */
> if (cow) {
> - qcow2_cache_depends_on_flush(s->l2_table_cache);
> + block_cache_depends_on_flush(s->l2_table_cache);
> }
>
> - if (qcow2_need_accurate_refcounts(s)) {
> - qcow2_cache_set_dependency(bs, s->l2_table_cache,
> - s->refcount_block_cache);
> - }
> + block_cache_set_dependency(bs, s->l2_table_cache, BLOCK_TABLE_L2,
> + s->refcount_block_cache, s->cluster_size);
> ret = get_cluster_table(bs, m->offset, &l2_table, &l2_index);
> if (ret < 0) {
> goto err;
> }
> - qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
> + block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>
> for (i = 0; i < m->nb_clusters; i++) {
> /* if two concurrent writes happen to the same unallocated cluster
> @@ -687,7 +695,8 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
> }
>
>
> - ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> + ret = block_cache_put(bs, s->l2_table_cache,
> + (void **) &l2_table, BLOCK_TABLE_L2);
> if (ret < 0) {
> goto err;
> }
> @@ -913,7 +922,8 @@ again:
> * request to complete. If we still had the reference, we could use up the
> * whole cache with sleeping requests.
> */
> - ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> + ret = block_cache_put(bs, s->l2_table_cache,
> + (void **) &l2_table, BLOCK_TABLE_L2);
> if (ret < 0) {
> return ret;
> }
> @@ -1077,14 +1087,15 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
> }
>
> /* First remove L2 entries */
> - qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
> + block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
> l2_table[l2_index + i] = cpu_to_be64(0);
>
> /* Then decrease the refcount */
> qcow2_free_any_clusters(bs, old_offset, 1);
> }
>
> - ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> + ret = block_cache_put(bs, s->l2_table_cache,
> + (void **) &l2_table, BLOCK_TABLE_L2);
> if (ret < 0) {
> return ret;
> }
> @@ -1154,7 +1165,7 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
> old_offset = be64_to_cpu(l2_table[l2_index + i]);
>
> /* Update L2 entries */
> - qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
> + block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
> if (old_offset & QCOW_OFLAG_COMPRESSED) {
> l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
> qcow2_free_any_clusters(bs, old_offset, 1);
> @@ -1163,7 +1174,8 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
> }
> }
>
> - ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> + ret = block_cache_put(bs, s->l2_table_cache,
> + (void **) &l2_table, BLOCK_TABLE_L2);
> if (ret < 0) {
> return ret;
> }
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 5e3f915..728bfc1 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -25,6 +25,7 @@
> #include "qemu-common.h"
> #include "block_int.h"
> #include "block/qcow2.h"
> +#include "block-cache.h"
>
> static int64_t alloc_clusters_noref(BlockDriverState *bs, int64_t size);
> static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
> @@ -71,8 +72,8 @@ static int load_refcount_block(BlockDriverState *bs,
> int ret;
>
> BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_LOAD);
> - ret = qcow2_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
> - refcount_block);
> + ret = block_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
> + refcount_block, BLOCK_TABLE_REF, s->cluster_size);
>
> return ret;
> }
> @@ -98,8 +99,8 @@ static int get_refcount(BlockDriverState *bs, int64_t cluster_index)
> if (!refcount_block_offset)
> return 0;
>
> - ret = qcow2_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
> - (void**) &refcount_block);
> + ret = block_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
> + (void **) &refcount_block, BLOCK_TABLE_REF, s->cluster_size);
> if (ret < 0) {
> return ret;
> }
> @@ -108,8 +109,8 @@ static int get_refcount(BlockDriverState *bs, int64_t cluster_index)
> ((1 << (s->cluster_bits - REFCOUNT_SHIFT)) - 1);
> refcount = be16_to_cpu(refcount_block[block_index]);
>
> - ret = qcow2_cache_put(bs, s->refcount_block_cache,
> - (void**) &refcount_block);
> + ret = block_cache_put(bs, s->refcount_block_cache,
> + (void **) &refcount_block, BLOCK_TABLE_REF);
> if (ret < 0) {
> return ret;
> }
> @@ -201,7 +202,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
> *refcount_block = NULL;
>
> /* We write to the refcount table, so we might depend on L2 tables */
> - qcow2_cache_flush(bs, s->l2_table_cache);
> + block_cache_flush(bs, s->l2_table_cache,
> + BLOCK_TABLE_L2, s->cluster_size);
>
> /* Allocate the refcount block itself and mark it as used */
> int64_t new_block = alloc_clusters_noref(bs, s->cluster_size);
> @@ -217,8 +219,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
>
> if (in_same_refcount_block(s, new_block, cluster_index << s->cluster_bits)) {
> /* Zero the new refcount block before updating it */
> - ret = qcow2_cache_get_empty(bs, s->refcount_block_cache, new_block,
> - (void**) refcount_block);
> + ret = block_cache_get_empty(bs, s->refcount_block_cache, new_block,
> + (void **) refcount_block, BLOCK_TABLE_REF, s->cluster_size);
> if (ret < 0) {
> goto fail_block;
> }
> @@ -241,8 +243,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
>
> /* Initialize the new refcount block only after updating its refcount,
> * update_refcount uses the refcount cache itself */
> - ret = qcow2_cache_get_empty(bs, s->refcount_block_cache, new_block,
> - (void**) refcount_block);
> + ret = block_cache_get_empty(bs, s->refcount_block_cache, new_block,
> + (void **) refcount_block, BLOCK_TABLE_REF, s->cluster_size);
> if (ret < 0) {
> goto fail_block;
> }
> @@ -252,8 +254,9 @@ static int alloc_refcount_block(BlockDriverState *bs,
>
> /* Now the new refcount block needs to be written to disk */
> BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_ALLOC_WRITE);
> - qcow2_cache_entry_mark_dirty(s->refcount_block_cache, *refcount_block);
> - ret = qcow2_cache_flush(bs, s->refcount_block_cache);
> + block_cache_entry_mark_dirty(s->refcount_block_cache, *refcount_block);
> + ret = block_cache_flush(bs, s->refcount_block_cache,
> + BLOCK_TABLE_REF, s->cluster_size);
> if (ret < 0) {
> goto fail_block;
> }
> @@ -273,7 +276,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
> return 0;
> }
>
> - ret = qcow2_cache_put(bs, s->refcount_block_cache, (void**) refcount_block);
> + ret = block_cache_put(bs, s->refcount_block_cache,
> + (void **) refcount_block, BLOCK_TABLE_REF);
> if (ret < 0) {
> goto fail_block;
> }
> @@ -406,7 +410,8 @@ fail_table:
> g_free(new_table);
> fail_block:
> if (*refcount_block != NULL) {
> - qcow2_cache_put(bs, s->refcount_block_cache, (void**) refcount_block);
> + block_cache_put(bs, s->refcount_block_cache,
> + (void **) refcount_block, BLOCK_TABLE_REF);
> }
> return ret;
> }
> @@ -432,8 +437,8 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
> }
>
> if (addend < 0) {
> - qcow2_cache_set_dependency(bs, s->refcount_block_cache,
> - s->l2_table_cache);
> + block_cache_set_dependency(bs, s->refcount_block_cache, BLOCK_TABLE_REF,
> + s->l2_table_cache, s->cluster_size);
> }
>
> start = offset & ~(s->cluster_size - 1);
> @@ -449,8 +454,8 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
> /* Load the refcount block and allocate it if needed */
> if (table_index != old_table_index) {
> if (refcount_block) {
> - ret = qcow2_cache_put(bs, s->refcount_block_cache,
> - (void**) &refcount_block);
> + ret = block_cache_put(bs, s->refcount_block_cache,
> + (void **) &refcount_block, BLOCK_TABLE_REF);
> if (ret < 0) {
> goto fail;
> }
> @@ -463,7 +468,7 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
> }
> old_table_index = table_index;
>
> - qcow2_cache_entry_mark_dirty(s->refcount_block_cache, refcount_block);
> + block_cache_entry_mark_dirty(s->refcount_block_cache, refcount_block);
>
> /* we can update the count and save it */
> block_index = cluster_index &
> @@ -486,8 +491,8 @@ fail:
> /* Write last changed block to disk */
> if (refcount_block) {
> int wret;
> - wret = qcow2_cache_put(bs, s->refcount_block_cache,
> - (void**) &refcount_block);
> + wret = block_cache_put(bs, s->refcount_block_cache,
> + (void **) &refcount_block, BLOCK_TABLE_REF);
> if (wret < 0) {
> return ret < 0 ? ret : wret;
> }
> @@ -763,8 +768,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
> old_l2_offset = l2_offset;
> l2_offset &= L1E_OFFSET_MASK;
>
> - ret = qcow2_cache_get(bs, s->l2_table_cache, l2_offset,
> - (void**) &l2_table);
> + ret = block_cache_get(bs, s->l2_table_cache, l2_offset,
> + (void **) &l2_table, BLOCK_TABLE_L2, s->cluster_size);
> if (ret < 0) {
> goto fail;
> }
> @@ -811,16 +816,18 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
> }
> if (offset != old_offset) {
> if (addend > 0) {
> - qcow2_cache_set_dependency(bs, s->l2_table_cache,
> - s->refcount_block_cache);
> + block_cache_set_dependency(bs, s->l2_table_cache,
> + BLOCK_TABLE_L2, s->refcount_block_cache,
> + s->cluster_size);
> }
> l2_table[j] = cpu_to_be64(offset);
> - qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
> + block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
> }
> }
> }
>
> - ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> + ret = block_cache_put(bs, s->l2_table_cache,
> + (void **) &l2_table, BLOCK_TABLE_L2);
> if (ret < 0) {
> goto fail;
> }
> @@ -847,7 +854,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
> ret = 0;
> fail:
> if (l2_table) {
> - qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
> + block_cache_put(bs, s->l2_table_cache,
> + (void **) &l2_table, BLOCK_TABLE_L2);
> }
>
> /* Update L1 only if it isn't deleted anyway (addend = -1) */
> diff --git a/block/qcow2.c b/block/qcow2.c
> index fd5e214..b89d312 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -30,6 +30,7 @@
> #include "qemu-error.h"
> #include "qerror.h"
> #include "trace.h"
> +#include "block-cache.h"
>
> /*
> Differences with QCOW:
> @@ -415,8 +416,9 @@ static int qcow2_open(BlockDriverState *bs, int flags)
> }
>
> /* alloc L2 table/refcount block cache */
> - s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE);
> - s->refcount_block_cache = qcow2_cache_create(bs, REFCOUNT_CACHE_SIZE);
> + s->l2_table_cache = block_cache_create(bs, L2_CACHE_SIZE, s->cluster_size);
> + s->refcount_block_cache =
> + block_cache_create(bs, REFCOUNT_CACHE_SIZE, s->cluster_size);
>
> s->cluster_cache = g_malloc(s->cluster_size);
> /* one more sector for decompressed data alignment */
> @@ -500,7 +502,7 @@ static int qcow2_open(BlockDriverState *bs, int flags)
> qcow2_refcount_close(bs);
> g_free(s->l1_table);
> if (s->l2_table_cache) {
> - qcow2_cache_destroy(bs, s->l2_table_cache);
> + block_cache_destroy(bs, s->l2_table_cache, BLOCK_TABLE_L2);
> }
> g_free(s->cluster_cache);
> qemu_vfree(s->cluster_data);
> @@ -860,13 +862,13 @@ static void qcow2_close(BlockDriverState *bs)
> BDRVQcowState *s = bs->opaque;
> g_free(s->l1_table);
>
> - qcow2_cache_flush(bs, s->l2_table_cache);
> - qcow2_cache_flush(bs, s->refcount_block_cache);
> -
> + block_cache_flush(bs, s->l2_table_cache,
> + BLOCK_TABLE_L2, s->cluster_size);
> + block_cache_flush(bs, s->refcount_block_cache,
> + BLOCK_TABLE_REF, s->cluster_size);
> qcow2_mark_clean(bs);
> -
> - qcow2_cache_destroy(bs, s->l2_table_cache);
> - qcow2_cache_destroy(bs, s->refcount_block_cache);
> + block_cache_destroy(bs, s->l2_table_cache, BLOCK_TABLE_L2);
> + block_cache_destroy(bs, s->refcount_block_cache, BLOCK_TABLE_REF);
>
> g_free(s->unknown_header_fields);
> cleanup_unknown_header_ext(bs);
> @@ -1339,8 +1341,6 @@ static int qcow2_create(const char *filename, QEMUOptionParameter *options)
> options->value.s);
> return -EINVAL;
> }
> - } else if (!strcmp(options->name, BLOCK_OPT_LAZY_REFCOUNTS)) {
> - flags |= options->value.n ? BLOCK_FLAG_LAZY_REFCOUNTS : 0;
> }
> options++;
> }
> @@ -1537,18 +1537,18 @@ static coroutine_fn int qcow2_co_flush_to_os(BlockDriverState *bs)
> int ret;
>
> qemu_co_mutex_lock(&s->lock);
> - ret = qcow2_cache_flush(bs, s->l2_table_cache);
> + ret = block_cache_flush(bs, s->l2_table_cache,
> + BLOCK_TABLE_L2, s->cluster_size);
> if (ret < 0) {
> qemu_co_mutex_unlock(&s->lock);
> return ret;
> }
>
> - if (qcow2_need_accurate_refcounts(s)) {
> - ret = qcow2_cache_flush(bs, s->refcount_block_cache);
> - if (ret < 0) {
> - qemu_co_mutex_unlock(&s->lock);
> - return ret;
> - }
> + ret = block_cache_flush(bs, s->refcount_block_cache,
> + BLOCK_TABLE_REF, s->cluster_size);
> + if (ret < 0) {
> + qemu_co_mutex_unlock(&s->lock);
> + return ret;
> }
> qemu_co_mutex_unlock(&s->lock);
>
> diff --git a/block/qcow2.h b/block/qcow2.h
> index b4eb654..cb6fd7a 100644
> --- a/block/qcow2.h
> +++ b/block/qcow2.h
> @@ -27,6 +27,7 @@
>
> #include "aes.h"
> #include "qemu-coroutine.h"
> +#include "block-cache.h"
>
> //#define DEBUG_ALLOC
> //#define DEBUG_ALLOC2
> @@ -94,8 +95,6 @@ typedef struct QCowSnapshot {
> uint64_t vm_clock_nsec;
> } QCowSnapshot;
>
> -struct Qcow2Cache;
> -typedef struct Qcow2Cache Qcow2Cache;
>
> typedef struct Qcow2UnknownHeaderExtension {
> uint32_t magic;
> @@ -146,8 +145,8 @@ typedef struct BDRVQcowState {
> uint64_t l1_table_offset;
> uint64_t *l1_table;
>
> - Qcow2Cache* l2_table_cache;
> - Qcow2Cache* refcount_block_cache;
> + BlockCache *l2_table_cache;
> + BlockCache *refcount_block_cache;
>
> uint8_t *cluster_cache;
> uint8_t *cluster_data;
> @@ -316,21 +315,4 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs, const char *snapshot_name);
>
> void qcow2_free_snapshots(BlockDriverState *bs);
> int qcow2_read_snapshots(BlockDriverState *bs);
> -
> -/* qcow2-cache.c functions */
> -Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables);
> -int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c);
> -
> -void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table);
> -int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c);
> -int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
> - Qcow2Cache *dependency);
> -void qcow2_cache_depends_on_flush(Qcow2Cache *c);
> -
> -int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
> - void **table);
> -int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
> - void **table);
> -int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
> -
> #endif
> diff --git a/trace-events b/trace-events
> index 6b12f83..52b6438 100644
> --- a/trace-events
> +++ b/trace-events
> @@ -439,12 +439,13 @@ qcow2_l2_allocate_write_l2(void *bs, int l1_index) "bs %p l1_index %d"
> qcow2_l2_allocate_write_l1(void *bs, int l1_index) "bs %p l1_index %d"
> qcow2_l2_allocate_done(void *bs, int l1_index, int ret) "bs %p l1_index %d ret %d"
>
> -qcow2_cache_get(void *co, int c, uint64_t offset, bool read_from_disk) "co %p is_l2_cache %d offset %" PRIx64 " read_from_disk %d"
> -qcow2_cache_get_replace_entry(void *co, int c, int i) "co %p is_l2_cache %d index %d"
> -qcow2_cache_get_read(void *co, int c, int i) "co %p is_l2_cache %d index %d"
> -qcow2_cache_get_done(void *co, int c, int i) "co %p is_l2_cache %d index %d"
> -qcow2_cache_flush(void *co, int c) "co %p is_l2_cache %d"
> -qcow2_cache_entry_flush(void *co, int c, int i) "co %p is_l2_cache %d index %d"
> +# block/block-cache.c
> +block_cache_get(void *co, int c, uint64_t offset, bool read_from_disk) "co %p is_l2_cache %d offset %" PRIx64 " read_from_disk %d"
> +block_cache_get_replace_entry(void *co, int c, int i) "co %p is_l2_cache %d index %d"
> +block_cache_get_read(void *co, int c, int i) "co %p is_l2_cache %d index %d"
> +block_cache_get_done(void *co, int c, int i) "co %p is_l2_cache %d index %d"
> +block_cache_flush(void *co, int c) "co %p is_l2_cache %d"
> +block_cache_entry_flush(void *co, int c, int i) "co %p is_l2_cache %d index %d"
>
> # block/qed-l2-cache.c
> qed_alloc_l2_cache_entry(void *l2_cache, void *entry) "l2_cache %p entry %p"
> --
> 1.7.1
>
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH V12 5/6] add-cow file format
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 5/6] add-cow file format Dong Xu Wang
@ 2012-09-06 20:19 ` Michael Roth
2012-09-10 2:25 ` Dong Xu Wang
2012-09-11 9:40 ` Kevin Wolf
1 sibling, 1 reply; 25+ messages in thread
From: Michael Roth @ 2012-09-06 20:19 UTC (permalink / raw)
To: Dong Xu Wang; +Cc: kwolf, qemu-devel
On Fri, Aug 10, 2012 at 11:39:44PM +0800, Dong Xu Wang wrote:
> add-cow file format core code. It use block-cache.c as cache code.
>
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> ---
> block/Makefile.objs | 1 +
> block/add-cow.c | 613 +++++++++++++++++++++++++++++++++++++++++++++++++++
> block/add-cow.h | 85 +++++++
> block_int.h | 2 +
> 4 files changed, 701 insertions(+), 0 deletions(-)
> create mode 100644 block/add-cow.c
> create mode 100644 block/add-cow.h
>
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index 23bdfc8..7ed5051 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -2,6 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
> block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
> block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
> block-obj-y += qed-check.o
> +block-obj-y += add-cow.o
> block-obj-y += block-cache.o
> block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
> block-obj-y += stream.o
> diff --git a/block/add-cow.c b/block/add-cow.c
> new file mode 100644
> index 0000000..d4711d5
> --- /dev/null
> +++ b/block/add-cow.c
> @@ -0,0 +1,613 @@
> +/*
> + * QEMU ADD-COW Disk Format
> + *
> + * Copyright IBM, Corp. 2012
> + *
> + * Authors:
> + * Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> + * See the COPYING.LIB file in the top-level directory.
> + *
> + */
> +
> +#include "qemu-common.h"
> +#include "block_int.h"
> +#include "module.h"
> +#include "add-cow.h"
> +
> +static void add_cow_header_le_to_cpu(const AddCowHeader *le, AddCowHeader *cpu)
> +{
> + cpu->magic = le64_to_cpu(le->magic);
> + cpu->version = le32_to_cpu(le->version);
> +
> + cpu->backing_filename_offset = le32_to_cpu(le->backing_filename_offset);
> + cpu->backing_filename_size = le32_to_cpu(le->backing_filename_size);
> +
> + cpu->image_filename_offset = le32_to_cpu(le->image_filename_offset);
> + cpu->image_filename_size = le32_to_cpu(le->image_filename_size);
> +
> + cpu->features = le64_to_cpu(le->features);
> + cpu->optional_features = le64_to_cpu(le->optional_features);
> + cpu->header_pages_size = le32_to_cpu(le->header_pages_size);
> +}
> +
> +static void add_cow_header_cpu_to_le(const AddCowHeader *cpu, AddCowHeader *le)
> +{
> + le->magic = cpu_to_le64(cpu->magic);
> + le->version = cpu_to_le32(cpu->version);
> +
> + le->backing_filename_offset = cpu_to_le32(cpu->backing_filename_offset);
> + le->backing_filename_size = cpu_to_le32(cpu->backing_filename_size);
> +
> + le->image_filename_offset = cpu_to_le32(cpu->image_filename_offset);
> + le->image_filename_size = cpu_to_le32(cpu->image_filename_size);
> +
> + le->features = cpu_to_le64(cpu->features);
> + le->optional_features = cpu_to_le64(cpu->optional_features);
> + le->header_pages_size = cpu_to_le32(cpu->header_pages_size);
> +}
> +
> +static int add_cow_probe(const uint8_t *buf, int buf_size, const char *filename)
> +{
> + const AddCowHeader *header = (const AddCowHeader *)buf;
> +
> + if (le64_to_cpu(header->magic) == ADD_COW_MAGIC &&
> + le32_to_cpu(header->version) == ADD_COW_VERSION) {
> + return 100;
> + } else {
> + return 0;
> + }
> +}
> +
> +static int add_cow_create(const char *filename, QEMUOptionParameter *options)
> +{
> + AddCowHeader header = {
> + .magic = ADD_COW_MAGIC,
> + .version = ADD_COW_VERSION,
> + .features = 0,
> + .optional_features = 0,
> + .header_pages_size = ADD_COW_DEFAULT_PAGE_SIZE,
> + };
> + AddCowHeader le_header;
> + int64_t image_len = 0;
> + const char *backing_filename = NULL;
> + const char *backing_fmt = NULL;
> + const char *image_filename = NULL;
> + const char *image_format = NULL;
> + BlockDriverState *bs, *image_bs = NULL, *backing_bs = NULL;
> + BlockDriver *drv = bdrv_find_format("add-cow");
> + BDRVAddCowState s;
> + int ret;
> +
> + while (options && options->name) {
> + if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
> + image_len = options->value.n;
> + } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FILE)) {
> + backing_filename = options->value.s;
> + } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FMT)) {
> + backing_fmt = options->value.s;
> + } else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FILE)) {
> + image_filename = options->value.s;
> + } else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FORMAT)) {
> + image_format = options->value.s;
> + }
> + options++;
> + }
> +
> + if (backing_filename) {
> + header.backing_filename_offset = sizeof(header)
> + + sizeof(s.backing_file_format) + sizeof(s.image_file_format);
> + header.backing_filename_size = strlen(backing_filename);
> +
> + if (!backing_fmt) {
> + backing_bs = bdrv_new("image");
> + ret = bdrv_open(backing_bs, backing_filename, BDRV_O_RDWR
> + | BDRV_O_CACHE_WB, NULL);
> + if (ret < 0) {
> + return ret;
> + }
> + backing_fmt = bdrv_get_format_name(backing_bs);
> + bdrv_delete(backing_bs);
> + }
> + } else {
> + header.features |= ADD_COW_F_All_ALLOCATED;
> + }
> +
> + if (image_filename) {
> + header.image_filename_offset =
> + sizeof(header) + sizeof(s.backing_file_format)
> + + sizeof(s.image_file_format) + header.backing_filename_size;
> + header.image_filename_size = strlen(image_filename);
> + } else {
> + error_report("Error: image_file should be given.");
> + return -EINVAL;
> + }
> +
> + if (backing_filename && !strcmp(backing_filename, image_filename)) {
> + error_report("Error: Trying to create an image with the "
> + "same backing file name as the image file name");
> + return -EINVAL;
> + }
> +
> + if (!strcmp(filename, image_filename)) {
> + error_report("Error: Trying to create an image with the "
> + "same filename as the image file name");
> + return -EINVAL;
> + }
> +
> + if (header.image_filename_offset + header.image_filename_size
> + > ADD_COW_PAGE_SIZE * ADD_COW_DEFAULT_PAGE_SIZE) {
> + error_report("image_file name or backing_file name too long.");
> + return -ENOSPC;
> + }
> +
> + ret = bdrv_file_open(&image_bs, image_filename, BDRV_O_RDWR);
> + if (ret < 0) {
> + return ret;
> + }
> + bdrv_delete(image_bs);
> +
> + ret = bdrv_create_file(filename, NULL);
> + if (ret < 0) {
> + return ret;
> + }
> +
> + ret = bdrv_file_open(&bs, filename, BDRV_O_RDWR);
> + if (ret < 0) {
> + return ret;
> + }
> + add_cow_header_cpu_to_le(&header, &le_header);
> + ret = bdrv_pwrite(bs, 0, &le_header, sizeof(le_header));
> + if (ret < 0) {
> + bdrv_delete(bs);
> + return ret;
> + }
> +
> + ret = bdrv_pwrite(bs, sizeof(le_header), backing_fmt ? backing_fmt : "",
> + backing_fmt ? strlen(backing_fmt) : 0);
> + if (ret < 0) {
> + bdrv_delete(bs);
> + return ret;
> + }
> +
> + ret = bdrv_pwrite(bs, sizeof(le_header) + sizeof(s.backing_file_format),
> + image_format ? image_format : "raw",
> + image_format ? strlen(image_format) : sizeof("raw"));
> + if (ret < 0) {
> + bdrv_delete(bs);
> + return ret;
> + }
> +
> + if (backing_filename) {
> + ret = bdrv_pwrite(bs, header.backing_filename_offset,
> + backing_filename, header.backing_filename_size);
> + if (ret < 0) {
> + bdrv_delete(bs);
> + return ret;
> + }
> + }
> +
> + ret = bdrv_pwrite(bs, header.image_filename_offset,
> + image_filename, header.image_filename_size);
> + if (ret < 0) {
> + bdrv_delete(bs);
> + return ret;
> + }
> +
> + ret = bdrv_open(bs, filename, BDRV_O_RDWR | BDRV_O_NO_FLUSH, drv);
> + if (ret < 0) {
> + bdrv_delete(bs);
> + return ret;
> + }
> +
> + ret = bdrv_truncate(bs, image_len);
> + bdrv_delete(bs);
> + return ret;
> +}
> +
> +static int add_cow_open(BlockDriverState *bs, int flags)
> +{
> + char image_filename[ADD_COW_FILE_LEN];
> + char tmp_name[ADD_COW_FILE_LEN];
> + BlockDriver *image_drv = NULL;
> + int ret;
> + int sector_per_byte;
> + BDRVAddCowState *s = bs->opaque;
> + AddCowHeader le_header;
> +
> + ret = bdrv_pread(bs->file, 0, &le_header, sizeof(le_header));
> + if (ret != sizeof(s->header)) {
> + goto fail;
> + }
> +
> + add_cow_header_le_to_cpu(&le_header, &s->header);
> +
> + if (le64_to_cpu(s->header.magic) != ADD_COW_MAGIC) {
> + ret = -EINVAL;
> + goto fail;
> + }
> +
> + if (s->header.version != ADD_COW_VERSION) {
> + char version[64];
> + snprintf(version, sizeof(version), "ADD-COW version %d",
> + s->header.version);
> + qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
> + bs->device_name, "add-cow", version);
> + ret = -ENOTSUP;
> + goto fail;
> + }
> +
> + if (s->header.features & ~ADD_COW_FEATURE_MASK) {
> + char buf[64];
> + snprintf(buf, sizeof(buf), "%" PRIx64,
> + s->header.features & ~ADD_COW_FEATURE_MASK);
> + qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
> + bs->device_name, "add-cow", buf);
> + return -ENOTSUP;
> + }
> +
> + if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
> + ret = bdrv_read_string(bs->file, sizeof(s->header),
> + sizeof(s->backing_file_format) - 1, s->backing_file_format,
> + sizeof(s->backing_file_format));
> + if (ret < 0) {
> + goto fail;
> + }
> + }
> +
> + ret = bdrv_read_string(bs->file,
> + sizeof(s->header) + sizeof(s->image_file_format),
> + sizeof(s->image_file_format) - 1, s->image_file_format,
> + sizeof(s->image_file_format));
> + if (ret < 0) {
> + goto fail;
> + }
> +
> + if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
> + ret = bdrv_read_string(bs->file, s->header.backing_filename_offset,
> + s->header.backing_filename_size, bs->backing_file,
> + sizeof(bs->backing_file));
> + if (ret < 0) {
> + goto fail;
> + }
> + }
> +
> + ret = bdrv_read_string(bs->file, s->header.image_filename_offset,
> + s->header.image_filename_size, tmp_name,
> + sizeof(tmp_name));
> + if (ret < 0) {
> + goto fail;
> + }
> +
> + s->image_hd = bdrv_new("");
> + if (path_has_protocol(image_filename)) {
> + pstrcpy(image_filename, sizeof(image_filename), tmp_name);
> + } else {
> + path_combine(image_filename, sizeof(image_filename),
> + bs->filename, tmp_name);
> + }
> +
> + ret = bdrv_open(s->image_hd, image_filename, flags, image_drv);
> + if (ret < 0) {
> + bdrv_delete(s->image_hd);
> + goto fail;
> + }
> +
> + bs->total_sectors = bdrv_getlength(s->image_hd) >> 9;
> + s->cluster_size = ADD_COW_CLUSTER_SIZE;
> + sector_per_byte = SECTORS_PER_CLUSTER * 8;
> + s->bitmap_size =
> + (bs->total_sectors + sector_per_byte - 1) / sector_per_byte;
> + s->bitmap_cache =
> + block_cache_create(bs, ADD_COW_CACHE_SIZE, ADD_COW_CACHE_ENTRY_SIZE);
> +
> + qemu_co_mutex_init(&s->lock);
> + return 0;
> +fail:
> + if (s->bitmap_cache) {
> + block_cache_destroy(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP);
> + }
> + return ret;
> +}
> +
> +static void add_cow_close(BlockDriverState *bs)
> +{
> + BDRVAddCowState *s = bs->opaque;
> + block_cache_destroy(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP);
> + bdrv_delete(s->image_hd);
> +}
> +
> +static bool is_allocated(BlockDriverState *bs, int64_t sector_num)
> +{
> + BDRVAddCowState *s = bs->opaque;
> + BlockCache *c = s->bitmap_cache;
> + int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
> + uint8_t *table = NULL;
> + uint64_t offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
> + + (offset_in_bitmap(sector_num) & (~(c->entry_size - 1)));
> + int ret = block_cache_get(bs, s->bitmap_cache, offset,
> + (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
> +
> + if (ret < 0) {
> + return ret;
> + }
> + return table[cluster_num / 8 % ADD_COW_CACHE_ENTRY_SIZE]
> + & (1 << (cluster_num % 8));
> +}
> +
> +static coroutine_fn int add_cow_is_allocated(BlockDriverState *bs,
> + int64_t sector_num, int nb_sectors, int *num_same)
> +{
> + BDRVAddCowState *s = bs->opaque;
> + int changed;
> +
> + if (nb_sectors == 0) {
> + *num_same = 0;
> + return 0;
> + }
> +
> + if (s->header.features & ADD_COW_F_All_ALLOCATED) {
> + *num_same = nb_sectors - 1;
> + return 1;
> + }
> + changed = is_allocated(bs, sector_num);
> +
> + for (*num_same = 1; *num_same < nb_sectors; (*num_same)++) {
> + if (is_allocated(bs, sector_num + *num_same) != changed) {
> + break;
> + }
> + }
> + return changed;
> +}
> +
> +static int add_cow_backing_read(BlockDriverState *bs, QEMUIOVector *qiov,
> + int64_t sector_num, int nb_sectors)
> +{
> + int n1;
> + if ((sector_num + nb_sectors) <= bs->total_sectors) {
> + return nb_sectors;
> + }
> + if (sector_num >= bs->total_sectors) {
> + n1 = 0;
> + } else {
> + n1 = bs->total_sectors - sector_num;
> + }
> +
> + qemu_iovec_memset(qiov, BDRV_SECTOR_SIZE * n1,
> + 0, BDRV_SECTOR_SIZE * (nb_sectors - n1));
> +
> + return n1;
> +}
> +
> +static coroutine_fn int add_cow_co_readv(BlockDriverState *bs,
> + int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
> +{
> + BDRVAddCowState *s = bs->opaque;
> + int cur_nr_sectors;
> + uint64_t bytes_done = 0;
> + QEMUIOVector hd_qiov;
> + int n, n1, ret = 0;
> +
> + qemu_iovec_init(&hd_qiov, qiov->niov);
> + qemu_co_mutex_lock(&s->lock);
> + while (remaining_sectors != 0) {
> + cur_nr_sectors = remaining_sectors;
> + if (add_cow_is_allocated(bs, sector_num, cur_nr_sectors, &n)) {
> + cur_nr_sectors = n;
> + qemu_iovec_reset(&hd_qiov);
> + qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
> + cur_nr_sectors * BDRV_SECTOR_SIZE);
> + qemu_co_mutex_unlock(&s->lock);
> + ret = bdrv_co_readv(s->image_hd, sector_num, n, &hd_qiov);
> + qemu_co_mutex_lock(&s->lock);
> + if (ret < 0) {
> + goto fail;
> + }
> + } else {
> + cur_nr_sectors = n;
> + if (bs->backing_hd) {
> + qemu_iovec_reset(&hd_qiov);
> + qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
> + cur_nr_sectors * BDRV_SECTOR_SIZE);
> + n1 = add_cow_backing_read(bs->backing_hd, &hd_qiov,
> + sector_num, cur_nr_sectors);
> + if (n1 > 0) {
> + qemu_co_mutex_unlock(&s->lock);
> + ret = bdrv_co_readv(bs->backing_hd, sector_num,
> + n, &hd_qiov);
> + qemu_co_mutex_lock(&s->lock);
> + if (ret < 0) {
> + goto fail;
> + }
> + }
> + } else {
> + qemu_iovec_memset(&hd_qiov, 0, 0,
> + BDRV_SECTOR_SIZE * cur_nr_sectors);
> + }
> + }
> + remaining_sectors -= cur_nr_sectors;
> + sector_num += cur_nr_sectors;
> + bytes_done += cur_nr_sectors * BDRV_SECTOR_SIZE;
> + }
> +fail:
> + qemu_co_mutex_unlock(&s->lock);
> + qemu_iovec_destroy(&hd_qiov);
> + return ret;
> +}
> +
> +static int coroutine_fn copy_sectors(BlockDriverState *bs,
> + int n_start, int n_end)
> +{
> + BDRVAddCowState *s = bs->opaque;
> + QEMUIOVector qiov;
> + struct iovec iov;
> + int n, ret;
> +
> + n = n_end - n_start;
> + if (n <= 0) {
> + return 0;
> + }
> +
> + iov.iov_len = n * BDRV_SECTOR_SIZE;
> + iov.iov_base = qemu_blockalign(bs, iov.iov_len);
> +
> + qemu_iovec_init_external(&qiov, &iov, 1);
> +
> + ret = bdrv_co_readv(bs->backing_hd, n_start, n, &qiov);
> + if (ret < 0) {
> + goto out;
> + }
> + ret = bdrv_co_writev(s->image_hd, n_start, n, &qiov);
> + if (ret < 0) {
> + goto out;
> + }
> +
> + ret = 0;
> +out:
> + qemu_vfree(iov.iov_base);
> + return ret;
> +}
> +
> +static coroutine_fn int add_cow_co_writev(BlockDriverState *bs,
> + int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
> +{
> + BDRVAddCowState *s = bs->opaque;
> + BlockCache *c = s->bitmap_cache;
> + int ret = 0, i;
> + QEMUIOVector hd_qiov;
> + uint8_t *table;
> + uint64_t offset;
> +
> + qemu_co_mutex_lock(&s->lock);
> + qemu_iovec_init(&hd_qiov, qiov->niov);
> + ret = bdrv_co_writev(s->image_hd,
> + sector_num,
> + remaining_sectors, qiov);
alignment ^
or even at ^ if you prefer and have done in some places, just need to be
consistent about it for better readability.
> +
> + if (ret < 0) {
> + goto fail;
> + }
> + if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
> + /* Copy content of unmodified sectors */
> + if (!is_cluster_head(sector_num) && !is_allocated(bs, sector_num)) {
Why do we avoid a COW when writing to the first sector of a cluster?
> + ret = copy_sectors(bs, sector_num & ~(SECTORS_PER_CLUSTER - 1),
> + sector_num);
> + if (ret < 0) {
> + goto fail;
> + }
> + }
> +
> + if (!is_cluster_tail(sector_num + remaining_sectors - 1)
> + && !is_allocated(bs, sector_num + remaining_sectors - 1)) {
> + ret = copy_sectors(bs, sector_num + remaining_sectors,
> + ((sector_num + remaining_sectors) | (SECTORS_PER_CLUSTER - 1)) + 1);
> + if (ret < 0) {
> + goto fail;
> + }
> + }
> +
> + for (i = sector_num / SECTORS_PER_CLUSTER;
> + i <= (sector_num + remaining_sectors - 1) / SECTORS_PER_CLUSTER;
> + i++) {
> + offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
> + + (offset_in_bitmap(i * SECTORS_PER_CLUSTER) & (~(c->entry_size - 1)));
> + ret = block_cache_get(bs, s->bitmap_cache, offset,
> + (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
> + if (ret < 0) {
> + goto fail;
> + }
> + if ((table[i / 8] & (1 << (i % 8))) == 0) {
> + table[i / 8] |= (1 << (i % 8));
> + block_cache_entry_mark_dirty(s->bitmap_cache, table);
> + }
> + }
> + }
> + ret = 0;
> +fail:
> + qemu_co_mutex_unlock(&s->lock);
> + qemu_iovec_destroy(&hd_qiov);
> + return ret;
> +}
> +
> +static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size)
> +{
> + BDRVAddCowState *s = bs->opaque;
> + int sector_per_byte = SECTORS_PER_CLUSTER * 8;
> + int ret;
> + uint32_t bitmap_pos = s->header.header_pages_size * ADD_COW_PAGE_SIZE;
> + int64_t bitmap_size =
> + (size / BDRV_SECTOR_SIZE + sector_per_byte - 1) / sector_per_byte;
> + bitmap_size = (bitmap_size + ADD_COW_CACHE_ENTRY_SIZE - 1)
> + & (~(ADD_COW_CACHE_ENTRY_SIZE - 1));
> +
> + ret = bdrv_truncate(bs->file, bitmap_pos + bitmap_size);
> + if (ret < 0) {
> + return ret;
> + }
> + return 0;
> +}
> +
> +static coroutine_fn int add_cow_co_flush(BlockDriverState *bs)
> +{
> + BDRVAddCowState *s = bs->opaque;
> + int ret;
> +
> + qemu_co_mutex_lock(&s->lock);
> + ret = block_cache_flush(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP,
> + ADD_COW_CACHE_ENTRY_SIZE);
> + qemu_co_mutex_unlock(&s->lock);
> + return ret;
> +}
> +
> +static QEMUOptionParameter add_cow_create_options[] = {
> + {
> + .name = BLOCK_OPT_SIZE,
> + .type = OPT_SIZE,
> + .help = "Virtual disk size"
> + },
> + {
> + .name = BLOCK_OPT_BACKING_FILE,
> + .type = OPT_STRING,
> + .help = "File name of a base image"
> + },
> + {
> + .name = BLOCK_OPT_BACKING_FMT,
> + .type = OPT_STRING,
> + .help = "Image format of the base image"
> + },
> + {
> + .name = BLOCK_OPT_IMAGE_FILE,
> + .type = OPT_STRING,
> + .help = "File name of a image file"
> + },
> + {
> + .name = BLOCK_OPT_IMAGE_FORMAT,
> + .type = OPT_STRING,
> + .help = "Image format of the image file"
> + },
> + { NULL }
> +};
> +
> +static BlockDriver bdrv_add_cow = {
> + .format_name = "add-cow",
> + .instance_size = sizeof(BDRVAddCowState),
> + .bdrv_probe = add_cow_probe,
> + .bdrv_open = add_cow_open,
> + .bdrv_close = add_cow_close,
> + .bdrv_create = add_cow_create,
> + .bdrv_co_readv = add_cow_co_readv,
> + .bdrv_co_writev = add_cow_co_writev,
> + .bdrv_truncate = bdrv_add_cow_truncate,
> + .bdrv_co_is_allocated = add_cow_is_allocated,
> +
> + .create_options = add_cow_create_options,
> + .bdrv_co_flush_to_os = add_cow_co_flush,
> +};
> +
> +static void bdrv_add_cow_init(void)
> +{
> + bdrv_register(&bdrv_add_cow);
> +}
> +
> +block_init(bdrv_add_cow_init);
> diff --git a/block/add-cow.h b/block/add-cow.h
> new file mode 100644
> index 0000000..f058376
> --- /dev/null
> +++ b/block/add-cow.h
> @@ -0,0 +1,85 @@
> +/*
> + * QEMU ADD-COW Disk Format
> + *
> + * Copyright IBM, Corp. 2012
> + *
> + * Authors:
> + * Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> + * See the COPYING.LIB file in the top-level directory.
> + *
> + */
> +
> +#ifndef BLOCK_ADD_COW_H
> +#define BLOCK_ADD_COW_H
> +#include "block-cache.h"
> +
> +enum {
> + ADD_COW_F_All_ALLOCATED = 0X01,
Please use "ADD_COW_F_ALL_ALLOCATED" (all caps)
was searching your patch for how this was used and was scratching my
head when I wasn't seeing any matches :)
> + ADD_COW_FEATURE_MASK = ADD_COW_F_All_ALLOCATED,
> +
> + ADD_COW_MAGIC = (((uint64_t)'A' << 56) | ((uint64_t)'D' << 48) | \
> + ((uint64_t)'D' << 40) | ((uint64_t)'_' << 32) | \
> + ((uint64_t)'C' << 24) | ((uint64_t)'O' << 16) | \
> + ((uint64_t)'W' << 8) | 0xFF),
> + ADD_COW_VERSION = 1,
> + ADD_COW_FILE_LEN = 1024,
> + ADD_COW_CACHE_SIZE = 16,
> + ADD_COW_CACHE_ENTRY_SIZE = 65536,
> + ADD_COW_CLUSTER_SIZE = 65536,
> + SECTORS_PER_CLUSTER = (ADD_COW_CLUSTER_SIZE / BDRV_SECTOR_SIZE),
> + ADD_COW_PAGE_SIZE = 4096,
> + ADD_COW_DEFAULT_PAGE_SIZE = 1,
> +};
> +
> +typedef struct AddCowHeader {
> + uint64_t magic;
> + uint32_t version;
> +
> + uint32_t backing_filename_offset;
> + uint32_t backing_filename_size;
> +
> + uint32_t image_filename_offset;
> + uint32_t image_filename_size;
> +
> + uint64_t features;
> + uint64_t optional_features;
> + uint32_t header_pages_size;
> +} QEMU_PACKED AddCowHeader;
You should avoid using packed structures for image format headers.
Instead, I would either:
a) re-order the fields so that 32/64-bit fields, respectively, fall on
32/64-bit boundaries (in your case, for instance, moving header_pages_size
above features) like qed/qcow2 do, or
b) read/write the fields individually rather than reading/writing directly
into/from the header struct.
The safest route is b). Adds a few lines of code, but you won't have to
re-work things (or worry about introducing bugs) later if you were to add,
say, a 32-bit value, and then a 64-bit value later.
> +
> +typedef struct BDRVAddCowState {
> + BlockDriverState *image_hd;
> + CoMutex lock;
> + int cluster_size;
> + BlockCache *bitmap_cache;
> + uint64_t bitmap_size;
> + AddCowHeader header;
> + char backing_file_format[16];
> + char image_file_format[16];
> +} BDRVAddCowState;
> +
> +/* Convert sector_num to offset in bitmap */
> +static inline int64_t offset_in_bitmap(int64_t sector_num)
> +{
> + int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
> + return cluster_num / 8;
> +}
> +
> +static inline bool is_cluster_head(int64_t sector_num)
> +{
> + return sector_num % SECTORS_PER_CLUSTER == 0;
> +}
> +
> +static inline bool is_cluster_tail(int64_t sector_num)
> +{
> + return (sector_num + 1) % SECTORS_PER_CLUSTER == 0;
> +}
> +
> +BlockCache *add_cow_cache_create(BlockDriverState *bs, int num_tables);
> +int add_cow_cache_destroy(BlockDriverState *bs, BlockCache *c);
> +void add_cow_cache_entry_mark_dirty(BlockCache *c, void *table);
> +int add_cow_cache_get(BlockDriverState *bs, BlockCache *c, uint64_t offset,
> + void **table);
> +int add_cow_cache_flush(BlockDriverState *bs, BlockCache *c);
> +#endif
> diff --git a/block_int.h b/block_int.h
> index 6c1d9ca..67954ec 100644
> --- a/block_int.h
> +++ b/block_int.h
> @@ -53,6 +53,8 @@
> #define BLOCK_OPT_SUBFMT "subformat"
> #define BLOCK_OPT_COMPAT_LEVEL "compat"
> #define BLOCK_OPT_LAZY_REFCOUNTS "lazy_refcounts"
> +#define BLOCK_OPT_IMAGE_FILE "image_file"
> +#define BLOCK_OPT_IMAGE_FORMAT "image_format"
>
> typedef struct BdrvTrackedRequest BdrvTrackedRequest;
>
> --
> 1.7.1
>
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH V12 1/6] docs: document for add-cow file format
2012-09-06 17:27 ` Michael Roth
@ 2012-09-10 1:48 ` Dong Xu Wang
0 siblings, 0 replies; 25+ messages in thread
From: Dong Xu Wang @ 2012-09-10 1:48 UTC (permalink / raw)
To: Michael Roth; +Cc: kwolf, qemu-devel
On Fri, Sep 7, 2012 at 1:27 AM, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> On Fri, Aug 10, 2012 at 11:39:40PM +0800, Dong Xu Wang wrote:
>> Document for add-cow format, the usage and spec of add-cow are introduced.
>>
>> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> ---
>> docs/specs/add-cow.txt | 123 ++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 files changed, 123 insertions(+), 0 deletions(-)
>> create mode 100644 docs/specs/add-cow.txt
>>
>> diff --git a/docs/specs/add-cow.txt b/docs/specs/add-cow.txt
>> new file mode 100644
>> index 0000000..d5a7a68
>> --- /dev/null
>> +++ b/docs/specs/add-cow.txt
>> @@ -0,0 +1,123 @@
>> +== General ==
>> +
>> +The raw file format does not support backing files or copy on write feature.
>> +The add-cow image format makes it possible to use backing files with raw
>> +image by keeping a separate .add-cow metadata file. Once all sectors
>> +have been written into the raw image it is safe to discard the .add-cow
>> +and backing files, then we can use the raw image directly.
>> +
>> +An example usage of add-cow would look like::
>> +(ubuntu.img is a disk image which has been installed OS.)
>> + 1) Create a raw image with the same size of ubuntu.img
>> + qemu-img create -f raw test.raw 8G
>> + 2) Create an add-cow image which will store dirty bitmap
>> + qemu-img create -f add-cow test.add-cow \
>> + -o backing_file=ubuntu.img,image_file=test.raw
>> + 3) Run qemu with add-cow image
>> + qemu -drive if=virtio,file=test.add-cow
>> +
>> +test.raw may be larger than ubuntu.img, in that case, the size of test.add-cow
>> +will be calculated from the size of test.raw.
>> +
>> +=Specification=
>> +
>> +The file format looks like this:
>> +
>> + +---------------+-------------+-----------------+
>> + | Header | Reserved | COW bitmap |
>> + +---------------+-------------+-----------------+
>> +
>> +All numbers in add-cow are stored in Little Endian byte order.
>> +
>> +== Header ==
>> +
>> +The Header is included in the first bytes:
>> +(#define HEADER_SIZE (4096 * header_pages_size))
>> + Byte 0 - 7: magic
>> + add-cow magic string ("ADD_COW\xff").
>> +
>> + 8 - 11: version
>> + Version number (only valid value is 1 now).
>> +
>> + 12 - 15: backing file name offset
>> + Offset in the add-cow file at which the backing file
>> + name is stored (NB: The string is not nul-terminated).
>> + If backing file name does NOT exist, this field will be
>> + 0. Must be between 80 and [HEADER_SIZE - 2](a file name
>> + must be at least 1 byte).
>> +
>> + 16 - 19: backing file name size
>> + Length of the backing file name in bytes. It will be 0
>> + if the backing file name offset is 0. If backing file
>> + name offset is non-zero, then it must be non-zero. Must
>> + be less than [HEADER_SIZE - 80] to fit in the reserved
>> + part of the header.
>> +
>> + 20 - 23: image file name offset
>> + Offset in the add-cow file at which the image file name
>> + is stored (NB: The string is not null terminated). It
>> + must be between 80 and [HEADER_SIZE - 2].
>> +
>> + 24 - 27: image file name size
>> + Length of the image file name in bytes.
>> + Must be less than [HEADER_SIZE - 80] to fit in the reserved
>> + part of the header.
>> +
>> + 28 - 35: features
>> + Currently only 1 feature bit is used:
>> + Feature bits:
>> + * ADD_COW_F_All_ALLOCATED = 0x01.
>> +
>> + 36 - 43: optional features
>> + Not used now. Reserved for future use. It must be set to 0.
>> +
>> + 44 - 47: header pages size
>> + The header field is variable-sized. This field indicates
>> + how many pages(4k) will be used to store add-cow header.
>> + In add-cow v1, it is fixed to 1, so the header size will
>> + be 4k * 1 = 4096 bytes.
>> +
>> + 48 - 63: backing file format
>> + format of backing file. It will be filled with 0 if
>> + backing file name offset is 0. If backing file name
>> + offset is non-zero, it must be non-zero. It is coded
>> + in free-form ASCII, and is not NUL-terminated.
>> +
>> + 64 - 79: image file format
>> + format of image file. It must be non-zero. It is coded
>> + in free-form ASCII, and is not NUL-terminated.
>> +
>> + 80 - [HEADER_SIZE - 1]:
>> + It is used to make sure COW bitmap field starts at the
>> + HEADER_SIZE byte, backing file name and image file name
>> + will be stored here. The bytes that is not pointing to
>> + backing file and image file names will bet set to 0.
>> +
>> +== COW bitmap ==
>> +
>> +The "COW bitmap" field starts at offset HEADER_SIZE, stores a bitmap related to
>> +backing file and image file. The bitmap will track whether the sector in
>> +backing file is dirty or not.
>> +
>> +Each bit in the bitmap indicates one cluster's status. One cluster includes 128
>> +sectors, then each bit indicates 512 * 128 = 64k bytes. the size of bitmap is
>> +calculated according to virtual size of image file, and it also should be multipe
>> +of 65536, the bits not used will be set to 0. Within each byte, the least
>> +significant bit covers the first cluster. Bit orders in one byte look like:
>> + +----+----+----+----+----+----+----+----+
>> + | b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0 |
>> + +----+----+----+----+----+----+----+----+
>> +
>> +If the bit is 0, indicates the sector has not been allocated in image file, data
>> +should be loaded from backing file while reading; if the bit is 1, indicates the
>> +related sector has been dirty, should be loaded from image file while reading.
>> +Writing to a sector causes the corresponding bit to be set to 1.
>> +
>> +If raw image is not an even multiple of cluster bytes, bits that correspond to
>> +bytes beyond the raw file size in add-cow will be 0.
>> +
>> +Image file name and backing file name must NOT be the same, we prevent this
>> +while creating add-cow files.
>> +
>> +Image file and backing file are interpreted relative to the qcow2 file, not
>
> Relative to the add-cow file?
Ah, yes..
>
>> +to the current working directory of the process that opened the qcow2 file.
>> --
>> 1.7.1
>>
>>
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH V12 3/6] qed_read_string to bdrv_read_string
2012-09-06 17:32 ` Michael Roth
@ 2012-09-10 1:49 ` Dong Xu Wang
0 siblings, 0 replies; 25+ messages in thread
From: Dong Xu Wang @ 2012-09-10 1:49 UTC (permalink / raw)
To: Michael Roth; +Cc: kwolf, qemu-devel
On Fri, Sep 7, 2012 at 1:32 AM, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> On Fri, Aug 10, 2012 at 11:39:42PM +0800, Dong Xu Wang wrote:
>> Make qed_read_string function to a common interface, so move it to block.c.
>>
>> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> ---
>> block.c | 27 +++++++++++++++++++++++++++
>> block.h | 2 ++
>> block/qed.c | 29 +----------------------------
>> 3 files changed, 30 insertions(+), 28 deletions(-)
>>
>> diff --git a/block.c b/block.c
>> index c13d803..d906b35 100644
>> --- a/block.c
>> +++ b/block.c
>> @@ -213,6 +213,33 @@ int path_has_protocol(const char *path)
>> return *p == ':';
>> }
>>
>> +/**
>> + * Read a string of known length from the image file
>> + *
>> + * @bs: Image file
>> + * @offset: File offset to start of string, in bytes
>> + * @n: String length in bytes
>> + * @buf: Destination buffer
>> + * @buflen: Destination buffer length in bytes
>> + * @ret: 0 on success, -errno on failure
>> + *
>> + * The string is NUL-terminated.
>> + */
>> +int bdrv_read_string(BlockDriverState *bs, uint64_t offset, size_t n,
>> + char *buf, size_t buflen)
>
> Small alignment issue ^
>
>> +{
>> + int ret;
>> + if (n >= buflen) {
>> + return -EINVAL;
>> + }
>> + ret = bdrv_pread(bs, offset, buf, n);
>> + if (ret < 0) {
>> + return ret;
>> + }
>> + buf[n] = '\0';
>> + return 0;
>> +}
>> +
>> int path_is_absolute(const char *path)
>> {
>> #ifdef _WIN32
>> diff --git a/block.h b/block.h
>> index 54e61c9..e5dfcd7 100644
>> --- a/block.h
>> +++ b/block.h
>> @@ -154,6 +154,8 @@ int bdrv_pwrite_sync(BlockDriverState *bs, int64_t offset,
>> const void *buf, int count);
>> int coroutine_fn bdrv_co_readv(BlockDriverState *bs, int64_t sector_num,
>> int nb_sectors, QEMUIOVector *qiov);
>> +int bdrv_read_string(BlockDriverState *bs, uint64_t offset, size_t n,
>> + char *buf, size_t buflen);
>
> Another one here ^
>
>> int coroutine_fn bdrv_co_copy_on_readv(BlockDriverState *bs,
>> int64_t sector_num, int nb_sectors, QEMUIOVector *qiov);
>> int coroutine_fn bdrv_co_writev(BlockDriverState *bs, int64_t sector_num,
>> diff --git a/block/qed.c b/block/qed.c
>> index 5f3eefa..311c589 100644
>> --- a/block/qed.c
>> +++ b/block/qed.c
>> @@ -217,33 +217,6 @@ static bool qed_is_image_size_valid(uint64_t image_size, uint32_t cluster_size,
>> }
>>
>> /**
>> - * Read a string of known length from the image file
>> - *
>> - * @file: Image file
>> - * @offset: File offset to start of string, in bytes
>> - * @n: String length in bytes
>> - * @buf: Destination buffer
>> - * @buflen: Destination buffer length in bytes
>> - * @ret: 0 on success, -errno on failure
>> - *
>> - * The string is NUL-terminated.
>> - */
>> -static int qed_read_string(BlockDriverState *file, uint64_t offset, size_t n,
>> - char *buf, size_t buflen)
>> -{
>> - int ret;
>> - if (n >= buflen) {
>> - return -EINVAL;
>> - }
>> - ret = bdrv_pread(file, offset, buf, n);
>> - if (ret < 0) {
>> - return ret;
>> - }
>> - buf[n] = '\0';
>> - return 0;
>> -}
>> -
>> -/**
>> * Allocate new clusters
>> *
>> * @s: QED state
>> @@ -437,7 +410,7 @@ static int bdrv_qed_open(BlockDriverState *bs, int flags)
>> return -EINVAL;
>> }
>>
>> - ret = qed_read_string(bs->file, s->header.backing_filename_offset,
>> + ret = bdrv_read_string(bs->file, s->header.backing_filename_offset,
>> s->header.backing_filename_size, bs->backing_file,
>> sizeof(bs->backing_file));
>
> Here too ^
>
> Looks good otherwise.
>
>> if (ret < 0) {
>> --
>> 1.7.1
>>
>>
>
Thank you Michael .
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH V12 4/6] rename qcow2-cache.c to block-cache.c
2012-09-06 17:52 ` Michael Roth
@ 2012-09-10 2:14 ` Dong Xu Wang
0 siblings, 0 replies; 25+ messages in thread
From: Dong Xu Wang @ 2012-09-10 2:14 UTC (permalink / raw)
To: Michael Roth; +Cc: kwolf, qemu-devel
On Fri, Sep 7, 2012 at 1:52 AM, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> On Fri, Aug 10, 2012 at 11:39:43PM +0800, Dong Xu Wang wrote:
>> add-cow and qcow2 file format will share the same cache code, so rename
>> block-cache.c to block-cache.c. And related structure and qcow2 code also
>
> "qcow2-cache.c to block-cache.c"
>
> But I've scanned through the rest of your patches and can't seem to find
> where block-cache.c gets introduced. Did you forget to git add it?
Really sorry for that, I forget to add the block-cache.c, will add it in v13.
>
>> are changed.
>>
>> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> ---
>> block.h | 3 +
>> block/Makefile.objs | 3 +-
>> block/qcow2-cache.c | 323 ------------------------------------------------
>> block/qcow2-cluster.c | 66 ++++++----
>> block/qcow2-refcount.c | 66 ++++++-----
>> block/qcow2.c | 36 +++---
>> block/qcow2.h | 24 +---
>> trace-events | 13 +-
>> 8 files changed, 109 insertions(+), 425 deletions(-)
>> delete mode 100644 block/qcow2-cache.c
>>
>> diff --git a/block.h b/block.h
>> index e5dfcd7..c325661 100644
>> --- a/block.h
>> +++ b/block.h
>> @@ -401,6 +401,9 @@ typedef enum {
>> BLKDBG_CLUSTER_ALLOC_BYTES,
>> BLKDBG_CLUSTER_FREE,
>>
>> + BLKDBG_ADD_COW_UPDATE,
>> + BLKDBG_ADD_COW_LOAD,
>> +
>> BLKDBG_EVENT_MAX,
>> } BlkDebugEvent;
>>
>> diff --git a/block/Makefile.objs b/block/Makefile.objs
>> index b5754d3..23bdfc8 100644
>> --- a/block/Makefile.objs
>> +++ b/block/Makefile.objs
>> @@ -1,7 +1,8 @@
>> block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat.o
>> -block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
>> +block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
>> block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>> block-obj-y += qed-check.o
>> +block-obj-y += block-cache.o
>> block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
>> block-obj-y += stream.o
>> block-obj-$(CONFIG_WIN32) += raw-win32.o
>> diff --git a/block/qcow2-cache.c b/block/qcow2-cache.c
>> deleted file mode 100644
>> index 2d4322a..0000000
>> --- a/block/qcow2-cache.c
>> +++ /dev/null
>> @@ -1,323 +0,0 @@
>> -/*
>> - * L2/refcount table cache for the QCOW2 format
>> - *
>> - * Copyright (c) 2010 Kevin Wolf <kwolf@redhat.com>
>> - *
>> - * Permission is hereby granted, free of charge, to any person obtaining a copy
>> - * of this software and associated documentation files (the "Software"), to deal
>> - * in the Software without restriction, including without limitation the rights
>> - * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
>> - * copies of the Software, and to permit persons to whom the Software is
>> - * furnished to do so, subject to the following conditions:
>> - *
>> - * The above copyright notice and this permission notice shall be included in
>> - * all copies or substantial portions of the Software.
>> - *
>> - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
>> - * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
>> - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
>> - * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
>> - * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
>> - * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
>> - * THE SOFTWARE.
>> - */
>> -
>> -#include "block_int.h"
>> -#include "qemu-common.h"
>> -#include "qcow2.h"
>> -#include "trace.h"
>> -
>> -typedef struct Qcow2CachedTable {
>> - void* table;
>> - int64_t offset;
>> - bool dirty;
>> - int cache_hits;
>> - int ref;
>> -} Qcow2CachedTable;
>> -
>> -struct Qcow2Cache {
>> - Qcow2CachedTable* entries;
>> - struct Qcow2Cache* depends;
>> - int size;
>> - bool depends_on_flush;
>> -};
>> -
>> -Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables)
>> -{
>> - BDRVQcowState *s = bs->opaque;
>> - Qcow2Cache *c;
>> - int i;
>> -
>> - c = g_malloc0(sizeof(*c));
>> - c->size = num_tables;
>> - c->entries = g_malloc0(sizeof(*c->entries) * num_tables);
>> -
>> - for (i = 0; i < c->size; i++) {
>> - c->entries[i].table = qemu_blockalign(bs, s->cluster_size);
>> - }
>> -
>> - return c;
>> -}
>> -
>> -int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c)
>> -{
>> - int i;
>> -
>> - for (i = 0; i < c->size; i++) {
>> - assert(c->entries[i].ref == 0);
>> - qemu_vfree(c->entries[i].table);
>> - }
>> -
>> - g_free(c->entries);
>> - g_free(c);
>> -
>> - return 0;
>> -}
>> -
>> -static int qcow2_cache_flush_dependency(BlockDriverState *bs, Qcow2Cache *c)
>> -{
>> - int ret;
>> -
>> - ret = qcow2_cache_flush(bs, c->depends);
>> - if (ret < 0) {
>> - return ret;
>> - }
>> -
>> - c->depends = NULL;
>> - c->depends_on_flush = false;
>> -
>> - return 0;
>> -}
>> -
>> -static int qcow2_cache_entry_flush(BlockDriverState *bs, Qcow2Cache *c, int i)
>> -{
>> - BDRVQcowState *s = bs->opaque;
>> - int ret = 0;
>> -
>> - if (!c->entries[i].dirty || !c->entries[i].offset) {
>> - return 0;
>> - }
>> -
>> - trace_qcow2_cache_entry_flush(qemu_coroutine_self(),
>> - c == s->l2_table_cache, i);
>> -
>> - if (c->depends) {
>> - ret = qcow2_cache_flush_dependency(bs, c);
>> - } else if (c->depends_on_flush) {
>> - ret = bdrv_flush(bs->file);
>> - if (ret >= 0) {
>> - c->depends_on_flush = false;
>> - }
>> - }
>> -
>> - if (ret < 0) {
>> - return ret;
>> - }
>> -
>> - if (c == s->refcount_block_cache) {
>> - BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_UPDATE_PART);
>> - } else if (c == s->l2_table_cache) {
>> - BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE);
>> - }
>> -
>> - ret = bdrv_pwrite(bs->file, c->entries[i].offset, c->entries[i].table,
>> - s->cluster_size);
>> - if (ret < 0) {
>> - return ret;
>> - }
>> -
>> - c->entries[i].dirty = false;
>> -
>> - return 0;
>> -}
>> -
>> -int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c)
>> -{
>> - BDRVQcowState *s = bs->opaque;
>> - int result = 0;
>> - int ret;
>> - int i;
>> -
>> - trace_qcow2_cache_flush(qemu_coroutine_self(), c == s->l2_table_cache);
>> -
>> - for (i = 0; i < c->size; i++) {
>> - ret = qcow2_cache_entry_flush(bs, c, i);
>> - if (ret < 0 && result != -ENOSPC) {
>> - result = ret;
>> - }
>> - }
>> -
>> - if (result == 0) {
>> - ret = bdrv_flush(bs->file);
>> - if (ret < 0) {
>> - result = ret;
>> - }
>> - }
>> -
>> - return result;
>> -}
>> -
>> -int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
>> - Qcow2Cache *dependency)
>> -{
>> - int ret;
>> -
>> - if (dependency->depends) {
>> - ret = qcow2_cache_flush_dependency(bs, dependency);
>> - if (ret < 0) {
>> - return ret;
>> - }
>> - }
>> -
>> - if (c->depends && (c->depends != dependency)) {
>> - ret = qcow2_cache_flush_dependency(bs, c);
>> - if (ret < 0) {
>> - return ret;
>> - }
>> - }
>> -
>> - c->depends = dependency;
>> - return 0;
>> -}
>> -
>> -void qcow2_cache_depends_on_flush(Qcow2Cache *c)
>> -{
>> - c->depends_on_flush = true;
>> -}
>> -
>> -static int qcow2_cache_find_entry_to_replace(Qcow2Cache *c)
>> -{
>> - int i;
>> - int min_count = INT_MAX;
>> - int min_index = -1;
>> -
>> -
>> - for (i = 0; i < c->size; i++) {
>> - if (c->entries[i].ref) {
>> - continue;
>> - }
>> -
>> - if (c->entries[i].cache_hits < min_count) {
>> - min_index = i;
>> - min_count = c->entries[i].cache_hits;
>> - }
>> -
>> - /* Give newer hits priority */
>> - /* TODO Check how to optimize the replacement strategy */
>> - c->entries[i].cache_hits /= 2;
>> - }
>> -
>> - if (min_index == -1) {
>> - /* This can't happen in current synchronous code, but leave the check
>> - * here as a reminder for whoever starts using AIO with the cache */
>> - abort();
>> - }
>> - return min_index;
>> -}
>> -
>> -static int qcow2_cache_do_get(BlockDriverState *bs, Qcow2Cache *c,
>> - uint64_t offset, void **table, bool read_from_disk)
>> -{
>> - BDRVQcowState *s = bs->opaque;
>> - int i;
>> - int ret;
>> -
>> - trace_qcow2_cache_get(qemu_coroutine_self(), c == s->l2_table_cache,
>> - offset, read_from_disk);
>> -
>> - /* Check if the table is already cached */
>> - for (i = 0; i < c->size; i++) {
>> - if (c->entries[i].offset == offset) {
>> - goto found;
>> - }
>> - }
>> -
>> - /* If not, write a table back and replace it */
>> - i = qcow2_cache_find_entry_to_replace(c);
>> - trace_qcow2_cache_get_replace_entry(qemu_coroutine_self(),
>> - c == s->l2_table_cache, i);
>> - if (i < 0) {
>> - return i;
>> - }
>> -
>> - ret = qcow2_cache_entry_flush(bs, c, i);
>> - if (ret < 0) {
>> - return ret;
>> - }
>> -
>> - trace_qcow2_cache_get_read(qemu_coroutine_self(),
>> - c == s->l2_table_cache, i);
>> - c->entries[i].offset = 0;
>> - if (read_from_disk) {
>> - if (c == s->l2_table_cache) {
>> - BLKDBG_EVENT(bs->file, BLKDBG_L2_LOAD);
>> - }
>> -
>> - ret = bdrv_pread(bs->file, offset, c->entries[i].table, s->cluster_size);
>> - if (ret < 0) {
>> - return ret;
>> - }
>> - }
>> -
>> - /* Give the table some hits for the start so that it won't be replaced
>> - * immediately. The number 32 is completely arbitrary. */
>> - c->entries[i].cache_hits = 32;
>> - c->entries[i].offset = offset;
>> -
>> - /* And return the right table */
>> -found:
>> - c->entries[i].cache_hits++;
>> - c->entries[i].ref++;
>> - *table = c->entries[i].table;
>> -
>> - trace_qcow2_cache_get_done(qemu_coroutine_self(),
>> - c == s->l2_table_cache, i);
>> -
>> - return 0;
>> -}
>> -
>> -int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
>> - void **table)
>> -{
>> - return qcow2_cache_do_get(bs, c, offset, table, true);
>> -}
>> -
>> -int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
>> - void **table)
>> -{
>> - return qcow2_cache_do_get(bs, c, offset, table, false);
>> -}
>> -
>> -int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table)
>> -{
>> - int i;
>> -
>> - for (i = 0; i < c->size; i++) {
>> - if (c->entries[i].table == *table) {
>> - goto found;
>> - }
>> - }
>> - return -ENOENT;
>> -
>> -found:
>> - c->entries[i].ref--;
>> - *table = NULL;
>> -
>> - assert(c->entries[i].ref >= 0);
>> - return 0;
>> -}
>> -
>> -void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table)
>> -{
>> - int i;
>> -
>> - for (i = 0; i < c->size; i++) {
>> - if (c->entries[i].table == table) {
>> - goto found;
>> - }
>> - }
>> - abort();
>> -
>> -found:
>> - c->entries[i].dirty = true;
>> -}
>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>> index e179211..335dc7a 100644
>> --- a/block/qcow2-cluster.c
>> +++ b/block/qcow2-cluster.c
>> @@ -28,6 +28,7 @@
>> #include "block_int.h"
>> #include "block/qcow2.h"
>> #include "trace.h"
>> +#include "block-cache.h"
>>
>> int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
>> {
>> @@ -69,7 +70,8 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
>> return new_l1_table_offset;
>> }
>>
>> - ret = qcow2_cache_flush(bs, s->refcount_block_cache);
>> + ret = block_cache_flush(bs, s->refcount_block_cache,
>> + BLOCK_TABLE_REF, s->cluster_size);
>> if (ret < 0) {
>> goto fail;
>> }
>> @@ -119,7 +121,8 @@ static int l2_load(BlockDriverState *bs, uint64_t l2_offset,
>> BDRVQcowState *s = bs->opaque;
>> int ret;
>>
>> - ret = qcow2_cache_get(bs, s->l2_table_cache, l2_offset, (void**) l2_table);
>> + ret = block_cache_get(bs, s->l2_table_cache, l2_offset,
>> + (void **) l2_table, BLOCK_TABLE_L2, s->cluster_size);
>>
>> return ret;
>> }
>> @@ -180,7 +183,8 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
>> return l2_offset;
>> }
>>
>> - ret = qcow2_cache_flush(bs, s->refcount_block_cache);
>> + ret = block_cache_flush(bs, s->refcount_block_cache,
>> + BLOCK_TABLE_REF, s->cluster_size);
>> if (ret < 0) {
>> goto fail;
>> }
>> @@ -188,7 +192,8 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
>> /* allocate a new entry in the l2 cache */
>>
>> trace_qcow2_l2_allocate_get_empty(bs, l1_index);
>> - ret = qcow2_cache_get_empty(bs, s->l2_table_cache, l2_offset, (void**) table);
>> + ret = block_cache_get_empty(bs, s->l2_table_cache, l2_offset,
>> + (void **) table, BLOCK_TABLE_L2, s->cluster_size);
>> if (ret < 0) {
>> return ret;
>> }
>> @@ -203,16 +208,17 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
>>
>> /* if there was an old l2 table, read it from the disk */
>> BLKDBG_EVENT(bs->file, BLKDBG_L2_ALLOC_COW_READ);
>> - ret = qcow2_cache_get(bs, s->l2_table_cache,
>> + ret = block_cache_get(bs, s->l2_table_cache,
>> old_l2_offset & L1E_OFFSET_MASK,
>> - (void**) &old_table);
>> + (void **) &old_table, BLOCK_TABLE_L2, s->cluster_size);
>> if (ret < 0) {
>> goto fail;
>> }
>>
>> memcpy(l2_table, old_table, s->cluster_size);
>>
>> - ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &old_table);
>> + ret = block_cache_put(bs, s->l2_table_cache,
>> + (void **) &old_table, BLOCK_TABLE_L2);
>> if (ret < 0) {
>> goto fail;
>> }
>> @@ -222,8 +228,9 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
>> BLKDBG_EVENT(bs->file, BLKDBG_L2_ALLOC_WRITE);
>>
>> trace_qcow2_l2_allocate_write_l2(bs, l1_index);
>> - qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>> - ret = qcow2_cache_flush(bs, s->l2_table_cache);
>> + block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>> + ret = block_cache_flush(bs, s->l2_table_cache,
>> + BLOCK_TABLE_L2, s->cluster_size);
>> if (ret < 0) {
>> goto fail;
>> }
>> @@ -242,7 +249,7 @@ static int l2_allocate(BlockDriverState *bs, int l1_index, uint64_t **table)
>>
>> fail:
>> trace_qcow2_l2_allocate_done(bs, l1_index, ret);
>> - qcow2_cache_put(bs, s->l2_table_cache, (void**) table);
>> + block_cache_put(bs, s->l2_table_cache, (void **) table, BLOCK_TABLE_L2);
>> s->l1_table[l1_index] = old_l2_offset;
>> return ret;
>> }
>> @@ -475,7 +482,7 @@ int qcow2_get_cluster_offset(BlockDriverState *bs, uint64_t offset,
>> abort();
>> }
>>
>> - qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>> + block_cache_put(bs, s->l2_table_cache, (void **) &l2_table, BLOCK_TABLE_L2);
>>
>> nb_available = (c * s->cluster_sectors);
>>
>> @@ -584,13 +591,15 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
>> * allocated. */
>> cluster_offset = be64_to_cpu(l2_table[l2_index]);
>> if (cluster_offset & L2E_OFFSET_MASK) {
>> - qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>> + block_cache_put(bs, s->l2_table_cache,
>> + (void **) &l2_table, BLOCK_TABLE_L2);
>> return 0;
>> }
>>
>> cluster_offset = qcow2_alloc_bytes(bs, compressed_size);
>> if (cluster_offset < 0) {
>> - qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>> + block_cache_put(bs, s->l2_table_cache,
>> + (void **) &l2_table, BLOCK_TABLE_L2);
>> return 0;
>> }
>>
>> @@ -605,9 +614,10 @@ uint64_t qcow2_alloc_compressed_cluster_offset(BlockDriverState *bs,
>> /* compressed clusters never have the copied flag */
>>
>> BLKDBG_EVENT(bs->file, BLKDBG_L2_UPDATE_COMPRESSED);
>> - qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>> + block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>> l2_table[l2_index] = cpu_to_be64(cluster_offset);
>> - ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>> + ret = block_cache_put(bs, s->l2_table_cache,
>> + (void **) &l2_table, BLOCK_TABLE_L2);
>> if (ret < 0) {
>> return 0;
>> }
>> @@ -659,18 +669,16 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
>> * handled.
>> */
>> if (cow) {
>> - qcow2_cache_depends_on_flush(s->l2_table_cache);
>> + block_cache_depends_on_flush(s->l2_table_cache);
>> }
>>
>> - if (qcow2_need_accurate_refcounts(s)) {
>> - qcow2_cache_set_dependency(bs, s->l2_table_cache,
>> - s->refcount_block_cache);
>> - }
>> + block_cache_set_dependency(bs, s->l2_table_cache, BLOCK_TABLE_L2,
>> + s->refcount_block_cache, s->cluster_size);
>> ret = get_cluster_table(bs, m->offset, &l2_table, &l2_index);
>> if (ret < 0) {
>> goto err;
>> }
>> - qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>> + block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>>
>> for (i = 0; i < m->nb_clusters; i++) {
>> /* if two concurrent writes happen to the same unallocated cluster
>> @@ -687,7 +695,8 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
>> }
>>
>>
>> - ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>> + ret = block_cache_put(bs, s->l2_table_cache,
>> + (void **) &l2_table, BLOCK_TABLE_L2);
>> if (ret < 0) {
>> goto err;
>> }
>> @@ -913,7 +922,8 @@ again:
>> * request to complete. If we still had the reference, we could use up the
>> * whole cache with sleeping requests.
>> */
>> - ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>> + ret = block_cache_put(bs, s->l2_table_cache,
>> + (void **) &l2_table, BLOCK_TABLE_L2);
>> if (ret < 0) {
>> return ret;
>> }
>> @@ -1077,14 +1087,15 @@ static int discard_single_l2(BlockDriverState *bs, uint64_t offset,
>> }
>>
>> /* First remove L2 entries */
>> - qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>> + block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>> l2_table[l2_index + i] = cpu_to_be64(0);
>>
>> /* Then decrease the refcount */
>> qcow2_free_any_clusters(bs, old_offset, 1);
>> }
>>
>> - ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>> + ret = block_cache_put(bs, s->l2_table_cache,
>> + (void **) &l2_table, BLOCK_TABLE_L2);
>> if (ret < 0) {
>> return ret;
>> }
>> @@ -1154,7 +1165,7 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
>> old_offset = be64_to_cpu(l2_table[l2_index + i]);
>>
>> /* Update L2 entries */
>> - qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>> + block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>> if (old_offset & QCOW_OFLAG_COMPRESSED) {
>> l2_table[l2_index + i] = cpu_to_be64(QCOW_OFLAG_ZERO);
>> qcow2_free_any_clusters(bs, old_offset, 1);
>> @@ -1163,7 +1174,8 @@ static int zero_single_l2(BlockDriverState *bs, uint64_t offset,
>> }
>> }
>>
>> - ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>> + ret = block_cache_put(bs, s->l2_table_cache,
>> + (void **) &l2_table, BLOCK_TABLE_L2);
>> if (ret < 0) {
>> return ret;
>> }
>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>> index 5e3f915..728bfc1 100644
>> --- a/block/qcow2-refcount.c
>> +++ b/block/qcow2-refcount.c
>> @@ -25,6 +25,7 @@
>> #include "qemu-common.h"
>> #include "block_int.h"
>> #include "block/qcow2.h"
>> +#include "block-cache.h"
>>
>> static int64_t alloc_clusters_noref(BlockDriverState *bs, int64_t size);
>> static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
>> @@ -71,8 +72,8 @@ static int load_refcount_block(BlockDriverState *bs,
>> int ret;
>>
>> BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_LOAD);
>> - ret = qcow2_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
>> - refcount_block);
>> + ret = block_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
>> + refcount_block, BLOCK_TABLE_REF, s->cluster_size);
>>
>> return ret;
>> }
>> @@ -98,8 +99,8 @@ static int get_refcount(BlockDriverState *bs, int64_t cluster_index)
>> if (!refcount_block_offset)
>> return 0;
>>
>> - ret = qcow2_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
>> - (void**) &refcount_block);
>> + ret = block_cache_get(bs, s->refcount_block_cache, refcount_block_offset,
>> + (void **) &refcount_block, BLOCK_TABLE_REF, s->cluster_size);
>> if (ret < 0) {
>> return ret;
>> }
>> @@ -108,8 +109,8 @@ static int get_refcount(BlockDriverState *bs, int64_t cluster_index)
>> ((1 << (s->cluster_bits - REFCOUNT_SHIFT)) - 1);
>> refcount = be16_to_cpu(refcount_block[block_index]);
>>
>> - ret = qcow2_cache_put(bs, s->refcount_block_cache,
>> - (void**) &refcount_block);
>> + ret = block_cache_put(bs, s->refcount_block_cache,
>> + (void **) &refcount_block, BLOCK_TABLE_REF);
>> if (ret < 0) {
>> return ret;
>> }
>> @@ -201,7 +202,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
>> *refcount_block = NULL;
>>
>> /* We write to the refcount table, so we might depend on L2 tables */
>> - qcow2_cache_flush(bs, s->l2_table_cache);
>> + block_cache_flush(bs, s->l2_table_cache,
>> + BLOCK_TABLE_L2, s->cluster_size);
>>
>> /* Allocate the refcount block itself and mark it as used */
>> int64_t new_block = alloc_clusters_noref(bs, s->cluster_size);
>> @@ -217,8 +219,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
>>
>> if (in_same_refcount_block(s, new_block, cluster_index << s->cluster_bits)) {
>> /* Zero the new refcount block before updating it */
>> - ret = qcow2_cache_get_empty(bs, s->refcount_block_cache, new_block,
>> - (void**) refcount_block);
>> + ret = block_cache_get_empty(bs, s->refcount_block_cache, new_block,
>> + (void **) refcount_block, BLOCK_TABLE_REF, s->cluster_size);
>> if (ret < 0) {
>> goto fail_block;
>> }
>> @@ -241,8 +243,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
>>
>> /* Initialize the new refcount block only after updating its refcount,
>> * update_refcount uses the refcount cache itself */
>> - ret = qcow2_cache_get_empty(bs, s->refcount_block_cache, new_block,
>> - (void**) refcount_block);
>> + ret = block_cache_get_empty(bs, s->refcount_block_cache, new_block,
>> + (void **) refcount_block, BLOCK_TABLE_REF, s->cluster_size);
>> if (ret < 0) {
>> goto fail_block;
>> }
>> @@ -252,8 +254,9 @@ static int alloc_refcount_block(BlockDriverState *bs,
>>
>> /* Now the new refcount block needs to be written to disk */
>> BLKDBG_EVENT(bs->file, BLKDBG_REFBLOCK_ALLOC_WRITE);
>> - qcow2_cache_entry_mark_dirty(s->refcount_block_cache, *refcount_block);
>> - ret = qcow2_cache_flush(bs, s->refcount_block_cache);
>> + block_cache_entry_mark_dirty(s->refcount_block_cache, *refcount_block);
>> + ret = block_cache_flush(bs, s->refcount_block_cache,
>> + BLOCK_TABLE_REF, s->cluster_size);
>> if (ret < 0) {
>> goto fail_block;
>> }
>> @@ -273,7 +276,8 @@ static int alloc_refcount_block(BlockDriverState *bs,
>> return 0;
>> }
>>
>> - ret = qcow2_cache_put(bs, s->refcount_block_cache, (void**) refcount_block);
>> + ret = block_cache_put(bs, s->refcount_block_cache,
>> + (void **) refcount_block, BLOCK_TABLE_REF);
>> if (ret < 0) {
>> goto fail_block;
>> }
>> @@ -406,7 +410,8 @@ fail_table:
>> g_free(new_table);
>> fail_block:
>> if (*refcount_block != NULL) {
>> - qcow2_cache_put(bs, s->refcount_block_cache, (void**) refcount_block);
>> + block_cache_put(bs, s->refcount_block_cache,
>> + (void **) refcount_block, BLOCK_TABLE_REF);
>> }
>> return ret;
>> }
>> @@ -432,8 +437,8 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
>> }
>>
>> if (addend < 0) {
>> - qcow2_cache_set_dependency(bs, s->refcount_block_cache,
>> - s->l2_table_cache);
>> + block_cache_set_dependency(bs, s->refcount_block_cache, BLOCK_TABLE_REF,
>> + s->l2_table_cache, s->cluster_size);
>> }
>>
>> start = offset & ~(s->cluster_size - 1);
>> @@ -449,8 +454,8 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
>> /* Load the refcount block and allocate it if needed */
>> if (table_index != old_table_index) {
>> if (refcount_block) {
>> - ret = qcow2_cache_put(bs, s->refcount_block_cache,
>> - (void**) &refcount_block);
>> + ret = block_cache_put(bs, s->refcount_block_cache,
>> + (void **) &refcount_block, BLOCK_TABLE_REF);
>> if (ret < 0) {
>> goto fail;
>> }
>> @@ -463,7 +468,7 @@ static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
>> }
>> old_table_index = table_index;
>>
>> - qcow2_cache_entry_mark_dirty(s->refcount_block_cache, refcount_block);
>> + block_cache_entry_mark_dirty(s->refcount_block_cache, refcount_block);
>>
>> /* we can update the count and save it */
>> block_index = cluster_index &
>> @@ -486,8 +491,8 @@ fail:
>> /* Write last changed block to disk */
>> if (refcount_block) {
>> int wret;
>> - wret = qcow2_cache_put(bs, s->refcount_block_cache,
>> - (void**) &refcount_block);
>> + wret = block_cache_put(bs, s->refcount_block_cache,
>> + (void **) &refcount_block, BLOCK_TABLE_REF);
>> if (wret < 0) {
>> return ret < 0 ? ret : wret;
>> }
>> @@ -763,8 +768,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
>> old_l2_offset = l2_offset;
>> l2_offset &= L1E_OFFSET_MASK;
>>
>> - ret = qcow2_cache_get(bs, s->l2_table_cache, l2_offset,
>> - (void**) &l2_table);
>> + ret = block_cache_get(bs, s->l2_table_cache, l2_offset,
>> + (void **) &l2_table, BLOCK_TABLE_L2, s->cluster_size);
>> if (ret < 0) {
>> goto fail;
>> }
>> @@ -811,16 +816,18 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
>> }
>> if (offset != old_offset) {
>> if (addend > 0) {
>> - qcow2_cache_set_dependency(bs, s->l2_table_cache,
>> - s->refcount_block_cache);
>> + block_cache_set_dependency(bs, s->l2_table_cache,
>> + BLOCK_TABLE_L2, s->refcount_block_cache,
>> + s->cluster_size);
>> }
>> l2_table[j] = cpu_to_be64(offset);
>> - qcow2_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>> + block_cache_entry_mark_dirty(s->l2_table_cache, l2_table);
>> }
>> }
>> }
>>
>> - ret = qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>> + ret = block_cache_put(bs, s->l2_table_cache,
>> + (void **) &l2_table, BLOCK_TABLE_L2);
>> if (ret < 0) {
>> goto fail;
>> }
>> @@ -847,7 +854,8 @@ int qcow2_update_snapshot_refcount(BlockDriverState *bs,
>> ret = 0;
>> fail:
>> if (l2_table) {
>> - qcow2_cache_put(bs, s->l2_table_cache, (void**) &l2_table);
>> + block_cache_put(bs, s->l2_table_cache,
>> + (void **) &l2_table, BLOCK_TABLE_L2);
>> }
>>
>> /* Update L1 only if it isn't deleted anyway (addend = -1) */
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index fd5e214..b89d312 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -30,6 +30,7 @@
>> #include "qemu-error.h"
>> #include "qerror.h"
>> #include "trace.h"
>> +#include "block-cache.h"
>>
>> /*
>> Differences with QCOW:
>> @@ -415,8 +416,9 @@ static int qcow2_open(BlockDriverState *bs, int flags)
>> }
>>
>> /* alloc L2 table/refcount block cache */
>> - s->l2_table_cache = qcow2_cache_create(bs, L2_CACHE_SIZE);
>> - s->refcount_block_cache = qcow2_cache_create(bs, REFCOUNT_CACHE_SIZE);
>> + s->l2_table_cache = block_cache_create(bs, L2_CACHE_SIZE, s->cluster_size);
>> + s->refcount_block_cache =
>> + block_cache_create(bs, REFCOUNT_CACHE_SIZE, s->cluster_size);
>>
>> s->cluster_cache = g_malloc(s->cluster_size);
>> /* one more sector for decompressed data alignment */
>> @@ -500,7 +502,7 @@ static int qcow2_open(BlockDriverState *bs, int flags)
>> qcow2_refcount_close(bs);
>> g_free(s->l1_table);
>> if (s->l2_table_cache) {
>> - qcow2_cache_destroy(bs, s->l2_table_cache);
>> + block_cache_destroy(bs, s->l2_table_cache, BLOCK_TABLE_L2);
>> }
>> g_free(s->cluster_cache);
>> qemu_vfree(s->cluster_data);
>> @@ -860,13 +862,13 @@ static void qcow2_close(BlockDriverState *bs)
>> BDRVQcowState *s = bs->opaque;
>> g_free(s->l1_table);
>>
>> - qcow2_cache_flush(bs, s->l2_table_cache);
>> - qcow2_cache_flush(bs, s->refcount_block_cache);
>> -
>> + block_cache_flush(bs, s->l2_table_cache,
>> + BLOCK_TABLE_L2, s->cluster_size);
>> + block_cache_flush(bs, s->refcount_block_cache,
>> + BLOCK_TABLE_REF, s->cluster_size);
>> qcow2_mark_clean(bs);
>> -
>> - qcow2_cache_destroy(bs, s->l2_table_cache);
>> - qcow2_cache_destroy(bs, s->refcount_block_cache);
>> + block_cache_destroy(bs, s->l2_table_cache, BLOCK_TABLE_L2);
>> + block_cache_destroy(bs, s->refcount_block_cache, BLOCK_TABLE_REF);
>>
>> g_free(s->unknown_header_fields);
>> cleanup_unknown_header_ext(bs);
>> @@ -1339,8 +1341,6 @@ static int qcow2_create(const char *filename, QEMUOptionParameter *options)
>> options->value.s);
>> return -EINVAL;
>> }
>> - } else if (!strcmp(options->name, BLOCK_OPT_LAZY_REFCOUNTS)) {
>> - flags |= options->value.n ? BLOCK_FLAG_LAZY_REFCOUNTS : 0;
>> }
>> options++;
>> }
>> @@ -1537,18 +1537,18 @@ static coroutine_fn int qcow2_co_flush_to_os(BlockDriverState *bs)
>> int ret;
>>
>> qemu_co_mutex_lock(&s->lock);
>> - ret = qcow2_cache_flush(bs, s->l2_table_cache);
>> + ret = block_cache_flush(bs, s->l2_table_cache,
>> + BLOCK_TABLE_L2, s->cluster_size);
>> if (ret < 0) {
>> qemu_co_mutex_unlock(&s->lock);
>> return ret;
>> }
>>
>> - if (qcow2_need_accurate_refcounts(s)) {
>> - ret = qcow2_cache_flush(bs, s->refcount_block_cache);
>> - if (ret < 0) {
>> - qemu_co_mutex_unlock(&s->lock);
>> - return ret;
>> - }
>> + ret = block_cache_flush(bs, s->refcount_block_cache,
>> + BLOCK_TABLE_REF, s->cluster_size);
>> + if (ret < 0) {
>> + qemu_co_mutex_unlock(&s->lock);
>> + return ret;
>> }
>> qemu_co_mutex_unlock(&s->lock);
>>
>> diff --git a/block/qcow2.h b/block/qcow2.h
>> index b4eb654..cb6fd7a 100644
>> --- a/block/qcow2.h
>> +++ b/block/qcow2.h
>> @@ -27,6 +27,7 @@
>>
>> #include "aes.h"
>> #include "qemu-coroutine.h"
>> +#include "block-cache.h"
>>
>> //#define DEBUG_ALLOC
>> //#define DEBUG_ALLOC2
>> @@ -94,8 +95,6 @@ typedef struct QCowSnapshot {
>> uint64_t vm_clock_nsec;
>> } QCowSnapshot;
>>
>> -struct Qcow2Cache;
>> -typedef struct Qcow2Cache Qcow2Cache;
>>
>> typedef struct Qcow2UnknownHeaderExtension {
>> uint32_t magic;
>> @@ -146,8 +145,8 @@ typedef struct BDRVQcowState {
>> uint64_t l1_table_offset;
>> uint64_t *l1_table;
>>
>> - Qcow2Cache* l2_table_cache;
>> - Qcow2Cache* refcount_block_cache;
>> + BlockCache *l2_table_cache;
>> + BlockCache *refcount_block_cache;
>>
>> uint8_t *cluster_cache;
>> uint8_t *cluster_data;
>> @@ -316,21 +315,4 @@ int qcow2_snapshot_load_tmp(BlockDriverState *bs, const char *snapshot_name);
>>
>> void qcow2_free_snapshots(BlockDriverState *bs);
>> int qcow2_read_snapshots(BlockDriverState *bs);
>> -
>> -/* qcow2-cache.c functions */
>> -Qcow2Cache *qcow2_cache_create(BlockDriverState *bs, int num_tables);
>> -int qcow2_cache_destroy(BlockDriverState* bs, Qcow2Cache *c);
>> -
>> -void qcow2_cache_entry_mark_dirty(Qcow2Cache *c, void *table);
>> -int qcow2_cache_flush(BlockDriverState *bs, Qcow2Cache *c);
>> -int qcow2_cache_set_dependency(BlockDriverState *bs, Qcow2Cache *c,
>> - Qcow2Cache *dependency);
>> -void qcow2_cache_depends_on_flush(Qcow2Cache *c);
>> -
>> -int qcow2_cache_get(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
>> - void **table);
>> -int qcow2_cache_get_empty(BlockDriverState *bs, Qcow2Cache *c, uint64_t offset,
>> - void **table);
>> -int qcow2_cache_put(BlockDriverState *bs, Qcow2Cache *c, void **table);
>> -
>> #endif
>> diff --git a/trace-events b/trace-events
>> index 6b12f83..52b6438 100644
>> --- a/trace-events
>> +++ b/trace-events
>> @@ -439,12 +439,13 @@ qcow2_l2_allocate_write_l2(void *bs, int l1_index) "bs %p l1_index %d"
>> qcow2_l2_allocate_write_l1(void *bs, int l1_index) "bs %p l1_index %d"
>> qcow2_l2_allocate_done(void *bs, int l1_index, int ret) "bs %p l1_index %d ret %d"
>>
>> -qcow2_cache_get(void *co, int c, uint64_t offset, bool read_from_disk) "co %p is_l2_cache %d offset %" PRIx64 " read_from_disk %d"
>> -qcow2_cache_get_replace_entry(void *co, int c, int i) "co %p is_l2_cache %d index %d"
>> -qcow2_cache_get_read(void *co, int c, int i) "co %p is_l2_cache %d index %d"
>> -qcow2_cache_get_done(void *co, int c, int i) "co %p is_l2_cache %d index %d"
>> -qcow2_cache_flush(void *co, int c) "co %p is_l2_cache %d"
>> -qcow2_cache_entry_flush(void *co, int c, int i) "co %p is_l2_cache %d index %d"
>> +# block/block-cache.c
>> +block_cache_get(void *co, int c, uint64_t offset, bool read_from_disk) "co %p is_l2_cache %d offset %" PRIx64 " read_from_disk %d"
>> +block_cache_get_replace_entry(void *co, int c, int i) "co %p is_l2_cache %d index %d"
>> +block_cache_get_read(void *co, int c, int i) "co %p is_l2_cache %d index %d"
>> +block_cache_get_done(void *co, int c, int i) "co %p is_l2_cache %d index %d"
>> +block_cache_flush(void *co, int c) "co %p is_l2_cache %d"
>> +block_cache_entry_flush(void *co, int c, int i) "co %p is_l2_cache %d index %d"
>>
>> # block/qed-l2-cache.c
>> qed_alloc_l2_cache_entry(void *l2_cache, void *entry) "l2_cache %p entry %p"
>> --
>> 1.7.1
>>
>>
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH V12 5/6] add-cow file format
2012-09-06 20:19 ` Michael Roth
@ 2012-09-10 2:25 ` Dong Xu Wang
2012-09-11 9:44 ` Kevin Wolf
0 siblings, 1 reply; 25+ messages in thread
From: Dong Xu Wang @ 2012-09-10 2:25 UTC (permalink / raw)
To: Michael Roth; +Cc: kwolf, qemu-devel
On Fri, Sep 7, 2012 at 4:19 AM, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
> On Fri, Aug 10, 2012 at 11:39:44PM +0800, Dong Xu Wang wrote:
>> add-cow file format core code. It use block-cache.c as cache code.
>>
>> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> ---
>> block/Makefile.objs | 1 +
>> block/add-cow.c | 613 +++++++++++++++++++++++++++++++++++++++++++++++++++
>> block/add-cow.h | 85 +++++++
>> block_int.h | 2 +
>> 4 files changed, 701 insertions(+), 0 deletions(-)
>> create mode 100644 block/add-cow.c
>> create mode 100644 block/add-cow.h
>>
>> diff --git a/block/Makefile.objs b/block/Makefile.objs
>> index 23bdfc8..7ed5051 100644
>> --- a/block/Makefile.objs
>> +++ b/block/Makefile.objs
>> @@ -2,6 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
>> block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
>> block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>> block-obj-y += qed-check.o
>> +block-obj-y += add-cow.o
>> block-obj-y += block-cache.o
>> block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
>> block-obj-y += stream.o
>> diff --git a/block/add-cow.c b/block/add-cow.c
>> new file mode 100644
>> index 0000000..d4711d5
>> --- /dev/null
>> +++ b/block/add-cow.c
>> @@ -0,0 +1,613 @@
>> +/*
>> + * QEMU ADD-COW Disk Format
>> + *
>> + * Copyright IBM, Corp. 2012
>> + *
>> + * Authors:
>> + * Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> + *
>> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
>> + * See the COPYING.LIB file in the top-level directory.
>> + *
>> + */
>> +
>> +#include "qemu-common.h"
>> +#include "block_int.h"
>> +#include "module.h"
>> +#include "add-cow.h"
>> +
>> +static void add_cow_header_le_to_cpu(const AddCowHeader *le, AddCowHeader *cpu)
>> +{
>> + cpu->magic = le64_to_cpu(le->magic);
>> + cpu->version = le32_to_cpu(le->version);
>> +
>> + cpu->backing_filename_offset = le32_to_cpu(le->backing_filename_offset);
>> + cpu->backing_filename_size = le32_to_cpu(le->backing_filename_size);
>> +
>> + cpu->image_filename_offset = le32_to_cpu(le->image_filename_offset);
>> + cpu->image_filename_size = le32_to_cpu(le->image_filename_size);
>> +
>> + cpu->features = le64_to_cpu(le->features);
>> + cpu->optional_features = le64_to_cpu(le->optional_features);
>> + cpu->header_pages_size = le32_to_cpu(le->header_pages_size);
>> +}
>> +
>> +static void add_cow_header_cpu_to_le(const AddCowHeader *cpu, AddCowHeader *le)
>> +{
>> + le->magic = cpu_to_le64(cpu->magic);
>> + le->version = cpu_to_le32(cpu->version);
>> +
>> + le->backing_filename_offset = cpu_to_le32(cpu->backing_filename_offset);
>> + le->backing_filename_size = cpu_to_le32(cpu->backing_filename_size);
>> +
>> + le->image_filename_offset = cpu_to_le32(cpu->image_filename_offset);
>> + le->image_filename_size = cpu_to_le32(cpu->image_filename_size);
>> +
>> + le->features = cpu_to_le64(cpu->features);
>> + le->optional_features = cpu_to_le64(cpu->optional_features);
>> + le->header_pages_size = cpu_to_le32(cpu->header_pages_size);
>> +}
>> +
>> +static int add_cow_probe(const uint8_t *buf, int buf_size, const char *filename)
>> +{
>> + const AddCowHeader *header = (const AddCowHeader *)buf;
>> +
>> + if (le64_to_cpu(header->magic) == ADD_COW_MAGIC &&
>> + le32_to_cpu(header->version) == ADD_COW_VERSION) {
>> + return 100;
>> + } else {
>> + return 0;
>> + }
>> +}
>> +
>> +static int add_cow_create(const char *filename, QEMUOptionParameter *options)
>> +{
>> + AddCowHeader header = {
>> + .magic = ADD_COW_MAGIC,
>> + .version = ADD_COW_VERSION,
>> + .features = 0,
>> + .optional_features = 0,
>> + .header_pages_size = ADD_COW_DEFAULT_PAGE_SIZE,
>> + };
>> + AddCowHeader le_header;
>> + int64_t image_len = 0;
>> + const char *backing_filename = NULL;
>> + const char *backing_fmt = NULL;
>> + const char *image_filename = NULL;
>> + const char *image_format = NULL;
>> + BlockDriverState *bs, *image_bs = NULL, *backing_bs = NULL;
>> + BlockDriver *drv = bdrv_find_format("add-cow");
>> + BDRVAddCowState s;
>> + int ret;
>> +
>> + while (options && options->name) {
>> + if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
>> + image_len = options->value.n;
>> + } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FILE)) {
>> + backing_filename = options->value.s;
>> + } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FMT)) {
>> + backing_fmt = options->value.s;
>> + } else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FILE)) {
>> + image_filename = options->value.s;
>> + } else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FORMAT)) {
>> + image_format = options->value.s;
>> + }
>> + options++;
>> + }
>> +
>> + if (backing_filename) {
>> + header.backing_filename_offset = sizeof(header)
>> + + sizeof(s.backing_file_format) + sizeof(s.image_file_format);
>> + header.backing_filename_size = strlen(backing_filename);
>> +
>> + if (!backing_fmt) {
>> + backing_bs = bdrv_new("image");
>> + ret = bdrv_open(backing_bs, backing_filename, BDRV_O_RDWR
>> + | BDRV_O_CACHE_WB, NULL);
>> + if (ret < 0) {
>> + return ret;
>> + }
>> + backing_fmt = bdrv_get_format_name(backing_bs);
>> + bdrv_delete(backing_bs);
>> + }
>> + } else {
>> + header.features |= ADD_COW_F_All_ALLOCATED;
>> + }
>> +
>> + if (image_filename) {
>> + header.image_filename_offset =
>> + sizeof(header) + sizeof(s.backing_file_format)
>> + + sizeof(s.image_file_format) + header.backing_filename_size;
>> + header.image_filename_size = strlen(image_filename);
>> + } else {
>> + error_report("Error: image_file should be given.");
>> + return -EINVAL;
>> + }
>> +
>> + if (backing_filename && !strcmp(backing_filename, image_filename)) {
>> + error_report("Error: Trying to create an image with the "
>> + "same backing file name as the image file name");
>> + return -EINVAL;
>> + }
>> +
>> + if (!strcmp(filename, image_filename)) {
>> + error_report("Error: Trying to create an image with the "
>> + "same filename as the image file name");
>> + return -EINVAL;
>> + }
>> +
>> + if (header.image_filename_offset + header.image_filename_size
>> + > ADD_COW_PAGE_SIZE * ADD_COW_DEFAULT_PAGE_SIZE) {
>> + error_report("image_file name or backing_file name too long.");
>> + return -ENOSPC;
>> + }
>> +
>> + ret = bdrv_file_open(&image_bs, image_filename, BDRV_O_RDWR);
>> + if (ret < 0) {
>> + return ret;
>> + }
>> + bdrv_delete(image_bs);
>> +
>> + ret = bdrv_create_file(filename, NULL);
>> + if (ret < 0) {
>> + return ret;
>> + }
>> +
>> + ret = bdrv_file_open(&bs, filename, BDRV_O_RDWR);
>> + if (ret < 0) {
>> + return ret;
>> + }
>> + add_cow_header_cpu_to_le(&header, &le_header);
>> + ret = bdrv_pwrite(bs, 0, &le_header, sizeof(le_header));
>> + if (ret < 0) {
>> + bdrv_delete(bs);
>> + return ret;
>> + }
>> +
>> + ret = bdrv_pwrite(bs, sizeof(le_header), backing_fmt ? backing_fmt : "",
>> + backing_fmt ? strlen(backing_fmt) : 0);
>> + if (ret < 0) {
>> + bdrv_delete(bs);
>> + return ret;
>> + }
>> +
>> + ret = bdrv_pwrite(bs, sizeof(le_header) + sizeof(s.backing_file_format),
>> + image_format ? image_format : "raw",
>> + image_format ? strlen(image_format) : sizeof("raw"));
>> + if (ret < 0) {
>> + bdrv_delete(bs);
>> + return ret;
>> + }
>> +
>> + if (backing_filename) {
>> + ret = bdrv_pwrite(bs, header.backing_filename_offset,
>> + backing_filename, header.backing_filename_size);
>> + if (ret < 0) {
>> + bdrv_delete(bs);
>> + return ret;
>> + }
>> + }
>> +
>> + ret = bdrv_pwrite(bs, header.image_filename_offset,
>> + image_filename, header.image_filename_size);
>> + if (ret < 0) {
>> + bdrv_delete(bs);
>> + return ret;
>> + }
>> +
>> + ret = bdrv_open(bs, filename, BDRV_O_RDWR | BDRV_O_NO_FLUSH, drv);
>> + if (ret < 0) {
>> + bdrv_delete(bs);
>> + return ret;
>> + }
>> +
>> + ret = bdrv_truncate(bs, image_len);
>> + bdrv_delete(bs);
>> + return ret;
>> +}
>> +
>> +static int add_cow_open(BlockDriverState *bs, int flags)
>> +{
>> + char image_filename[ADD_COW_FILE_LEN];
>> + char tmp_name[ADD_COW_FILE_LEN];
>> + BlockDriver *image_drv = NULL;
>> + int ret;
>> + int sector_per_byte;
>> + BDRVAddCowState *s = bs->opaque;
>> + AddCowHeader le_header;
>> +
>> + ret = bdrv_pread(bs->file, 0, &le_header, sizeof(le_header));
>> + if (ret != sizeof(s->header)) {
>> + goto fail;
>> + }
>> +
>> + add_cow_header_le_to_cpu(&le_header, &s->header);
>> +
>> + if (le64_to_cpu(s->header.magic) != ADD_COW_MAGIC) {
>> + ret = -EINVAL;
>> + goto fail;
>> + }
>> +
>> + if (s->header.version != ADD_COW_VERSION) {
>> + char version[64];
>> + snprintf(version, sizeof(version), "ADD-COW version %d",
>> + s->header.version);
>> + qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
>> + bs->device_name, "add-cow", version);
>> + ret = -ENOTSUP;
>> + goto fail;
>> + }
>> +
>> + if (s->header.features & ~ADD_COW_FEATURE_MASK) {
>> + char buf[64];
>> + snprintf(buf, sizeof(buf), "%" PRIx64,
>> + s->header.features & ~ADD_COW_FEATURE_MASK);
>> + qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
>> + bs->device_name, "add-cow", buf);
>> + return -ENOTSUP;
>> + }
>> +
>> + if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
>> + ret = bdrv_read_string(bs->file, sizeof(s->header),
>> + sizeof(s->backing_file_format) - 1, s->backing_file_format,
>> + sizeof(s->backing_file_format));
>> + if (ret < 0) {
>> + goto fail;
>> + }
>> + }
>> +
>> + ret = bdrv_read_string(bs->file,
>> + sizeof(s->header) + sizeof(s->image_file_format),
>> + sizeof(s->image_file_format) - 1, s->image_file_format,
>> + sizeof(s->image_file_format));
>> + if (ret < 0) {
>> + goto fail;
>> + }
>> +
>> + if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
>> + ret = bdrv_read_string(bs->file, s->header.backing_filename_offset,
>> + s->header.backing_filename_size, bs->backing_file,
>> + sizeof(bs->backing_file));
>> + if (ret < 0) {
>> + goto fail;
>> + }
>> + }
>> +
>> + ret = bdrv_read_string(bs->file, s->header.image_filename_offset,
>> + s->header.image_filename_size, tmp_name,
>> + sizeof(tmp_name));
>> + if (ret < 0) {
>> + goto fail;
>> + }
>> +
>> + s->image_hd = bdrv_new("");
>> + if (path_has_protocol(image_filename)) {
>> + pstrcpy(image_filename, sizeof(image_filename), tmp_name);
>> + } else {
>> + path_combine(image_filename, sizeof(image_filename),
>> + bs->filename, tmp_name);
>> + }
>> +
>> + ret = bdrv_open(s->image_hd, image_filename, flags, image_drv);
>> + if (ret < 0) {
>> + bdrv_delete(s->image_hd);
>> + goto fail;
>> + }
>> +
>> + bs->total_sectors = bdrv_getlength(s->image_hd) >> 9;
>> + s->cluster_size = ADD_COW_CLUSTER_SIZE;
>> + sector_per_byte = SECTORS_PER_CLUSTER * 8;
>> + s->bitmap_size =
>> + (bs->total_sectors + sector_per_byte - 1) / sector_per_byte;
>> + s->bitmap_cache =
>> + block_cache_create(bs, ADD_COW_CACHE_SIZE, ADD_COW_CACHE_ENTRY_SIZE);
>> +
>> + qemu_co_mutex_init(&s->lock);
>> + return 0;
>> +fail:
>> + if (s->bitmap_cache) {
>> + block_cache_destroy(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP);
>> + }
>> + return ret;
>> +}
>> +
>> +static void add_cow_close(BlockDriverState *bs)
>> +{
>> + BDRVAddCowState *s = bs->opaque;
>> + block_cache_destroy(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP);
>> + bdrv_delete(s->image_hd);
>> +}
>> +
>> +static bool is_allocated(BlockDriverState *bs, int64_t sector_num)
>> +{
>> + BDRVAddCowState *s = bs->opaque;
>> + BlockCache *c = s->bitmap_cache;
>> + int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
>> + uint8_t *table = NULL;
>> + uint64_t offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
>> + + (offset_in_bitmap(sector_num) & (~(c->entry_size - 1)));
>> + int ret = block_cache_get(bs, s->bitmap_cache, offset,
>> + (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
>> +
>> + if (ret < 0) {
>> + return ret;
>> + }
>> + return table[cluster_num / 8 % ADD_COW_CACHE_ENTRY_SIZE]
>> + & (1 << (cluster_num % 8));
>> +}
>> +
>> +static coroutine_fn int add_cow_is_allocated(BlockDriverState *bs,
>> + int64_t sector_num, int nb_sectors, int *num_same)
>> +{
>> + BDRVAddCowState *s = bs->opaque;
>> + int changed;
>> +
>> + if (nb_sectors == 0) {
>> + *num_same = 0;
>> + return 0;
>> + }
>> +
>> + if (s->header.features & ADD_COW_F_All_ALLOCATED) {
>> + *num_same = nb_sectors - 1;
>> + return 1;
>> + }
>> + changed = is_allocated(bs, sector_num);
>> +
>> + for (*num_same = 1; *num_same < nb_sectors; (*num_same)++) {
>> + if (is_allocated(bs, sector_num + *num_same) != changed) {
>> + break;
>> + }
>> + }
>> + return changed;
>> +}
>> +
>> +static int add_cow_backing_read(BlockDriverState *bs, QEMUIOVector *qiov,
>> + int64_t sector_num, int nb_sectors)
>> +{
>> + int n1;
>> + if ((sector_num + nb_sectors) <= bs->total_sectors) {
>> + return nb_sectors;
>> + }
>> + if (sector_num >= bs->total_sectors) {
>> + n1 = 0;
>> + } else {
>> + n1 = bs->total_sectors - sector_num;
>> + }
>> +
>> + qemu_iovec_memset(qiov, BDRV_SECTOR_SIZE * n1,
>> + 0, BDRV_SECTOR_SIZE * (nb_sectors - n1));
>> +
>> + return n1;
>> +}
>> +
>> +static coroutine_fn int add_cow_co_readv(BlockDriverState *bs,
>> + int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
>> +{
>> + BDRVAddCowState *s = bs->opaque;
>> + int cur_nr_sectors;
>> + uint64_t bytes_done = 0;
>> + QEMUIOVector hd_qiov;
>> + int n, n1, ret = 0;
>> +
>> + qemu_iovec_init(&hd_qiov, qiov->niov);
>> + qemu_co_mutex_lock(&s->lock);
>> + while (remaining_sectors != 0) {
>> + cur_nr_sectors = remaining_sectors;
>> + if (add_cow_is_allocated(bs, sector_num, cur_nr_sectors, &n)) {
>> + cur_nr_sectors = n;
>> + qemu_iovec_reset(&hd_qiov);
>> + qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
>> + cur_nr_sectors * BDRV_SECTOR_SIZE);
>> + qemu_co_mutex_unlock(&s->lock);
>> + ret = bdrv_co_readv(s->image_hd, sector_num, n, &hd_qiov);
>> + qemu_co_mutex_lock(&s->lock);
>> + if (ret < 0) {
>> + goto fail;
>> + }
>> + } else {
>> + cur_nr_sectors = n;
>> + if (bs->backing_hd) {
>> + qemu_iovec_reset(&hd_qiov);
>> + qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
>> + cur_nr_sectors * BDRV_SECTOR_SIZE);
>> + n1 = add_cow_backing_read(bs->backing_hd, &hd_qiov,
>> + sector_num, cur_nr_sectors);
>> + if (n1 > 0) {
>> + qemu_co_mutex_unlock(&s->lock);
>> + ret = bdrv_co_readv(bs->backing_hd, sector_num,
>> + n, &hd_qiov);
>> + qemu_co_mutex_lock(&s->lock);
>> + if (ret < 0) {
>> + goto fail;
>> + }
>> + }
>> + } else {
>> + qemu_iovec_memset(&hd_qiov, 0, 0,
>> + BDRV_SECTOR_SIZE * cur_nr_sectors);
>> + }
>> + }
>> + remaining_sectors -= cur_nr_sectors;
>> + sector_num += cur_nr_sectors;
>> + bytes_done += cur_nr_sectors * BDRV_SECTOR_SIZE;
>> + }
>> +fail:
>> + qemu_co_mutex_unlock(&s->lock);
>> + qemu_iovec_destroy(&hd_qiov);
>> + return ret;
>> +}
>> +
>> +static int coroutine_fn copy_sectors(BlockDriverState *bs,
>> + int n_start, int n_end)
>> +{
>> + BDRVAddCowState *s = bs->opaque;
>> + QEMUIOVector qiov;
>> + struct iovec iov;
>> + int n, ret;
>> +
>> + n = n_end - n_start;
>> + if (n <= 0) {
>> + return 0;
>> + }
>> +
>> + iov.iov_len = n * BDRV_SECTOR_SIZE;
>> + iov.iov_base = qemu_blockalign(bs, iov.iov_len);
>> +
>> + qemu_iovec_init_external(&qiov, &iov, 1);
>> +
>> + ret = bdrv_co_readv(bs->backing_hd, n_start, n, &qiov);
>> + if (ret < 0) {
>> + goto out;
>> + }
>> + ret = bdrv_co_writev(s->image_hd, n_start, n, &qiov);
>> + if (ret < 0) {
>> + goto out;
>> + }
>> +
>> + ret = 0;
>> +out:
>> + qemu_vfree(iov.iov_base);
>> + return ret;
>> +}
>> +
>> +static coroutine_fn int add_cow_co_writev(BlockDriverState *bs,
>> + int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
>> +{
>> + BDRVAddCowState *s = bs->opaque;
>> + BlockCache *c = s->bitmap_cache;
>> + int ret = 0, i;
>> + QEMUIOVector hd_qiov;
>> + uint8_t *table;
>> + uint64_t offset;
>> +
>> + qemu_co_mutex_lock(&s->lock);
>> + qemu_iovec_init(&hd_qiov, qiov->niov);
>> + ret = bdrv_co_writev(s->image_hd,
>> + sector_num,
>> + remaining_sectors, qiov);
>
> alignment ^
>
> or even at ^ if you prefer and have done in some places, just need to be
> consistent about it for better readability.
>
>> +
>> + if (ret < 0) {
>> + goto fail;
>> + }
>> + if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
>> + /* Copy content of unmodified sectors */
>> + if (!is_cluster_head(sector_num) && !is_allocated(bs, sector_num)) {
>
> Why do we avoid a COW when writing to the first sector of a cluster?
Because if it is the first sector, we need not use copy_sector, we
write it directly would be enough, it starts at the begening of one
cluster.
>
>> + ret = copy_sectors(bs, sector_num & ~(SECTORS_PER_CLUSTER - 1),
>> + sector_num);
>> + if (ret < 0) {
>> + goto fail;
>> + }
>> + }
>> +
>> + if (!is_cluster_tail(sector_num + remaining_sectors - 1)
>> + && !is_allocated(bs, sector_num + remaining_sectors - 1)) {
>> + ret = copy_sectors(bs, sector_num + remaining_sectors,
>> + ((sector_num + remaining_sectors) | (SECTORS_PER_CLUSTER - 1)) + 1);
>> + if (ret < 0) {
>> + goto fail;
>> + }
>> + }
>> +
>> + for (i = sector_num / SECTORS_PER_CLUSTER;
>> + i <= (sector_num + remaining_sectors - 1) / SECTORS_PER_CLUSTER;
>> + i++) {
>> + offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
>> + + (offset_in_bitmap(i * SECTORS_PER_CLUSTER) & (~(c->entry_size - 1)));
>> + ret = block_cache_get(bs, s->bitmap_cache, offset,
>> + (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
>> + if (ret < 0) {
>> + goto fail;
>> + }
>> + if ((table[i / 8] & (1 << (i % 8))) == 0) {
>> + table[i / 8] |= (1 << (i % 8));
>> + block_cache_entry_mark_dirty(s->bitmap_cache, table);
>> + }
>> + }
>> + }
>> + ret = 0;
>> +fail:
>> + qemu_co_mutex_unlock(&s->lock);
>> + qemu_iovec_destroy(&hd_qiov);
>> + return ret;
>> +}
>> +
>> +static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size)
>> +{
>> + BDRVAddCowState *s = bs->opaque;
>> + int sector_per_byte = SECTORS_PER_CLUSTER * 8;
>> + int ret;
>> + uint32_t bitmap_pos = s->header.header_pages_size * ADD_COW_PAGE_SIZE;
>> + int64_t bitmap_size =
>> + (size / BDRV_SECTOR_SIZE + sector_per_byte - 1) / sector_per_byte;
>> + bitmap_size = (bitmap_size + ADD_COW_CACHE_ENTRY_SIZE - 1)
>> + & (~(ADD_COW_CACHE_ENTRY_SIZE - 1));
>> +
>> + ret = bdrv_truncate(bs->file, bitmap_pos + bitmap_size);
>> + if (ret < 0) {
>> + return ret;
>> + }
>> + return 0;
>> +}
>> +
>> +static coroutine_fn int add_cow_co_flush(BlockDriverState *bs)
>> +{
>> + BDRVAddCowState *s = bs->opaque;
>> + int ret;
>> +
>> + qemu_co_mutex_lock(&s->lock);
>> + ret = block_cache_flush(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP,
>> + ADD_COW_CACHE_ENTRY_SIZE);
>> + qemu_co_mutex_unlock(&s->lock);
>> + return ret;
>> +}
>> +
>> +static QEMUOptionParameter add_cow_create_options[] = {
>> + {
>> + .name = BLOCK_OPT_SIZE,
>> + .type = OPT_SIZE,
>> + .help = "Virtual disk size"
>> + },
>> + {
>> + .name = BLOCK_OPT_BACKING_FILE,
>> + .type = OPT_STRING,
>> + .help = "File name of a base image"
>> + },
>> + {
>> + .name = BLOCK_OPT_BACKING_FMT,
>> + .type = OPT_STRING,
>> + .help = "Image format of the base image"
>> + },
>> + {
>> + .name = BLOCK_OPT_IMAGE_FILE,
>> + .type = OPT_STRING,
>> + .help = "File name of a image file"
>> + },
>> + {
>> + .name = BLOCK_OPT_IMAGE_FORMAT,
>> + .type = OPT_STRING,
>> + .help = "Image format of the image file"
>> + },
>> + { NULL }
>> +};
>> +
>> +static BlockDriver bdrv_add_cow = {
>> + .format_name = "add-cow",
>> + .instance_size = sizeof(BDRVAddCowState),
>> + .bdrv_probe = add_cow_probe,
>> + .bdrv_open = add_cow_open,
>> + .bdrv_close = add_cow_close,
>> + .bdrv_create = add_cow_create,
>> + .bdrv_co_readv = add_cow_co_readv,
>> + .bdrv_co_writev = add_cow_co_writev,
>> + .bdrv_truncate = bdrv_add_cow_truncate,
>> + .bdrv_co_is_allocated = add_cow_is_allocated,
>> +
>> + .create_options = add_cow_create_options,
>> + .bdrv_co_flush_to_os = add_cow_co_flush,
>> +};
>> +
>> +static void bdrv_add_cow_init(void)
>> +{
>> + bdrv_register(&bdrv_add_cow);
>> +}
>> +
>> +block_init(bdrv_add_cow_init);
>> diff --git a/block/add-cow.h b/block/add-cow.h
>> new file mode 100644
>> index 0000000..f058376
>> --- /dev/null
>> +++ b/block/add-cow.h
>> @@ -0,0 +1,85 @@
>> +/*
>> + * QEMU ADD-COW Disk Format
>> + *
>> + * Copyright IBM, Corp. 2012
>> + *
>> + * Authors:
>> + * Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> + *
>> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
>> + * See the COPYING.LIB file in the top-level directory.
>> + *
>> + */
>> +
>> +#ifndef BLOCK_ADD_COW_H
>> +#define BLOCK_ADD_COW_H
>> +#include "block-cache.h"
>> +
>> +enum {
>> + ADD_COW_F_All_ALLOCATED = 0X01,
>
> Please use "ADD_COW_F_ALL_ALLOCATED" (all caps)
Okay.
>
> was searching your patch for how this was used and was scratching my
> head when I wasn't seeing any matches :)
It wil be used such as:
qemu-img create -f add-cow -o image_file=t.raw t.add-cow
while we need not read from backing_file any more.
>
>> + ADD_COW_FEATURE_MASK = ADD_COW_F_All_ALLOCATED,
>> +
>> + ADD_COW_MAGIC = (((uint64_t)'A' << 56) | ((uint64_t)'D' << 48) | \
>> + ((uint64_t)'D' << 40) | ((uint64_t)'_' << 32) | \
>> + ((uint64_t)'C' << 24) | ((uint64_t)'O' << 16) | \
>> + ((uint64_t)'W' << 8) | 0xFF),
>> + ADD_COW_VERSION = 1,
>> + ADD_COW_FILE_LEN = 1024,
>> + ADD_COW_CACHE_SIZE = 16,
>> + ADD_COW_CACHE_ENTRY_SIZE = 65536,
>> + ADD_COW_CLUSTER_SIZE = 65536,
>> + SECTORS_PER_CLUSTER = (ADD_COW_CLUSTER_SIZE / BDRV_SECTOR_SIZE),
>> + ADD_COW_PAGE_SIZE = 4096,
>> + ADD_COW_DEFAULT_PAGE_SIZE = 1,
>> +};
>> +
>> +typedef struct AddCowHeader {
>> + uint64_t magic;
>> + uint32_t version;
>> +
>> + uint32_t backing_filename_offset;
>> + uint32_t backing_filename_size;
>> +
>> + uint32_t image_filename_offset;
>> + uint32_t image_filename_size;
>> +
>> + uint64_t features;
>> + uint64_t optional_features;
>> + uint32_t header_pages_size;
>> +} QEMU_PACKED AddCowHeader;
>
> You should avoid using packed structures for image format headers.
> Instead, I would either:
>
> a) re-order the fields so that 32/64-bit fields, respectively, fall on
> 32/64-bit boundaries (in your case, for instance, moving header_pages_size
> above features) like qed/qcow2 do, or
>
> b) read/write the fields individually rather than reading/writing directly
> into/from the header struct.
>
> The safest route is b). Adds a few lines of code, but you won't have to
> re-work things (or worry about introducing bugs) later if you were to add,
> say, a 32-bit value, and then a 64-bit value later.
While, Kevin's suggestion is using PACKED, so ..
>
>> +
>> +typedef struct BDRVAddCowState {
>> + BlockDriverState *image_hd;
>> + CoMutex lock;
>> + int cluster_size;
>> + BlockCache *bitmap_cache;
>> + uint64_t bitmap_size;
>> + AddCowHeader header;
>> + char backing_file_format[16];
>> + char image_file_format[16];
>> +} BDRVAddCowState;
>> +
>> +/* Convert sector_num to offset in bitmap */
>> +static inline int64_t offset_in_bitmap(int64_t sector_num)
>> +{
>> + int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
>> + return cluster_num / 8;
>> +}
>> +
>> +static inline bool is_cluster_head(int64_t sector_num)
>> +{
>> + return sector_num % SECTORS_PER_CLUSTER == 0;
>> +}
>> +
>> +static inline bool is_cluster_tail(int64_t sector_num)
>> +{
>> + return (sector_num + 1) % SECTORS_PER_CLUSTER == 0;
>> +}
>> +
>> +BlockCache *add_cow_cache_create(BlockDriverState *bs, int num_tables);
>> +int add_cow_cache_destroy(BlockDriverState *bs, BlockCache *c);
>> +void add_cow_cache_entry_mark_dirty(BlockCache *c, void *table);
>> +int add_cow_cache_get(BlockDriverState *bs, BlockCache *c, uint64_t offset,
>> + void **table);
>> +int add_cow_cache_flush(BlockDriverState *bs, BlockCache *c);
>> +#endif
>> diff --git a/block_int.h b/block_int.h
>> index 6c1d9ca..67954ec 100644
>> --- a/block_int.h
>> +++ b/block_int.h
>> @@ -53,6 +53,8 @@
>> #define BLOCK_OPT_SUBFMT "subformat"
>> #define BLOCK_OPT_COMPAT_LEVEL "compat"
>> #define BLOCK_OPT_LAZY_REFCOUNTS "lazy_refcounts"
>> +#define BLOCK_OPT_IMAGE_FILE "image_file"
>> +#define BLOCK_OPT_IMAGE_FORMAT "image_format"
>>
>> typedef struct BdrvTrackedRequest BdrvTrackedRequest;
>>
>> --
>> 1.7.1
>>
>>
>
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH V12 1/6] docs: document for add-cow file format
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 1/6] docs: document for " Dong Xu Wang
2012-09-06 17:27 ` Michael Roth
@ 2012-09-10 15:23 ` Kevin Wolf
2012-09-11 2:12 ` Dong Xu Wang
1 sibling, 1 reply; 25+ messages in thread
From: Kevin Wolf @ 2012-09-10 15:23 UTC (permalink / raw)
To: Dong Xu Wang; +Cc: qemu-devel
Am 10.08.2012 17:39, schrieb Dong Xu Wang:
> Document for add-cow format, the usage and spec of add-cow are introduced.
>
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> ---
> docs/specs/add-cow.txt | 123 ++++++++++++++++++++++++++++++++++++++++++++++++
> 1 files changed, 123 insertions(+), 0 deletions(-)
> create mode 100644 docs/specs/add-cow.txt
>
> diff --git a/docs/specs/add-cow.txt b/docs/specs/add-cow.txt
> new file mode 100644
> index 0000000..d5a7a68
> --- /dev/null
> +++ b/docs/specs/add-cow.txt
> @@ -0,0 +1,123 @@
> +== General ==
> +
> +The raw file format does not support backing files or copy on write feature.
> +The add-cow image format makes it possible to use backing files with raw
> +image by keeping a separate .add-cow metadata file. Once all sectors
> +have been written into the raw image it is safe to discard the .add-cow
> +and backing files, then we can use the raw image directly.
> +
> +An example usage of add-cow would look like::
> +(ubuntu.img is a disk image which has been installed OS.)
> + 1) Create a raw image with the same size of ubuntu.img
> + qemu-img create -f raw test.raw 8G
> + 2) Create an add-cow image which will store dirty bitmap
> + qemu-img create -f add-cow test.add-cow \
> + -o backing_file=ubuntu.img,image_file=test.raw
> + 3) Run qemu with add-cow image
> + qemu -drive if=virtio,file=test.add-cow
> +
> +test.raw may be larger than ubuntu.img, in that case, the size of test.add-cow
> +will be calculated from the size of test.raw.
> +
> +=Specification=
> +
> +The file format looks like this:
> +
> + +---------------+-------------+-----------------+
> + | Header | Reserved | COW bitmap |
> + +---------------+-------------+-----------------+
> +
> +All numbers in add-cow are stored in Little Endian byte order.
> +
> +== Header ==
> +
> +The Header is included in the first bytes:
> +(#define HEADER_SIZE (4096 * header_pages_size))
> + Byte 0 - 7: magic
> + add-cow magic string ("ADD_COW\xff").
> +
> + 8 - 11: version
> + Version number (only valid value is 1 now).
> +
> + 12 - 15: backing file name offset
> + Offset in the add-cow file at which the backing file
> + name is stored (NB: The string is not nul-terminated).
> + If backing file name does NOT exist, this field will be
> + 0. Must be between 80 and [HEADER_SIZE - 2](a file name
> + must be at least 1 byte).
> +
> + 16 - 19: backing file name size
> + Length of the backing file name in bytes. It will be 0
> + if the backing file name offset is 0. If backing file
> + name offset is non-zero, then it must be non-zero. Must
> + be less than [HEADER_SIZE - 80] to fit in the reserved
> + part of the header.
> +
> + 20 - 23: image file name offset
> + Offset in the add-cow file at which the image file name
> + is stored (NB: The string is not null terminated). It
> + must be between 80 and [HEADER_SIZE - 2].
> +
> + 24 - 27: image file name size
> + Length of the image file name in bytes.
> + Must be less than [HEADER_SIZE - 80] to fit in the reserved
> + part of the header.
> +
> + 28 - 35: features
> + Currently only 1 feature bit is used:
What happens when opening a file with an unknown bit set? How must
unknown bits be initialised?
> + Feature bits:
> + * ADD_COW_F_All_ALLOCATED = 0x01.
What does this flag mean, and is it required to be set on that
condition? Also, please use ALL_CAPS.
> +
> + 36 - 43: optional features
> + Not used now. Reserved for future use. It must be set to 0.
And must be ignored when reading.
> +
> + 44 - 47: header pages size
> + The header field is variable-sized. This field indicates
> + how many pages(4k) will be used to store add-cow header.
> + In add-cow v1, it is fixed to 1, so the header size will
> + be 4k * 1 = 4096 bytes.
Why arbitrarily defined "pages" instead of bytes or at least clusters?
> +
> + 48 - 63: backing file format
> + format of backing file. It will be filled with 0 if
> + backing file name offset is 0. If backing file name
> + offset is non-zero, it must be non-zero. It is coded
> + in free-form ASCII, and is not NUL-terminated.
Zero padded on the right, I guess?
Also defining that a string must be "non-zero" looks odd, should
probably be "non-empty".
> +
> + 64 - 79: image file format
> + format of image file. It must be non-zero. It is coded
> + in free-form ASCII, and is not NUL-terminated.
Same here.
> +
> + 80 - [HEADER_SIZE - 1]:
> + It is used to make sure COW bitmap field starts at the
> + HEADER_SIZE byte, backing file name and image file name
> + will be stored here. The bytes that is not pointing to
> + backing file and image file names will bet set to 0.
"will be set to 0" describes the behaviour of qemu. A spec should
describe the file format, not a specific implementation. Make it "must"
or "should".
> +
> +== COW bitmap ==
> +
> +The "COW bitmap" field starts at offset HEADER_SIZE, stores a bitmap related to
> +backing file and image file. The bitmap will track whether the sector in
> +backing file is dirty or not.
> +
> +Each bit in the bitmap indicates one cluster's status. One cluster includes 128
> +sectors, then each bit indicates 512 * 128 = 64k bytes.
Should we make the cluster size configurable?
> the size of bitmap is
> +calculated according to virtual size of image file, and it also should be multipe
Typo: multiple
Sure you mean "should", or should it be "must"?
> +of 65536, the bits not used will be set to 0. Within each byte, the least
> +significant bit covers the first cluster. Bit orders in one byte look like:
> + +----+----+----+----+----+----+----+----+
> + | b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0 |
> + +----+----+----+----+----+----+----+----+
> +
> +If the bit is 0, indicates the sector has not been allocated in image file, data
> +should be loaded from backing file while reading; if the bit is 1, indicates the
> +related sector has been dirty, should be loaded from image file while reading.
> +Writing to a sector causes the corresponding bit to be set to 1.
> +
> +If raw image is not an even multiple of cluster bytes, bits that correspond to
> +bytes beyond the raw file size in add-cow will be 0.
"must be written as 0 and must be ignored when reading" or something
like that.
> +Image file name and backing file name must NOT be the same, we prevent this
> +while creating add-cow files.
What we do is irrelevant for a spec.
> +Image file and backing file are interpreted relative to the qcow2 file, not
> +to the current working directory of the process that opened the qcow2 file.
Kevin
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH V12 1/6] docs: document for add-cow file format
2012-09-10 15:23 ` Kevin Wolf
@ 2012-09-11 2:12 ` Dong Xu Wang
0 siblings, 0 replies; 25+ messages in thread
From: Dong Xu Wang @ 2012-09-11 2:12 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel
On Mon, Sep 10, 2012 at 11:23 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 10.08.2012 17:39, schrieb Dong Xu Wang:
>> Document for add-cow format, the usage and spec of add-cow are introduced.
>>
>> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> ---
>> docs/specs/add-cow.txt | 123 ++++++++++++++++++++++++++++++++++++++++++++++++
>> 1 files changed, 123 insertions(+), 0 deletions(-)
>> create mode 100644 docs/specs/add-cow.txt
>>
>> diff --git a/docs/specs/add-cow.txt b/docs/specs/add-cow.txt
>> new file mode 100644
>> index 0000000..d5a7a68
>> --- /dev/null
>> +++ b/docs/specs/add-cow.txt
>> @@ -0,0 +1,123 @@
>> +== General ==
>> +
>> +The raw file format does not support backing files or copy on write feature.
>> +The add-cow image format makes it possible to use backing files with raw
>> +image by keeping a separate .add-cow metadata file. Once all sectors
>> +have been written into the raw image it is safe to discard the .add-cow
>> +and backing files, then we can use the raw image directly.
>> +
>> +An example usage of add-cow would look like::
>> +(ubuntu.img is a disk image which has been installed OS.)
>> + 1) Create a raw image with the same size of ubuntu.img
>> + qemu-img create -f raw test.raw 8G
>> + 2) Create an add-cow image which will store dirty bitmap
>> + qemu-img create -f add-cow test.add-cow \
>> + -o backing_file=ubuntu.img,image_file=test.raw
>> + 3) Run qemu with add-cow image
>> + qemu -drive if=virtio,file=test.add-cow
>> +
>> +test.raw may be larger than ubuntu.img, in that case, the size of test.add-cow
>> +will be calculated from the size of test.raw.
>> +
>> +=Specification=
>> +
>> +The file format looks like this:
>> +
>> + +---------------+-------------+-----------------+
>> + | Header | Reserved | COW bitmap |
>> + +---------------+-------------+-----------------+
>> +
>> +All numbers in add-cow are stored in Little Endian byte order.
>> +
>> +== Header ==
>> +
>> +The Header is included in the first bytes:
>> +(#define HEADER_SIZE (4096 * header_pages_size))
>> + Byte 0 - 7: magic
>> + add-cow magic string ("ADD_COW\xff").
>> +
>> + 8 - 11: version
>> + Version number (only valid value is 1 now).
>> +
>> + 12 - 15: backing file name offset
>> + Offset in the add-cow file at which the backing file
>> + name is stored (NB: The string is not nul-terminated).
>> + If backing file name does NOT exist, this field will be
>> + 0. Must be between 80 and [HEADER_SIZE - 2](a file name
>> + must be at least 1 byte).
>> +
>> + 16 - 19: backing file name size
>> + Length of the backing file name in bytes. It will be 0
>> + if the backing file name offset is 0. If backing file
>> + name offset is non-zero, then it must be non-zero. Must
>> + be less than [HEADER_SIZE - 80] to fit in the reserved
>> + part of the header.
>> +
>> + 20 - 23: image file name offset
>> + Offset in the add-cow file at which the image file name
>> + is stored (NB: The string is not null terminated). It
>> + must be between 80 and [HEADER_SIZE - 2].
>> +
>> + 24 - 27: image file name size
>> + Length of the image file name in bytes.
>> + Must be less than [HEADER_SIZE - 80] to fit in the reserved
>> + part of the header.
>> +
>> + 28 - 35: features
>> + Currently only 1 feature bit is used:
>
> What happens when opening a file with an unknown bit set? How must
> unknown bits be initialised?
Okay, I will code as qcow2, report report_unsupported_feature error.
And I will update
the spec file.
>
>> + Feature bits:
>> + * ADD_COW_F_All_ALLOCATED = 0x01.
>
> What does this flag mean, and is it required to be set on that
> condition? Also, please use ALL_CAPS.
This feature bit will used as:
qemu-img create -f add-cow -o image_file=t.raw t.add-cow.
While creating add-cow and without backing_file, this feature can
avoid reading/updating
bitmap. I think it can let the code be more faster.
And also, maybe, I can implement add_cow_check, check if the feature
bit should be set.
How do you think, Kevin?
>
>> +
>> + 36 - 43: optional features
>> + Not used now. Reserved for future use. It must be set to 0.
>
> And must be ignored when reading.
>
Okay.
>> +
>> + 44 - 47: header pages size
>> + The header field is variable-sized. This field indicates
>> + how many pages(4k) will be used to store add-cow header.
>> + In add-cow v1, it is fixed to 1, so the header size will
>> + be 4k * 1 = 4096 bytes.
>
> Why arbitrarily defined "pages" instead of bytes or at least clusters?
Okay, next version I will just caclulate it by bytes.
>
>> +
>> + 48 - 63: backing file format
>> + format of backing file. It will be filled with 0 if
>> + backing file name offset is 0. If backing file name
>> + offset is non-zero, it must be non-zero. It is coded
>> + in free-form ASCII, and is not NUL-terminated.
>
> Zero padded on the right, I guess?
Yes, will update.
>
> Also defining that a string must be "non-zero" looks odd, should
> probably be "non-empty".
>
Okay.
>> +
>> + 64 - 79: image file format
>> + format of image file. It must be non-zero. It is coded
>> + in free-form ASCII, and is not NUL-terminated.
>
> Same here.
Okay.
>
>> +
>> + 80 - [HEADER_SIZE - 1]:
>> + It is used to make sure COW bitmap field starts at the
>> + HEADER_SIZE byte, backing file name and image file name
>> + will be stored here. The bytes that is not pointing to
>> + backing file and image file names will bet set to 0.
>
> "will be set to 0" describes the behaviour of qemu. A spec should
> describe the file format, not a specific implementation. Make it "must"
> or "should".
Okay.
>
>> +
>> +== COW bitmap ==
>> +
>> +The "COW bitmap" field starts at offset HEADER_SIZE, stores a bitmap related to
>> +backing file and image file. The bitmap will track whether the sector in
>> +backing file is dirty or not.
>> +
>> +Each bit in the bitmap indicates one cluster's status. One cluster includes 128
>> +sectors, then each bit indicates 512 * 128 = 64k bytes.
>
> Should we make the cluster size configurable?
>
>> the size of bitmap is
>> +calculated according to virtual size of image file, and it also should be multipe
>
> Typo: multiple
>
> Sure you mean "should", or should it be "must"?
Okay.
>
>> +of 65536, the bits not used will be set to 0. Within each byte, the least
>> +significant bit covers the first cluster. Bit orders in one byte look like:
>> + +----+----+----+----+----+----+----+----+
>> + | b7 | b6 | b5 | b4 | b3 | b2 | b1 | b0 |
>> + +----+----+----+----+----+----+----+----+
>> +
>> +If the bit is 0, indicates the sector has not been allocated in image file, data
>> +should be loaded from backing file while reading; if the bit is 1, indicates the
>> +related sector has been dirty, should be loaded from image file while reading.
>> +Writing to a sector causes the corresponding bit to be set to 1.
>> +
>> +If raw image is not an even multiple of cluster bytes, bits that correspond to
>> +bytes beyond the raw file size in add-cow will be 0.
>
> "must be written as 0 and must be ignored when reading" or something
> like that.
Okay.
>
>> +Image file name and backing file name must NOT be the same, we prevent this
>> +while creating add-cow files.
>
> What we do is irrelevant for a spec.
Okay.
>
>> +Image file and backing file are interpreted relative to the qcow2 file, not
>> +to the current working directory of the process that opened the qcow2 file.
>
> Kevin
>
Thank you, Kevin.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH V12 4/6] rename qcow2-cache.c to block-cache.c
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 4/6] rename qcow2-cache.c to block-cache.c Dong Xu Wang
2012-09-06 17:52 ` Michael Roth
@ 2012-09-11 8:41 ` Kevin Wolf
1 sibling, 0 replies; 25+ messages in thread
From: Kevin Wolf @ 2012-09-11 8:41 UTC (permalink / raw)
To: Dong Xu Wang; +Cc: qemu-devel
Am 10.08.2012 17:39, schrieb Dong Xu Wang:
> add-cow and qcow2 file format will share the same cache code, so rename
> block-cache.c to block-cache.c. And related structure and qcow2 code also
> are changed.
>
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> ---
> block.h | 3 +
> block/Makefile.objs | 3 +-
> block/qcow2-cache.c | 323 ------------------------------------------------
> block/qcow2-cluster.c | 66 ++++++----
> block/qcow2-refcount.c | 66 ++++++-----
> block/qcow2.c | 36 +++---
> block/qcow2.h | 24 +---
> trace-events | 13 +-
> 8 files changed, 109 insertions(+), 425 deletions(-)
> delete mode 100644 block/qcow2-cache.c
>
> diff --git a/block.h b/block.h
> index e5dfcd7..c325661 100644
> --- a/block.h
> +++ b/block.h
> @@ -401,6 +401,9 @@ typedef enum {
> BLKDBG_CLUSTER_ALLOC_BYTES,
> BLKDBG_CLUSTER_FREE,
>
> + BLKDBG_ADD_COW_UPDATE,
> + BLKDBG_ADD_COW_LOAD,
> +
I don't think you should add new events, the existing ones should be
generic enough that you can reuse them. It's somewhat hard to see
without block-cache.c, though.
Can you make sure to have one patch with pure code motion, and a
separate one with the changes needed to make it work with add-cow? It
will help reviewers a lot.
> BLKDBG_EVENT_MAX,
> } BlkDebugEvent;
>
> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
> index e179211..335dc7a 100644
> --- a/block/qcow2-cluster.c
> +++ b/block/qcow2-cluster.c
> @@ -28,6 +28,7 @@
> #include "block_int.h"
> #include "block/qcow2.h"
> #include "trace.h"
> +#include "block-cache.h"
>
> int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
> {
> @@ -69,7 +70,8 @@ int qcow2_grow_l1_table(BlockDriverState *bs, int min_size, bool exact_size)
> return new_l1_table_offset;
> }
>
> - ret = qcow2_cache_flush(bs, s->refcount_block_cache);
> + ret = block_cache_flush(bs, s->refcount_block_cache,
> + BLOCK_TABLE_REF, s->cluster_size);
I think its better to pass s->cluster_size to the cache initialisation
instead of in each call of the cache function.
For the blkdebug events I guess it's possible as well to move this to
the initialisation, but I'd have to see the block-cache.c code to say
something specific about this.
> @@ -659,18 +669,16 @@ int qcow2_alloc_cluster_link_l2(BlockDriverState *bs, QCowL2Meta *m)
> * handled.
> */
> if (cow) {
> - qcow2_cache_depends_on_flush(s->l2_table_cache);
> + block_cache_depends_on_flush(s->l2_table_cache);
> }
>
> - if (qcow2_need_accurate_refcounts(s)) {
> - qcow2_cache_set_dependency(bs, s->l2_table_cache,
> - s->refcount_block_cache);
> - }
> + block_cache_set_dependency(bs, s->l2_table_cache, BLOCK_TABLE_L2,
> + s->refcount_block_cache, s->cluster_size);
What happened with lazy refcounting? Is this a mismerge or did you
intentionally remove the condition? (There's a second place where you do
the same)
Kevin
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH V12 5/6] add-cow file format
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 5/6] add-cow file format Dong Xu Wang
2012-09-06 20:19 ` Michael Roth
@ 2012-09-11 9:40 ` Kevin Wolf
2012-09-12 7:28 ` Dong Xu Wang
1 sibling, 1 reply; 25+ messages in thread
From: Kevin Wolf @ 2012-09-11 9:40 UTC (permalink / raw)
To: Dong Xu Wang; +Cc: qemu-devel
Am 10.08.2012 17:39, schrieb Dong Xu Wang:
> add-cow file format core code. It use block-cache.c as cache code.
>
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> ---
> block/Makefile.objs | 1 +
> block/add-cow.c | 613 +++++++++++++++++++++++++++++++++++++++++++++++++++
> block/add-cow.h | 85 +++++++
> block_int.h | 2 +
> 4 files changed, 701 insertions(+), 0 deletions(-)
> create mode 100644 block/add-cow.c
> create mode 100644 block/add-cow.h
>
> diff --git a/block/Makefile.objs b/block/Makefile.objs
> index 23bdfc8..7ed5051 100644
> --- a/block/Makefile.objs
> +++ b/block/Makefile.objs
> @@ -2,6 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
> block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
> block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
> block-obj-y += qed-check.o
> +block-obj-y += add-cow.o
> block-obj-y += block-cache.o
> block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
> block-obj-y += stream.o
> diff --git a/block/add-cow.c b/block/add-cow.c
> new file mode 100644
> index 0000000..d4711d5
> --- /dev/null
> +++ b/block/add-cow.c
> @@ -0,0 +1,613 @@
> +/*
> + * QEMU ADD-COW Disk Format
> + *
> + * Copyright IBM, Corp. 2012
> + *
> + * Authors:
> + * Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> + * See the COPYING.LIB file in the top-level directory.
> + *
> + */
> +
> +#include "qemu-common.h"
> +#include "block_int.h"
> +#include "module.h"
> +#include "add-cow.h"
> +
> +static void add_cow_header_le_to_cpu(const AddCowHeader *le, AddCowHeader *cpu)
> +{
> + cpu->magic = le64_to_cpu(le->magic);
> + cpu->version = le32_to_cpu(le->version);
> +
> + cpu->backing_filename_offset = le32_to_cpu(le->backing_filename_offset);
> + cpu->backing_filename_size = le32_to_cpu(le->backing_filename_size);
> +
> + cpu->image_filename_offset = le32_to_cpu(le->image_filename_offset);
> + cpu->image_filename_size = le32_to_cpu(le->image_filename_size);
> +
> + cpu->features = le64_to_cpu(le->features);
> + cpu->optional_features = le64_to_cpu(le->optional_features);
> + cpu->header_pages_size = le32_to_cpu(le->header_pages_size);
> +}
> +
> +static void add_cow_header_cpu_to_le(const AddCowHeader *cpu, AddCowHeader *le)
> +{
> + le->magic = cpu_to_le64(cpu->magic);
> + le->version = cpu_to_le32(cpu->version);
> +
> + le->backing_filename_offset = cpu_to_le32(cpu->backing_filename_offset);
> + le->backing_filename_size = cpu_to_le32(cpu->backing_filename_size);
> +
> + le->image_filename_offset = cpu_to_le32(cpu->image_filename_offset);
> + le->image_filename_size = cpu_to_le32(cpu->image_filename_size);
> +
> + le->features = cpu_to_le64(cpu->features);
> + le->optional_features = cpu_to_le64(cpu->optional_features);
> + le->header_pages_size = cpu_to_le32(cpu->header_pages_size);
> +}
> +
> +static int add_cow_probe(const uint8_t *buf, int buf_size, const char *filename)
> +{
> + const AddCowHeader *header = (const AddCowHeader *)buf;
> +
> + if (le64_to_cpu(header->magic) == ADD_COW_MAGIC &&
> + le32_to_cpu(header->version) == ADD_COW_VERSION) {
> + return 100;
> + } else {
> + return 0;
> + }
> +}
> +
> +static int add_cow_create(const char *filename, QEMUOptionParameter *options)
> +{
> + AddCowHeader header = {
> + .magic = ADD_COW_MAGIC,
> + .version = ADD_COW_VERSION,
> + .features = 0,
> + .optional_features = 0,
> + .header_pages_size = ADD_COW_DEFAULT_PAGE_SIZE,
> + };
> + AddCowHeader le_header;
> + int64_t image_len = 0;
> + const char *backing_filename = NULL;
> + const char *backing_fmt = NULL;
> + const char *image_filename = NULL;
> + const char *image_format = NULL;
> + BlockDriverState *bs, *image_bs = NULL, *backing_bs = NULL;
> + BlockDriver *drv = bdrv_find_format("add-cow");
> + BDRVAddCowState s;
> + int ret;
> +
> + while (options && options->name) {
> + if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
> + image_len = options->value.n;
> + } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FILE)) {
> + backing_filename = options->value.s;
> + } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FMT)) {
> + backing_fmt = options->value.s;
> + } else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FILE)) {
> + image_filename = options->value.s;
> + } else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FORMAT)) {
> + image_format = options->value.s;
> + }
> + options++;
> + }
> +
> + if (backing_filename) {
> + header.backing_filename_offset = sizeof(header)
> + + sizeof(s.backing_file_format) + sizeof(s.image_file_format);
> + header.backing_filename_size = strlen(backing_filename);
> +
> + if (!backing_fmt) {
> + backing_bs = bdrv_new("image");
> + ret = bdrv_open(backing_bs, backing_filename, BDRV_O_RDWR
> + | BDRV_O_CACHE_WB, NULL);
> + if (ret < 0) {
> + return ret;
> + }
> + backing_fmt = bdrv_get_format_name(backing_bs);
> + bdrv_delete(backing_bs);
> + }
> + } else {
> + header.features |= ADD_COW_F_All_ALLOCATED;
> + }
> +
> + if (image_filename) {
> + header.image_filename_offset =
> + sizeof(header) + sizeof(s.backing_file_format)
> + + sizeof(s.image_file_format) + header.backing_filename_size;
> + header.image_filename_size = strlen(image_filename);
> + } else {
> + error_report("Error: image_file should be given.");
> + return -EINVAL;
> + }
> +
> + if (backing_filename && !strcmp(backing_filename, image_filename)) {
> + error_report("Error: Trying to create an image with the "
> + "same backing file name as the image file name");
> + return -EINVAL;
> + }
> +
> + if (!strcmp(filename, image_filename)) {
> + error_report("Error: Trying to create an image with the "
> + "same filename as the image file name");
> + return -EINVAL;
> + }
> +
> + if (header.image_filename_offset + header.image_filename_size
> + > ADD_COW_PAGE_SIZE * ADD_COW_DEFAULT_PAGE_SIZE) {
> + error_report("image_file name or backing_file name too long.");
> + return -ENOSPC;
> + }
> +
> + ret = bdrv_file_open(&image_bs, image_filename, BDRV_O_RDWR);
> + if (ret < 0) {
> + return ret;
> + }
> + bdrv_delete(image_bs);
> +
> + ret = bdrv_create_file(filename, NULL);
> + if (ret < 0) {
> + return ret;
> + }
> +
> + ret = bdrv_file_open(&bs, filename, BDRV_O_RDWR);
> + if (ret < 0) {
> + return ret;
> + }
> + add_cow_header_cpu_to_le(&header, &le_header);
> + ret = bdrv_pwrite(bs, 0, &le_header, sizeof(le_header));
> + if (ret < 0) {
> + bdrv_delete(bs);
> + return ret;
> + }
> +
> + ret = bdrv_pwrite(bs, sizeof(le_header), backing_fmt ? backing_fmt : "",
> + backing_fmt ? strlen(backing_fmt) : 0);
The spec requires zero padding, which you don't do here.
> + if (ret < 0) {
> + bdrv_delete(bs);
> + return ret;
> + }
> +
> + ret = bdrv_pwrite(bs, sizeof(le_header) + sizeof(s.backing_file_format),
> + image_format ? image_format : "raw",
> + image_format ? strlen(image_format) : sizeof("raw"));
And here.
> + if (ret < 0) {
> + bdrv_delete(bs);
> + return ret;
> + }
> +
> + if (backing_filename) {
> + ret = bdrv_pwrite(bs, header.backing_filename_offset,
> + backing_filename, header.backing_filename_size);
> + if (ret < 0) {
> + bdrv_delete(bs);
> + return ret;
> + }
> + }
> +
> + ret = bdrv_pwrite(bs, header.image_filename_offset,
> + image_filename, header.image_filename_size);
> + if (ret < 0) {
> + bdrv_delete(bs);
> + return ret;
> + }
> +
> + ret = bdrv_open(bs, filename, BDRV_O_RDWR | BDRV_O_NO_FLUSH, drv);
> + if (ret < 0) {
> + bdrv_delete(bs);
> + return ret;
> + }
> +
> + ret = bdrv_truncate(bs, image_len);
> + bdrv_delete(bs);
> + return ret;
> +}
> +
> +static int add_cow_open(BlockDriverState *bs, int flags)
> +{
> + char image_filename[ADD_COW_FILE_LEN];
> + char tmp_name[ADD_COW_FILE_LEN];
> + BlockDriver *image_drv = NULL;
> + int ret;
> + int sector_per_byte;
> + BDRVAddCowState *s = bs->opaque;
> + AddCowHeader le_header;
> +
> + ret = bdrv_pread(bs->file, 0, &le_header, sizeof(le_header));
> + if (ret != sizeof(s->header)) {
if (ret < 0) would be more consistent with the rest of the code.
> + goto fail;
> + }
> +
> + add_cow_header_le_to_cpu(&le_header, &s->header);
> +
> + if (le64_to_cpu(s->header.magic) != ADD_COW_MAGIC) {
Isn't this one endianess conversion too much? s->header is already LE.
Did you test add-cow on a big endian host?
> + ret = -EINVAL;
> + goto fail;
> + }
> +
> + if (s->header.version != ADD_COW_VERSION) {
> + char version[64];
> + snprintf(version, sizeof(version), "ADD-COW version %d",
> + s->header.version);
> + qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
> + bs->device_name, "add-cow", version);
> + ret = -ENOTSUP;
> + goto fail;
> + }
> +
> + if (s->header.features & ~ADD_COW_FEATURE_MASK) {
> + char buf[64];
> + snprintf(buf, sizeof(buf), "%" PRIx64,
> + s->header.features & ~ADD_COW_FEATURE_MASK);
This message is a bit terse, most users will be confused with an error
message that only consists of a hex number. Maybe better "Feature flags:
%" PRIx64.
> + qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
> + bs->device_name, "add-cow", buf);
> + return -ENOTSUP;
> + }
> +
> + if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
> + ret = bdrv_read_string(bs->file, sizeof(s->header),
> + sizeof(s->backing_file_format) - 1, s->backing_file_format,
> + sizeof(s->backing_file_format));
> + if (ret < 0) {
> + goto fail;
> + }
> + }
Would be great if this was not only read into memory, but actually
used... It must end up in bs->backing_format in order take effect.
> +
> + ret = bdrv_read_string(bs->file,
> + sizeof(s->header) + sizeof(s->image_file_format),
> + sizeof(s->image_file_format) - 1, s->image_file_format,
> + sizeof(s->image_file_format));
> + if (ret < 0) {
> + goto fail;
> + }
This one is unused, too.
> +
> + if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
> + ret = bdrv_read_string(bs->file, s->header.backing_filename_offset,
> + s->header.backing_filename_size, bs->backing_file,
> + sizeof(bs->backing_file));
> + if (ret < 0) {
> + goto fail;
> + }
> + }
> +
> + ret = bdrv_read_string(bs->file, s->header.image_filename_offset,
> + s->header.image_filename_size, tmp_name,
> + sizeof(tmp_name));
> + if (ret < 0) {
> + goto fail;
> + }
> +
> + s->image_hd = bdrv_new("");
> + if (path_has_protocol(image_filename)) {
> + pstrcpy(image_filename, sizeof(image_filename), tmp_name);
> + } else {
> + path_combine(image_filename, sizeof(image_filename),
> + bs->filename, tmp_name);
> + }
> +
> + ret = bdrv_open(s->image_hd, image_filename, flags, image_drv);
image_drv is always NULL.
> + if (ret < 0) {
> + bdrv_delete(s->image_hd);
> + goto fail;
> + }
> +
> + bs->total_sectors = bdrv_getlength(s->image_hd) >> 9;
> + s->cluster_size = ADD_COW_CLUSTER_SIZE;
> + sector_per_byte = SECTORS_PER_CLUSTER * 8;
> + s->bitmap_size =
> + (bs->total_sectors + sector_per_byte - 1) / sector_per_byte;
> + s->bitmap_cache =
> + block_cache_create(bs, ADD_COW_CACHE_SIZE, ADD_COW_CACHE_ENTRY_SIZE);
> +
> + qemu_co_mutex_init(&s->lock);
> + return 0;
> +fail:
> + if (s->bitmap_cache) {
> + block_cache_destroy(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP);
> + }
> + return ret;
> +}
> +
> +static void add_cow_close(BlockDriverState *bs)
> +{
> + BDRVAddCowState *s = bs->opaque;
> + block_cache_destroy(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP);
> + bdrv_delete(s->image_hd);
> +}
> +
> +static bool is_allocated(BlockDriverState *bs, int64_t sector_num)
> +{
> + BDRVAddCowState *s = bs->opaque;
> + BlockCache *c = s->bitmap_cache;
> + int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
> + uint8_t *table = NULL;
> + uint64_t offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
> + + (offset_in_bitmap(sector_num) & (~(c->entry_size - 1)));
> + int ret = block_cache_get(bs, s->bitmap_cache, offset,
> + (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
No matching block_cache_put?
> +
> + if (ret < 0) {
> + return ret;
> + }
> + return table[cluster_num / 8 % ADD_COW_CACHE_ENTRY_SIZE]
> + & (1 << (cluster_num % 8));
> +}
> +
> +static coroutine_fn int add_cow_is_allocated(BlockDriverState *bs,
> + int64_t sector_num, int nb_sectors, int *num_same)
> +{
> + BDRVAddCowState *s = bs->opaque;
> + int changed;
> +
> + if (nb_sectors == 0) {
> + *num_same = 0;
> + return 0;
> + }
> +
> + if (s->header.features & ADD_COW_F_All_ALLOCATED) {
> + *num_same = nb_sectors - 1;
Why - 1?
> + return 1;
> + }
> + changed = is_allocated(bs, sector_num);
> +
> + for (*num_same = 1; *num_same < nb_sectors; (*num_same)++) {
> + if (is_allocated(bs, sector_num + *num_same) != changed) {
> + break;
> + }
> + }
> + return changed;
> +}
> +
> +static int add_cow_backing_read(BlockDriverState *bs, QEMUIOVector *qiov,
> + int64_t sector_num, int nb_sectors)
> +{
> + int n1;
> + if ((sector_num + nb_sectors) <= bs->total_sectors) {
> + return nb_sectors;
> + }
> + if (sector_num >= bs->total_sectors) {
> + n1 = 0;
> + } else {
> + n1 = bs->total_sectors - sector_num;
> + }
> +
> + qemu_iovec_memset(qiov, BDRV_SECTOR_SIZE * n1,
> + 0, BDRV_SECTOR_SIZE * (nb_sectors - n1));
> +
> + return n1;
> +}
> +
> +static coroutine_fn int add_cow_co_readv(BlockDriverState *bs,
> + int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
> +{
> + BDRVAddCowState *s = bs->opaque;
> + int cur_nr_sectors;
> + uint64_t bytes_done = 0;
> + QEMUIOVector hd_qiov;
> + int n, n1, ret = 0;
> +
> + qemu_iovec_init(&hd_qiov, qiov->niov);
> + qemu_co_mutex_lock(&s->lock);
> + while (remaining_sectors != 0) {
> + cur_nr_sectors = remaining_sectors;
> + if (add_cow_is_allocated(bs, sector_num, cur_nr_sectors, &n)) {
> + cur_nr_sectors = n;
One of n and cur_nr_sectors is redundant.
> + qemu_iovec_reset(&hd_qiov);
> + qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
> + cur_nr_sectors * BDRV_SECTOR_SIZE);
> + qemu_co_mutex_unlock(&s->lock);
> + ret = bdrv_co_readv(s->image_hd, sector_num, n, &hd_qiov);
> + qemu_co_mutex_lock(&s->lock);
> + if (ret < 0) {
> + goto fail;
> + }
> + } else {
> + cur_nr_sectors = n;
> + if (bs->backing_hd) {
> + qemu_iovec_reset(&hd_qiov);
> + qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
> + cur_nr_sectors * BDRV_SECTOR_SIZE);
> + n1 = add_cow_backing_read(bs->backing_hd, &hd_qiov,
> + sector_num, cur_nr_sectors);
> + if (n1 > 0) {
> + qemu_co_mutex_unlock(&s->lock);
> + ret = bdrv_co_readv(bs->backing_hd, sector_num,
> + n, &hd_qiov);
> + qemu_co_mutex_lock(&s->lock);
> + if (ret < 0) {
> + goto fail;
> + }
> + }
> + } else {
> + qemu_iovec_memset(&hd_qiov, 0, 0,
> + BDRV_SECTOR_SIZE * cur_nr_sectors);
> + }
> + }
> + remaining_sectors -= cur_nr_sectors;
> + sector_num += cur_nr_sectors;
> + bytes_done += cur_nr_sectors * BDRV_SECTOR_SIZE;
> + }
> +fail:
> + qemu_co_mutex_unlock(&s->lock);
> + qemu_iovec_destroy(&hd_qiov);
> + return ret;
> +}
> +
> +static int coroutine_fn copy_sectors(BlockDriverState *bs,
> + int n_start, int n_end)
> +{
> + BDRVAddCowState *s = bs->opaque;
> + QEMUIOVector qiov;
> + struct iovec iov;
> + int n, ret;
> +
> + n = n_end - n_start;
> + if (n <= 0) {
> + return 0;
> + }
> +
> + iov.iov_len = n * BDRV_SECTOR_SIZE;
> + iov.iov_base = qemu_blockalign(bs, iov.iov_len);
> +
> + qemu_iovec_init_external(&qiov, &iov, 1);
> +
> + ret = bdrv_co_readv(bs->backing_hd, n_start, n, &qiov);
> + if (ret < 0) {
> + goto out;
> + }
> + ret = bdrv_co_writev(s->image_hd, n_start, n, &qiov);
> + if (ret < 0) {
> + goto out;
> + }
> +
> + ret = 0;
> +out:
> + qemu_vfree(iov.iov_base);
> + return ret;
> +}
> +
> +static coroutine_fn int add_cow_co_writev(BlockDriverState *bs,
> + int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
> +{
> + BDRVAddCowState *s = bs->opaque;
> + BlockCache *c = s->bitmap_cache;
> + int ret = 0, i;
> + QEMUIOVector hd_qiov;
> + uint8_t *table;
> + uint64_t offset;
> +
> + qemu_co_mutex_lock(&s->lock);
> + qemu_iovec_init(&hd_qiov, qiov->niov);
> + ret = bdrv_co_writev(s->image_hd,
> + sector_num,
> + remaining_sectors, qiov);
> +
> + if (ret < 0) {
> + goto fail;
> + }
> + if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
> + /* Copy content of unmodified sectors */
> + if (!is_cluster_head(sector_num) && !is_allocated(bs, sector_num)) {
> + ret = copy_sectors(bs, sector_num & ~(SECTORS_PER_CLUSTER - 1),
> + sector_num);
> + if (ret < 0) {
> + goto fail;
> + }
> + }
> +
> + if (!is_cluster_tail(sector_num + remaining_sectors - 1)
> + && !is_allocated(bs, sector_num + remaining_sectors - 1)) {
> + ret = copy_sectors(bs, sector_num + remaining_sectors,
> + ((sector_num + remaining_sectors) | (SECTORS_PER_CLUSTER - 1)) + 1);
> + if (ret < 0) {
> + goto fail;
> + }
> + }
> +
> + for (i = sector_num / SECTORS_PER_CLUSTER;
> + i <= (sector_num + remaining_sectors - 1) / SECTORS_PER_CLUSTER;
> + i++) {
> + offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
> + + (offset_in_bitmap(i * SECTORS_PER_CLUSTER) & (~(c->entry_size - 1)));
The maths in this loop looks a bit confusing, but I think it's correct.
> + ret = block_cache_get(bs, s->bitmap_cache, offset,
> + (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
> + if (ret < 0) {
> + goto fail;
> + }
> + if ((table[i / 8] & (1 << (i % 8))) == 0) {
> + table[i / 8] |= (1 << (i % 8));
> + block_cache_entry_mark_dirty(s->bitmap_cache, table);
> + }
Missing block_cache_put again?
> + }
> + }
> + ret = 0;
> +fail:
> + qemu_co_mutex_unlock(&s->lock);
> + qemu_iovec_destroy(&hd_qiov);
> + return ret;
> +}
> +
> +static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size)
> +{
> + BDRVAddCowState *s = bs->opaque;
> + int sector_per_byte = SECTORS_PER_CLUSTER * 8;
> + int ret;
> + uint32_t bitmap_pos = s->header.header_pages_size * ADD_COW_PAGE_SIZE;
> + int64_t bitmap_size =
> + (size / BDRV_SECTOR_SIZE + sector_per_byte - 1) / sector_per_byte;
> + bitmap_size = (bitmap_size + ADD_COW_CACHE_ENTRY_SIZE - 1)
> + & (~(ADD_COW_CACHE_ENTRY_SIZE - 1));
> +
> + ret = bdrv_truncate(bs->file, bitmap_pos + bitmap_size);
> + if (ret < 0) {
> + return ret;
> + }
> + return 0;
> +}
So you don't truncate s->image_file? Does this work?
> +
> +static coroutine_fn int add_cow_co_flush(BlockDriverState *bs)
> +{
> + BDRVAddCowState *s = bs->opaque;
> + int ret;
> +
> + qemu_co_mutex_lock(&s->lock);
> + ret = block_cache_flush(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP,
> + ADD_COW_CACHE_ENTRY_SIZE);
> + qemu_co_mutex_unlock(&s->lock);
> + return ret;
> +}
What about flushing s->image_file?
> +
> +static QEMUOptionParameter add_cow_create_options[] = {
> + {
> + .name = BLOCK_OPT_SIZE,
> + .type = OPT_SIZE,
> + .help = "Virtual disk size"
> + },
> + {
> + .name = BLOCK_OPT_BACKING_FILE,
> + .type = OPT_STRING,
> + .help = "File name of a base image"
> + },
> + {
> + .name = BLOCK_OPT_BACKING_FMT,
> + .type = OPT_STRING,
> + .help = "Image format of the base image"
> + },
> + {
> + .name = BLOCK_OPT_IMAGE_FILE,
> + .type = OPT_STRING,
> + .help = "File name of a image file"
> + },
> + {
> + .name = BLOCK_OPT_IMAGE_FORMAT,
> + .type = OPT_STRING,
> + .help = "Image format of the image file"
> + },
> + { NULL }
> +};
> +
> +static BlockDriver bdrv_add_cow = {
> + .format_name = "add-cow",
> + .instance_size = sizeof(BDRVAddCowState),
> + .bdrv_probe = add_cow_probe,
> + .bdrv_open = add_cow_open,
> + .bdrv_close = add_cow_close,
> + .bdrv_create = add_cow_create,
> + .bdrv_co_readv = add_cow_co_readv,
> + .bdrv_co_writev = add_cow_co_writev,
> + .bdrv_truncate = bdrv_add_cow_truncate,
> + .bdrv_co_is_allocated = add_cow_is_allocated,
> +
> + .create_options = add_cow_create_options,
> + .bdrv_co_flush_to_os = add_cow_co_flush,
> +};
> +
> +static void bdrv_add_cow_init(void)
> +{
> + bdrv_register(&bdrv_add_cow);
> +}
> +
> +block_init(bdrv_add_cow_init);
> diff --git a/block/add-cow.h b/block/add-cow.h
> new file mode 100644
> index 0000000..f058376
> --- /dev/null
> +++ b/block/add-cow.h
> @@ -0,0 +1,85 @@
> +/*
> + * QEMU ADD-COW Disk Format
> + *
> + * Copyright IBM, Corp. 2012
> + *
> + * Authors:
> + * Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> + *
> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> + * See the COPYING.LIB file in the top-level directory.
> + *
> + */
> +
> +#ifndef BLOCK_ADD_COW_H
> +#define BLOCK_ADD_COW_H
> +#include "block-cache.h"
> +
> +enum {
> + ADD_COW_F_All_ALLOCATED = 0X01,
> + ADD_COW_FEATURE_MASK = ADD_COW_F_All_ALLOCATED,
> +
> + ADD_COW_MAGIC = (((uint64_t)'A' << 56) | ((uint64_t)'D' << 48) | \
> + ((uint64_t)'D' << 40) | ((uint64_t)'_' << 32) | \
> + ((uint64_t)'C' << 24) | ((uint64_t)'O' << 16) | \
> + ((uint64_t)'W' << 8) | 0xFF),
> + ADD_COW_VERSION = 1,
> + ADD_COW_FILE_LEN = 1024,
> + ADD_COW_CACHE_SIZE = 16,
> + ADD_COW_CACHE_ENTRY_SIZE = 65536,
> + ADD_COW_CLUSTER_SIZE = 65536,
> + SECTORS_PER_CLUSTER = (ADD_COW_CLUSTER_SIZE / BDRV_SECTOR_SIZE),
> + ADD_COW_PAGE_SIZE = 4096,
> + ADD_COW_DEFAULT_PAGE_SIZE = 1,
> +};
> +
> +typedef struct AddCowHeader {
> + uint64_t magic;
> + uint32_t version;
> +
> + uint32_t backing_filename_offset;
> + uint32_t backing_filename_size;
> +
> + uint32_t image_filename_offset;
> + uint32_t image_filename_size;
> +
> + uint64_t features;
> + uint64_t optional_features;
> + uint32_t header_pages_size;
> +} QEMU_PACKED AddCowHeader;
Why aren't backing/image_file_format part of the header here? They are
in the spec. It would also simplify some offset calculation code.
> +
> +typedef struct BDRVAddCowState {
> + BlockDriverState *image_hd;
> + CoMutex lock;
> + int cluster_size;
> + BlockCache *bitmap_cache;
> + uint64_t bitmap_size;
> + AddCowHeader header;
> + char backing_file_format[16];
> + char image_file_format[16];
> +} BDRVAddCowState;
> +
> +/* Convert sector_num to offset in bitmap */
> +static inline int64_t offset_in_bitmap(int64_t sector_num)
> +{
> + int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
> + return cluster_num / 8;
> +}
> +
> +static inline bool is_cluster_head(int64_t sector_num)
> +{
> + return sector_num % SECTORS_PER_CLUSTER == 0;
> +}
> +
> +static inline bool is_cluster_tail(int64_t sector_num)
> +{
> + return (sector_num + 1) % SECTORS_PER_CLUSTER == 0;
> +}
> +
> +BlockCache *add_cow_cache_create(BlockDriverState *bs, int num_tables);
> +int add_cow_cache_destroy(BlockDriverState *bs, BlockCache *c);
> +void add_cow_cache_entry_mark_dirty(BlockCache *c, void *table);
> +int add_cow_cache_get(BlockDriverState *bs, BlockCache *c, uint64_t offset,
> + void **table);
> +int add_cow_cache_flush(BlockDriverState *bs, BlockCache *c);
These functions don't really exist any more, do they?
Kevin
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH V12 5/6] add-cow file format
2012-09-10 2:25 ` Dong Xu Wang
@ 2012-09-11 9:44 ` Kevin Wolf
0 siblings, 0 replies; 25+ messages in thread
From: Kevin Wolf @ 2012-09-11 9:44 UTC (permalink / raw)
To: Dong Xu Wang; +Cc: Michael Roth, qemu-devel
Am 10.09.2012 04:25, schrieb Dong Xu Wang:
> On Fri, Sep 7, 2012 at 4:19 AM, Michael Roth <mdroth@linux.vnet.ibm.com> wrote:
>> On Fri, Aug 10, 2012 at 11:39:44PM +0800, Dong Xu Wang wrote:
>>> +typedef struct AddCowHeader {
>>> + uint64_t magic;
>>> + uint32_t version;
>>> +
>>> + uint32_t backing_filename_offset;
>>> + uint32_t backing_filename_size;
>>> +
>>> + uint32_t image_filename_offset;
>>> + uint32_t image_filename_size;
>>> +
>>> + uint64_t features;
>>> + uint64_t optional_features;
>>> + uint32_t header_pages_size;
>>> +} QEMU_PACKED AddCowHeader;
>>
>> You should avoid using packed structures for image format headers.
>> Instead, I would either:
>>
>> a) re-order the fields so that 32/64-bit fields, respectively, fall on
>> 32/64-bit boundaries (in your case, for instance, moving header_pages_size
>> above features) like qed/qcow2 do, or
>>
>> b) read/write the fields individually rather than reading/writing directly
>> into/from the header struct.
>>
>> The safest route is b). Adds a few lines of code, but you won't have to
>> re-work things (or worry about introducing bugs) later if you were to add,
>> say, a 32-bit value, and then a 64-bit value later.
>
> While, Kevin's suggestion is using PACKED, so ..
Yes, I think QEMU_PACKED is fine, and it's the safest version.
It would be nice to additionally do Michael's option a) if you like, but
I don't think the header is accessed too often, so the optimisation
isn't that important.
Kevin
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH V12 6/6] add-cow: add qemu-iotests support
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 6/6] add-cow: add qemu-iotests support Dong Xu Wang
@ 2012-09-11 9:55 ` Kevin Wolf
0 siblings, 0 replies; 25+ messages in thread
From: Kevin Wolf @ 2012-09-11 9:55 UTC (permalink / raw)
To: Dong Xu Wang; +Cc: qemu-devel
Am 10.08.2012 17:39, schrieb Dong Xu Wang:
> Add qemu-iotests support for add-cow.
>
> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
> ---
> tests/qemu-iotests/017 | 2 +-
> tests/qemu-iotests/020 | 2 +-
> tests/qemu-iotests/check | 4 ++--
> tests/qemu-iotests/common | 6 ++++++
> tests/qemu-iotests/common.rc | 19 +++++++++++++++++++
> 5 files changed, 29 insertions(+), 4 deletions(-)
> diff --git a/tests/qemu-iotests/check b/tests/qemu-iotests/check
> index 432732c..122267b 100755
> --- a/tests/qemu-iotests/check
> +++ b/tests/qemu-iotests/check
> @@ -243,7 +243,7 @@ do
> echo " - no qualified output"
> err=true
> else
> - if diff -w $seq.out $tmp.out >/dev/null 2>&1
> + if diff -w -I "^Formatting" $seq.out $tmp.out >/dev/null 2>&1
> then
> echo ""
> if $err
> @@ -255,7 +255,7 @@ do
> else
> echo " - output mismatch (see $seq.out.bad)"
> mv $tmp.out $seq.out.bad
> - $diff -w $seq.out $seq.out.bad
> + $diff -w -I "^Formatting" $seq.out $seq.out.bad
> err=true
> fi
> fi
These two hunks don't look right. You probably want to amend the sed
command in _make_test_img().
> diff --git a/tests/qemu-iotests/common.rc b/tests/qemu-iotests/common.rc
> index 7782808..ec5afd7 100644
> --- a/tests/qemu-iotests/common.rc
> +++ b/tests/qemu-iotests/common.rc
> @@ -97,6 +97,18 @@ _make_test_img()
> fi
> if [ \( "$IMGFMT" = "qcow2" -o "$IMGFMT" = "qed" \) -a -n "$CLUSTER_SIZE" ]; then
> optstr=$(_optstr_add "$optstr" "cluster_size=$CLUSTER_SIZE")
> + elif [ "$IMGFMT" = "add-cow" ]; then
> + local BACKING="$TEST_IMG"".qcow2"
> + local IMG="$TEST_IMG"".raw"
> + if [ "$1" = "-b" ]; then
> + IMG="$IMG"".b"
> + $QEMU_IMG create -f raw $IMG $image_size>/dev/null
> + extra_img_options="-o image_file=$IMG $extra_img_options"
> + else
> + $QEMU_IMG create -f raw $IMG $image_size>/dev/null
> + $QEMU_IMG create -f qcow2 $BACKING $image_size>/dev/null
> + extra_img_options="-o backing_file=$BACKING,image_file=$IMG"
> + fi
This looks a bit hackish... Doesn't it completely ignore the requested
backing file name? I'm not sure if this is a good idea.
Can't you just create the raw image file and then use _optstr_add to add
the right -o image_file=... option? It should automatically get the
backing file right.
> fi
>
> if [ -n "$optstr" ]; then
> @@ -125,6 +137,13 @@ _cleanup_test_img()
> rm -f $TEST_DIR/t.$IMGFMT
> rm -f $TEST_DIR/t.$IMGFMT.orig
> rm -f $TEST_DIR/t.$IMGFMT.base
> + if [ "$IMGFMT" = "add-cow" ]; then
> + rm -f $TEST_DIR/t.$IMGFMT.qcow2
> + rm -f $TEST_DIR/t.$IMGFMT.raw
> + rm -f $TEST_DIR/t.$IMGFMT.raw.b
> + rm -f $TEST_DIR/t.$IMGFMT.ct.qcow2
> + rm -f $TEST_DIR/t.$IMGFMT.ct.raw
What are the .ct files?
Kevin
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH V12 5/6] add-cow file format
2012-09-11 9:40 ` Kevin Wolf
@ 2012-09-12 7:28 ` Dong Xu Wang
2012-09-12 7:50 ` Kevin Wolf
0 siblings, 1 reply; 25+ messages in thread
From: Dong Xu Wang @ 2012-09-12 7:28 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel
On Tue, Sep 11, 2012 at 5:40 PM, Kevin Wolf <kwolf@redhat.com> wrote:
> Am 10.08.2012 17:39, schrieb Dong Xu Wang:
>> add-cow file format core code. It use block-cache.c as cache code.
>>
>> Signed-off-by: Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> ---
>> block/Makefile.objs | 1 +
>> block/add-cow.c | 613 +++++++++++++++++++++++++++++++++++++++++++++++++++
>> block/add-cow.h | 85 +++++++
>> block_int.h | 2 +
>> 4 files changed, 701 insertions(+), 0 deletions(-)
>> create mode 100644 block/add-cow.c
>> create mode 100644 block/add-cow.h
>>
>> diff --git a/block/Makefile.objs b/block/Makefile.objs
>> index 23bdfc8..7ed5051 100644
>> --- a/block/Makefile.objs
>> +++ b/block/Makefile.objs
>> @@ -2,6 +2,7 @@ block-obj-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vvfat
>> block-obj-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o
>> block-obj-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>> block-obj-y += qed-check.o
>> +block-obj-y += add-cow.o
>> block-obj-y += block-cache.o
>> block-obj-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
>> block-obj-y += stream.o
>> diff --git a/block/add-cow.c b/block/add-cow.c
>> new file mode 100644
>> index 0000000..d4711d5
>> --- /dev/null
>> +++ b/block/add-cow.c
>> @@ -0,0 +1,613 @@
>> +/*
>> + * QEMU ADD-COW Disk Format
>> + *
>> + * Copyright IBM, Corp. 2012
>> + *
>> + * Authors:
>> + * Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> + *
>> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
>> + * See the COPYING.LIB file in the top-level directory.
>> + *
>> + */
>> +
>> +#include "qemu-common.h"
>> +#include "block_int.h"
>> +#include "module.h"
>> +#include "add-cow.h"
>> +
>> +static void add_cow_header_le_to_cpu(const AddCowHeader *le, AddCowHeader *cpu)
>> +{
>> + cpu->magic = le64_to_cpu(le->magic);
>> + cpu->version = le32_to_cpu(le->version);
>> +
>> + cpu->backing_filename_offset = le32_to_cpu(le->backing_filename_offset);
>> + cpu->backing_filename_size = le32_to_cpu(le->backing_filename_size);
>> +
>> + cpu->image_filename_offset = le32_to_cpu(le->image_filename_offset);
>> + cpu->image_filename_size = le32_to_cpu(le->image_filename_size);
>> +
>> + cpu->features = le64_to_cpu(le->features);
>> + cpu->optional_features = le64_to_cpu(le->optional_features);
>> + cpu->header_pages_size = le32_to_cpu(le->header_pages_size);
>> +}
>> +
>> +static void add_cow_header_cpu_to_le(const AddCowHeader *cpu, AddCowHeader *le)
>> +{
>> + le->magic = cpu_to_le64(cpu->magic);
>> + le->version = cpu_to_le32(cpu->version);
>> +
>> + le->backing_filename_offset = cpu_to_le32(cpu->backing_filename_offset);
>> + le->backing_filename_size = cpu_to_le32(cpu->backing_filename_size);
>> +
>> + le->image_filename_offset = cpu_to_le32(cpu->image_filename_offset);
>> + le->image_filename_size = cpu_to_le32(cpu->image_filename_size);
>> +
>> + le->features = cpu_to_le64(cpu->features);
>> + le->optional_features = cpu_to_le64(cpu->optional_features);
>> + le->header_pages_size = cpu_to_le32(cpu->header_pages_size);
>> +}
>> +
>> +static int add_cow_probe(const uint8_t *buf, int buf_size, const char *filename)
>> +{
>> + const AddCowHeader *header = (const AddCowHeader *)buf;
>> +
>> + if (le64_to_cpu(header->magic) == ADD_COW_MAGIC &&
>> + le32_to_cpu(header->version) == ADD_COW_VERSION) {
>> + return 100;
>> + } else {
>> + return 0;
>> + }
>> +}
>> +
>> +static int add_cow_create(const char *filename, QEMUOptionParameter *options)
>> +{
>> + AddCowHeader header = {
>> + .magic = ADD_COW_MAGIC,
>> + .version = ADD_COW_VERSION,
>> + .features = 0,
>> + .optional_features = 0,
>> + .header_pages_size = ADD_COW_DEFAULT_PAGE_SIZE,
>> + };
>> + AddCowHeader le_header;
>> + int64_t image_len = 0;
>> + const char *backing_filename = NULL;
>> + const char *backing_fmt = NULL;
>> + const char *image_filename = NULL;
>> + const char *image_format = NULL;
>> + BlockDriverState *bs, *image_bs = NULL, *backing_bs = NULL;
>> + BlockDriver *drv = bdrv_find_format("add-cow");
>> + BDRVAddCowState s;
>> + int ret;
>> +
>> + while (options && options->name) {
>> + if (!strcmp(options->name, BLOCK_OPT_SIZE)) {
>> + image_len = options->value.n;
>> + } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FILE)) {
>> + backing_filename = options->value.s;
>> + } else if (!strcmp(options->name, BLOCK_OPT_BACKING_FMT)) {
>> + backing_fmt = options->value.s;
>> + } else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FILE)) {
>> + image_filename = options->value.s;
>> + } else if (!strcmp(options->name, BLOCK_OPT_IMAGE_FORMAT)) {
>> + image_format = options->value.s;
>> + }
>> + options++;
>> + }
>> +
>> + if (backing_filename) {
>> + header.backing_filename_offset = sizeof(header)
>> + + sizeof(s.backing_file_format) + sizeof(s.image_file_format);
>> + header.backing_filename_size = strlen(backing_filename);
>> +
>> + if (!backing_fmt) {
>> + backing_bs = bdrv_new("image");
>> + ret = bdrv_open(backing_bs, backing_filename, BDRV_O_RDWR
>> + | BDRV_O_CACHE_WB, NULL);
>> + if (ret < 0) {
>> + return ret;
>> + }
>> + backing_fmt = bdrv_get_format_name(backing_bs);
>> + bdrv_delete(backing_bs);
>> + }
>> + } else {
>> + header.features |= ADD_COW_F_All_ALLOCATED;
>> + }
>> +
>> + if (image_filename) {
>> + header.image_filename_offset =
>> + sizeof(header) + sizeof(s.backing_file_format)
>> + + sizeof(s.image_file_format) + header.backing_filename_size;
>> + header.image_filename_size = strlen(image_filename);
>> + } else {
>> + error_report("Error: image_file should be given.");
>> + return -EINVAL;
>> + }
>> +
>> + if (backing_filename && !strcmp(backing_filename, image_filename)) {
>> + error_report("Error: Trying to create an image with the "
>> + "same backing file name as the image file name");
>> + return -EINVAL;
>> + }
>> +
>> + if (!strcmp(filename, image_filename)) {
>> + error_report("Error: Trying to create an image with the "
>> + "same filename as the image file name");
>> + return -EINVAL;
>> + }
>> +
>> + if (header.image_filename_offset + header.image_filename_size
>> + > ADD_COW_PAGE_SIZE * ADD_COW_DEFAULT_PAGE_SIZE) {
>> + error_report("image_file name or backing_file name too long.");
>> + return -ENOSPC;
>> + }
>> +
>> + ret = bdrv_file_open(&image_bs, image_filename, BDRV_O_RDWR);
>> + if (ret < 0) {
>> + return ret;
>> + }
>> + bdrv_delete(image_bs);
>> +
>> + ret = bdrv_create_file(filename, NULL);
>> + if (ret < 0) {
>> + return ret;
>> + }
>> +
>> + ret = bdrv_file_open(&bs, filename, BDRV_O_RDWR);
>> + if (ret < 0) {
>> + return ret;
>> + }
>> + add_cow_header_cpu_to_le(&header, &le_header);
>> + ret = bdrv_pwrite(bs, 0, &le_header, sizeof(le_header));
>> + if (ret < 0) {
>> + bdrv_delete(bs);
>> + return ret;
>> + }
>> +
>> + ret = bdrv_pwrite(bs, sizeof(le_header), backing_fmt ? backing_fmt : "",
>> + backing_fmt ? strlen(backing_fmt) : 0);
>
> The spec requires zero padding, which you don't do here.
Okay.
>
>> + if (ret < 0) {
>> + bdrv_delete(bs);
>> + return ret;
>> + }
>> +
>> + ret = bdrv_pwrite(bs, sizeof(le_header) + sizeof(s.backing_file_format),
>> + image_format ? image_format : "raw",
>> + image_format ? strlen(image_format) : sizeof("raw"));
>
> And here.
Okay.
>
>> + if (ret < 0) {
>> + bdrv_delete(bs);
>> + return ret;
>> + }
>> +
>> + if (backing_filename) {
>> + ret = bdrv_pwrite(bs, header.backing_filename_offset,
>> + backing_filename, header.backing_filename_size);
>> + if (ret < 0) {
>> + bdrv_delete(bs);
>> + return ret;
>> + }
>> + }
>> +
>> + ret = bdrv_pwrite(bs, header.image_filename_offset,
>> + image_filename, header.image_filename_size);
>> + if (ret < 0) {
>> + bdrv_delete(bs);
>> + return ret;
>> + }
>> +
>> + ret = bdrv_open(bs, filename, BDRV_O_RDWR | BDRV_O_NO_FLUSH, drv);
>> + if (ret < 0) {
>> + bdrv_delete(bs);
>> + return ret;
>> + }
>> +
>> + ret = bdrv_truncate(bs, image_len);
>> + bdrv_delete(bs);
>> + return ret;
>> +}
>> +
>> +static int add_cow_open(BlockDriverState *bs, int flags)
>> +{
>> + char image_filename[ADD_COW_FILE_LEN];
>> + char tmp_name[ADD_COW_FILE_LEN];
>> + BlockDriver *image_drv = NULL;
>> + int ret;
>> + int sector_per_byte;
>> + BDRVAddCowState *s = bs->opaque;
>> + AddCowHeader le_header;
>> +
>> + ret = bdrv_pread(bs->file, 0, &le_header, sizeof(le_header));
>> + if (ret != sizeof(s->header)) {
>
> if (ret < 0) would be more consistent with the rest of the code.
>
Okay.
>> + goto fail;
>> + }
>> +
>> + add_cow_header_le_to_cpu(&le_header, &s->header);
>> +
>> + if (le64_to_cpu(s->header.magic) != ADD_COW_MAGIC) {
>
> Isn't this one endianess conversion too much? s->header is already LE.
>
> Did you test add-cow on a big endian host?
My fault, will correct it in next version.
>
>> + ret = -EINVAL;
>> + goto fail;
>> + }
>> +
>> + if (s->header.version != ADD_COW_VERSION) {
>> + char version[64];
>> + snprintf(version, sizeof(version), "ADD-COW version %d",
>> + s->header.version);
>> + qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
>> + bs->device_name, "add-cow", version);
>> + ret = -ENOTSUP;
>> + goto fail;
>> + }
>> +
>> + if (s->header.features & ~ADD_COW_FEATURE_MASK) {
>> + char buf[64];
>> + snprintf(buf, sizeof(buf), "%" PRIx64,
>> + s->header.features & ~ADD_COW_FEATURE_MASK);
>
> This message is a bit terse, most users will be confused with an error
> message that only consists of a hex number. Maybe better "Feature flags:
> %" PRIx64.
>
Okay.
>> + qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
>> + bs->device_name, "add-cow", buf);
>> + return -ENOTSUP;
>> + }
>> +
>> + if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
>> + ret = bdrv_read_string(bs->file, sizeof(s->header),
>> + sizeof(s->backing_file_format) - 1, s->backing_file_format,
>> + sizeof(s->backing_file_format));
>> + if (ret < 0) {
>> + goto fail;
>> + }
>> + }
>
> Would be great if this was not only read into memory, but actually
> used... It must end up in bs->backing_format in order take effect.
>
>> +
>> + ret = bdrv_read_string(bs->file,
>> + sizeof(s->header) + sizeof(s->image_file_format),
>> + sizeof(s->image_file_format) - 1, s->image_file_format,
>> + sizeof(s->image_file_format));
>> + if (ret < 0) {
>> + goto fail;
>> + }
>
> This one is unused, too.
>
Okay.
>> +
>> + if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
>> + ret = bdrv_read_string(bs->file, s->header.backing_filename_offset,
>> + s->header.backing_filename_size, bs->backing_file,
>> + sizeof(bs->backing_file));
>> + if (ret < 0) {
>> + goto fail;
>> + }
>> + }
>> +
>> + ret = bdrv_read_string(bs->file, s->header.image_filename_offset,
>> + s->header.image_filename_size, tmp_name,
>> + sizeof(tmp_name));
>> + if (ret < 0) {
>> + goto fail;
>> + }
>> +
>> + s->image_hd = bdrv_new("");
>> + if (path_has_protocol(image_filename)) {
>> + pstrcpy(image_filename, sizeof(image_filename), tmp_name);
>> + } else {
>> + path_combine(image_filename, sizeof(image_filename),
>> + bs->filename, tmp_name);
>> + }
>> +
>> + ret = bdrv_open(s->image_hd, image_filename, flags, image_drv);
>
> image_drv is always NULL.
>
>> + if (ret < 0) {
>> + bdrv_delete(s->image_hd);
>> + goto fail;
>> + }
>> +
>> + bs->total_sectors = bdrv_getlength(s->image_hd) >> 9;
>> + s->cluster_size = ADD_COW_CLUSTER_SIZE;
>> + sector_per_byte = SECTORS_PER_CLUSTER * 8;
>> + s->bitmap_size =
>> + (bs->total_sectors + sector_per_byte - 1) / sector_per_byte;
>> + s->bitmap_cache =
>> + block_cache_create(bs, ADD_COW_CACHE_SIZE, ADD_COW_CACHE_ENTRY_SIZE);
>> +
>> + qemu_co_mutex_init(&s->lock);
>> + return 0;
>> +fail:
>> + if (s->bitmap_cache) {
>> + block_cache_destroy(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP);
>> + }
>> + return ret;
>> +}
>> +
>> +static void add_cow_close(BlockDriverState *bs)
>> +{
>> + BDRVAddCowState *s = bs->opaque;
>> + block_cache_destroy(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP);
>> + bdrv_delete(s->image_hd);
>> +}
>> +
>> +static bool is_allocated(BlockDriverState *bs, int64_t sector_num)
>> +{
>> + BDRVAddCowState *s = bs->opaque;
>> + BlockCache *c = s->bitmap_cache;
>> + int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
>> + uint8_t *table = NULL;
>> + uint64_t offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
>> + + (offset_in_bitmap(sector_num) & (~(c->entry_size - 1)));
>> + int ret = block_cache_get(bs, s->bitmap_cache, offset,
>> + (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
>
> No matching block_cache_put?
>
>> +
>> + if (ret < 0) {
>> + return ret;
>> + }
>> + return table[cluster_num / 8 % ADD_COW_CACHE_ENTRY_SIZE]
>> + & (1 << (cluster_num % 8));
>> +}
>> +
>> +static coroutine_fn int add_cow_is_allocated(BlockDriverState *bs,
>> + int64_t sector_num, int nb_sectors, int *num_same)
>> +{
>> + BDRVAddCowState *s = bs->opaque;
>> + int changed;
>> +
>> + if (nb_sectors == 0) {
>> + *num_same = 0;
>> + return 0;
>> + }
>> +
>> + if (s->header.features & ADD_COW_F_All_ALLOCATED) {
>> + *num_same = nb_sectors - 1;
>
> Why - 1?
>
>> + return 1;
>> + }
>> + changed = is_allocated(bs, sector_num);
>> +
>> + for (*num_same = 1; *num_same < nb_sectors; (*num_same)++) {
>> + if (is_allocated(bs, sector_num + *num_same) != changed) {
>> + break;
>> + }
>> + }
>> + return changed;
>> +}
>> +
>> +static int add_cow_backing_read(BlockDriverState *bs, QEMUIOVector *qiov,
>> + int64_t sector_num, int nb_sectors)
>> +{
>> + int n1;
>> + if ((sector_num + nb_sectors) <= bs->total_sectors) {
>> + return nb_sectors;
>> + }
>> + if (sector_num >= bs->total_sectors) {
>> + n1 = 0;
>> + } else {
>> + n1 = bs->total_sectors - sector_num;
>> + }
>> +
>> + qemu_iovec_memset(qiov, BDRV_SECTOR_SIZE * n1,
>> + 0, BDRV_SECTOR_SIZE * (nb_sectors - n1));
>> +
>> + return n1;
>> +}
>> +
>> +static coroutine_fn int add_cow_co_readv(BlockDriverState *bs,
>> + int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
>> +{
>> + BDRVAddCowState *s = bs->opaque;
>> + int cur_nr_sectors;
>> + uint64_t bytes_done = 0;
>> + QEMUIOVector hd_qiov;
>> + int n, n1, ret = 0;
>> +
>> + qemu_iovec_init(&hd_qiov, qiov->niov);
>> + qemu_co_mutex_lock(&s->lock);
>> + while (remaining_sectors != 0) {
>> + cur_nr_sectors = remaining_sectors;
>> + if (add_cow_is_allocated(bs, sector_num, cur_nr_sectors, &n)) {
>> + cur_nr_sectors = n;
>
> One of n and cur_nr_sectors is redundant.
Okay.
>
>> + qemu_iovec_reset(&hd_qiov);
>> + qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
>> + cur_nr_sectors * BDRV_SECTOR_SIZE);
>> + qemu_co_mutex_unlock(&s->lock);
>> + ret = bdrv_co_readv(s->image_hd, sector_num, n, &hd_qiov);
>> + qemu_co_mutex_lock(&s->lock);
>> + if (ret < 0) {
>> + goto fail;
>> + }
>> + } else {
>> + cur_nr_sectors = n;
>> + if (bs->backing_hd) {
>> + qemu_iovec_reset(&hd_qiov);
>> + qemu_iovec_concat(&hd_qiov, qiov, bytes_done,
>> + cur_nr_sectors * BDRV_SECTOR_SIZE);
>> + n1 = add_cow_backing_read(bs->backing_hd, &hd_qiov,
>> + sector_num, cur_nr_sectors);
>> + if (n1 > 0) {
>> + qemu_co_mutex_unlock(&s->lock);
>> + ret = bdrv_co_readv(bs->backing_hd, sector_num,
>> + n, &hd_qiov);
>> + qemu_co_mutex_lock(&s->lock);
>> + if (ret < 0) {
>> + goto fail;
>> + }
>> + }
>> + } else {
>> + qemu_iovec_memset(&hd_qiov, 0, 0,
>> + BDRV_SECTOR_SIZE * cur_nr_sectors);
>> + }
>> + }
>> + remaining_sectors -= cur_nr_sectors;
>> + sector_num += cur_nr_sectors;
>> + bytes_done += cur_nr_sectors * BDRV_SECTOR_SIZE;
>> + }
>> +fail:
>> + qemu_co_mutex_unlock(&s->lock);
>> + qemu_iovec_destroy(&hd_qiov);
>> + return ret;
>> +}
>> +
>> +static int coroutine_fn copy_sectors(BlockDriverState *bs,
>> + int n_start, int n_end)
>> +{
>> + BDRVAddCowState *s = bs->opaque;
>> + QEMUIOVector qiov;
>> + struct iovec iov;
>> + int n, ret;
>> +
>> + n = n_end - n_start;
>> + if (n <= 0) {
>> + return 0;
>> + }
>> +
>> + iov.iov_len = n * BDRV_SECTOR_SIZE;
>> + iov.iov_base = qemu_blockalign(bs, iov.iov_len);
>> +
>> + qemu_iovec_init_external(&qiov, &iov, 1);
>> +
>> + ret = bdrv_co_readv(bs->backing_hd, n_start, n, &qiov);
>> + if (ret < 0) {
>> + goto out;
>> + }
>> + ret = bdrv_co_writev(s->image_hd, n_start, n, &qiov);
>> + if (ret < 0) {
>> + goto out;
>> + }
>> +
>> + ret = 0;
>> +out:
>> + qemu_vfree(iov.iov_base);
>> + return ret;
>> +}
>> +
>> +static coroutine_fn int add_cow_co_writev(BlockDriverState *bs,
>> + int64_t sector_num, int remaining_sectors, QEMUIOVector *qiov)
>> +{
>> + BDRVAddCowState *s = bs->opaque;
>> + BlockCache *c = s->bitmap_cache;
>> + int ret = 0, i;
>> + QEMUIOVector hd_qiov;
>> + uint8_t *table;
>> + uint64_t offset;
>> +
>> + qemu_co_mutex_lock(&s->lock);
>> + qemu_iovec_init(&hd_qiov, qiov->niov);
>> + ret = bdrv_co_writev(s->image_hd,
>> + sector_num,
>> + remaining_sectors, qiov);
>> +
>> + if (ret < 0) {
>> + goto fail;
>> + }
>> + if ((s->header.features & ADD_COW_F_All_ALLOCATED) == 0) {
>> + /* Copy content of unmodified sectors */
>> + if (!is_cluster_head(sector_num) && !is_allocated(bs, sector_num)) {
>> + ret = copy_sectors(bs, sector_num & ~(SECTORS_PER_CLUSTER - 1),
>> + sector_num);
>> + if (ret < 0) {
>> + goto fail;
>> + }
>> + }
>> +
>> + if (!is_cluster_tail(sector_num + remaining_sectors - 1)
>> + && !is_allocated(bs, sector_num + remaining_sectors - 1)) {
>> + ret = copy_sectors(bs, sector_num + remaining_sectors,
>> + ((sector_num + remaining_sectors) | (SECTORS_PER_CLUSTER - 1)) + 1);
>> + if (ret < 0) {
>> + goto fail;
>> + }
>> + }
>> +
>> + for (i = sector_num / SECTORS_PER_CLUSTER;
>> + i <= (sector_num + remaining_sectors - 1) / SECTORS_PER_CLUSTER;
>> + i++) {
>> + offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
>> + + (offset_in_bitmap(i * SECTORS_PER_CLUSTER) & (~(c->entry_size - 1)));
>
> The maths in this loop looks a bit confusing, but I think it's correct.
>
>> + ret = block_cache_get(bs, s->bitmap_cache, offset,
>> + (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
>> + if (ret < 0) {
>> + goto fail;
>> + }
>> + if ((table[i / 8] & (1 << (i % 8))) == 0) {
>> + table[i / 8] |= (1 << (i % 8));
>> + block_cache_entry_mark_dirty(s->bitmap_cache, table);
>> + }
>
> Missing block_cache_put again?
>
>> + }
>> + }
>> + ret = 0;
>> +fail:
>> + qemu_co_mutex_unlock(&s->lock);
>> + qemu_iovec_destroy(&hd_qiov);
>> + return ret;
>> +}
>> +
>> +static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size)
>> +{
>> + BDRVAddCowState *s = bs->opaque;
>> + int sector_per_byte = SECTORS_PER_CLUSTER * 8;
>> + int ret;
>> + uint32_t bitmap_pos = s->header.header_pages_size * ADD_COW_PAGE_SIZE;
>> + int64_t bitmap_size =
>> + (size / BDRV_SECTOR_SIZE + sector_per_byte - 1) / sector_per_byte;
>> + bitmap_size = (bitmap_size + ADD_COW_CACHE_ENTRY_SIZE - 1)
>> + & (~(ADD_COW_CACHE_ENTRY_SIZE - 1));
>> +
>> + ret = bdrv_truncate(bs->file, bitmap_pos + bitmap_size);
>> + if (ret < 0) {
>> + return ret;
>> + }
>> + return 0;
>> +}
>
> So you don't truncate s->image_file? Does this work?
s->image_file should be truncated? Image file can have a larger virtual size
than backing_file, my understanding is we should not truncate image file.
>
>> +
>> +static coroutine_fn int add_cow_co_flush(BlockDriverState *bs)
>> +{
>> + BDRVAddCowState *s = bs->opaque;
>> + int ret;
>> +
>> + qemu_co_mutex_lock(&s->lock);
>> + ret = block_cache_flush(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP,
>> + ADD_COW_CACHE_ENTRY_SIZE);
>> + qemu_co_mutex_unlock(&s->lock);
>> + return ret;
>> +}
>
> What about flushing s->image_file?
>
>> +
>> +static QEMUOptionParameter add_cow_create_options[] = {
>> + {
>> + .name = BLOCK_OPT_SIZE,
>> + .type = OPT_SIZE,
>> + .help = "Virtual disk size"
>> + },
>> + {
>> + .name = BLOCK_OPT_BACKING_FILE,
>> + .type = OPT_STRING,
>> + .help = "File name of a base image"
>> + },
>> + {
>> + .name = BLOCK_OPT_BACKING_FMT,
>> + .type = OPT_STRING,
>> + .help = "Image format of the base image"
>> + },
>> + {
>> + .name = BLOCK_OPT_IMAGE_FILE,
>> + .type = OPT_STRING,
>> + .help = "File name of a image file"
>> + },
>> + {
>> + .name = BLOCK_OPT_IMAGE_FORMAT,
>> + .type = OPT_STRING,
>> + .help = "Image format of the image file"
>> + },
>> + { NULL }
>> +};
>> +
>> +static BlockDriver bdrv_add_cow = {
>> + .format_name = "add-cow",
>> + .instance_size = sizeof(BDRVAddCowState),
>> + .bdrv_probe = add_cow_probe,
>> + .bdrv_open = add_cow_open,
>> + .bdrv_close = add_cow_close,
>> + .bdrv_create = add_cow_create,
>> + .bdrv_co_readv = add_cow_co_readv,
>> + .bdrv_co_writev = add_cow_co_writev,
>> + .bdrv_truncate = bdrv_add_cow_truncate,
>> + .bdrv_co_is_allocated = add_cow_is_allocated,
>> +
>> + .create_options = add_cow_create_options,
>> + .bdrv_co_flush_to_os = add_cow_co_flush,
>> +};
>> +
>> +static void bdrv_add_cow_init(void)
>> +{
>> + bdrv_register(&bdrv_add_cow);
>> +}
>> +
>> +block_init(bdrv_add_cow_init);
>> diff --git a/block/add-cow.h b/block/add-cow.h
>> new file mode 100644
>> index 0000000..f058376
>> --- /dev/null
>> +++ b/block/add-cow.h
>> @@ -0,0 +1,85 @@
>> +/*
>> + * QEMU ADD-COW Disk Format
>> + *
>> + * Copyright IBM, Corp. 2012
>> + *
>> + * Authors:
>> + * Dong Xu Wang <wdongxu@linux.vnet.ibm.com>
>> + *
>> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
>> + * See the COPYING.LIB file in the top-level directory.
>> + *
>> + */
>> +
>> +#ifndef BLOCK_ADD_COW_H
>> +#define BLOCK_ADD_COW_H
>> +#include "block-cache.h"
>> +
>> +enum {
>> + ADD_COW_F_All_ALLOCATED = 0X01,
>> + ADD_COW_FEATURE_MASK = ADD_COW_F_All_ALLOCATED,
>> +
>> + ADD_COW_MAGIC = (((uint64_t)'A' << 56) | ((uint64_t)'D' << 48) | \
>> + ((uint64_t)'D' << 40) | ((uint64_t)'_' << 32) | \
>> + ((uint64_t)'C' << 24) | ((uint64_t)'O' << 16) | \
>> + ((uint64_t)'W' << 8) | 0xFF),
>> + ADD_COW_VERSION = 1,
>> + ADD_COW_FILE_LEN = 1024,
>> + ADD_COW_CACHE_SIZE = 16,
>> + ADD_COW_CACHE_ENTRY_SIZE = 65536,
>> + ADD_COW_CLUSTER_SIZE = 65536,
>> + SECTORS_PER_CLUSTER = (ADD_COW_CLUSTER_SIZE / BDRV_SECTOR_SIZE),
>> + ADD_COW_PAGE_SIZE = 4096,
>> + ADD_COW_DEFAULT_PAGE_SIZE = 1,
>> +};
>> +
>> +typedef struct AddCowHeader {
>> + uint64_t magic;
>> + uint32_t version;
>> +
>> + uint32_t backing_filename_offset;
>> + uint32_t backing_filename_size;
>> +
>> + uint32_t image_filename_offset;
>> + uint32_t image_filename_size;
>> +
>> + uint64_t features;
>> + uint64_t optional_features;
>> + uint32_t header_pages_size;
>> +} QEMU_PACKED AddCowHeader;
>
> Why aren't backing/image_file_format part of the header here? They are
> in the spec. It would also simplify some offset calculation code.
>
Anthony said "It's far better to shrink the size of the header and use
an offset/len
pointer to the backing file string. Limiting backing files to 1023 is
unacceptable"
http://lists.gnu.org/archive/html/qemu-devel/2012-05/msg04110.html
So I use offset and length instead of using string directly.
>> +
>> +typedef struct BDRVAddCowState {
>> + BlockDriverState *image_hd;
>> + CoMutex lock;
>> + int cluster_size;
>> + BlockCache *bitmap_cache;
>> + uint64_t bitmap_size;
>> + AddCowHeader header;
>> + char backing_file_format[16];
>> + char image_file_format[16];
>> +} BDRVAddCowState;
>> +
>> +/* Convert sector_num to offset in bitmap */
>> +static inline int64_t offset_in_bitmap(int64_t sector_num)
>> +{
>> + int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
>> + return cluster_num / 8;
>> +}
>> +
>> +static inline bool is_cluster_head(int64_t sector_num)
>> +{
>> + return sector_num % SECTORS_PER_CLUSTER == 0;
>> +}
>> +
>> +static inline bool is_cluster_tail(int64_t sector_num)
>> +{
>> + return (sector_num + 1) % SECTORS_PER_CLUSTER == 0;
>> +}
>> +
>> +BlockCache *add_cow_cache_create(BlockDriverState *bs, int num_tables);
>> +int add_cow_cache_destroy(BlockDriverState *bs, BlockCache *c);
>> +void add_cow_cache_entry_mark_dirty(BlockCache *c, void *table);
>> +int add_cow_cache_get(BlockDriverState *bs, BlockCache *c, uint64_t offset,
>> + void **table);
>> +int add_cow_cache_flush(BlockDriverState *bs, BlockCache *c);
>
> These functions don't really exist any more, do they?
Right, sorry.
>
> Kevin
>
Thank you, Kevin.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH V12 5/6] add-cow file format
2012-09-12 7:28 ` Dong Xu Wang
@ 2012-09-12 7:50 ` Kevin Wolf
0 siblings, 0 replies; 25+ messages in thread
From: Kevin Wolf @ 2012-09-12 7:50 UTC (permalink / raw)
To: Dong Xu Wang; +Cc: qemu-devel
Am 12.09.2012 09:28, schrieb Dong Xu Wang:
>>> +static bool is_allocated(BlockDriverState *bs, int64_t sector_num)
>>> +{
>>> + BDRVAddCowState *s = bs->opaque;
>>> + BlockCache *c = s->bitmap_cache;
>>> + int64_t cluster_num = sector_num / SECTORS_PER_CLUSTER;
>>> + uint8_t *table = NULL;
>>> + uint64_t offset = ADD_COW_PAGE_SIZE * s->header.header_pages_size
>>> + + (offset_in_bitmap(sector_num) & (~(c->entry_size - 1)));
>>> + int ret = block_cache_get(bs, s->bitmap_cache, offset,
>>> + (void **)&table, BLOCK_TABLE_BITMAP, ADD_COW_CACHE_ENTRY_SIZE);
>>
>> No matching block_cache_put?
>>
>>> +
>>> + if (ret < 0) {
>>> + return ret;
>>> + }
>>> + return table[cluster_num / 8 % ADD_COW_CACHE_ENTRY_SIZE]
>>> + & (1 << (cluster_num % 8));
>>> +}
>>> +
>>> +static coroutine_fn int add_cow_is_allocated(BlockDriverState *bs,
>>> + int64_t sector_num, int nb_sectors, int *num_same)
>>> +{
>>> + BDRVAddCowState *s = bs->opaque;
>>> + int changed;
>>> +
>>> + if (nb_sectors == 0) {
>>> + *num_same = 0;
>>> + return 0;
>>> + }
>>> +
>>> + if (s->header.features & ADD_COW_F_All_ALLOCATED) {
>>> + *num_same = nb_sectors - 1;
>>
>> Why - 1?
>>
>>> + return 1;
>>> + }
>>> + changed = is_allocated(bs, sector_num);
>>> +
>>> + for (*num_same = 1; *num_same < nb_sectors; (*num_same)++) {
>>> + if (is_allocated(bs, sector_num + *num_same) != changed) {
>>> + break;
>>> + }
>>> + }
>>> + return changed;
>>> +}
>>> +static int bdrv_add_cow_truncate(BlockDriverState *bs, int64_t size)
>>> +{
>>> + BDRVAddCowState *s = bs->opaque;
>>> + int sector_per_byte = SECTORS_PER_CLUSTER * 8;
>>> + int ret;
>>> + uint32_t bitmap_pos = s->header.header_pages_size * ADD_COW_PAGE_SIZE;
>>> + int64_t bitmap_size =
>>> + (size / BDRV_SECTOR_SIZE + sector_per_byte - 1) / sector_per_byte;
>>> + bitmap_size = (bitmap_size + ADD_COW_CACHE_ENTRY_SIZE - 1)
>>> + & (~(ADD_COW_CACHE_ENTRY_SIZE - 1));
>>> +
>>> + ret = bdrv_truncate(bs->file, bitmap_pos + bitmap_size);
>>> + if (ret < 0) {
>>> + return ret;
>>> + }
>>> + return 0;
>>> +}
>>
>> So you don't truncate s->image_file? Does this work?
>
> s->image_file should be truncated? Image file can have a larger virtual size
> than backing_file, my understanding is we should not truncate image file.
I'm talking about s->image_hd, not bs->backing_hd. You are right that
the backing file should not be changed. But the associated raw image
should be resized, shouldn't it?
>>> +static coroutine_fn int add_cow_co_flush(BlockDriverState *bs)
>>> +{
>>> + BDRVAddCowState *s = bs->opaque;
>>> + int ret;
>>> +
>>> + qemu_co_mutex_lock(&s->lock);
>>> + ret = block_cache_flush(bs, s->bitmap_cache, BLOCK_TABLE_BITMAP,
>>> + ADD_COW_CACHE_ENTRY_SIZE);
>>> + qemu_co_mutex_unlock(&s->lock);
>>> + return ret;
>>> +}
>>
>> What about flushing s->image_file?
>>> +typedef struct AddCowHeader {
>>> + uint64_t magic;
>>> + uint32_t version;
>>> +
>>> + uint32_t backing_filename_offset;
>>> + uint32_t backing_filename_size;
>>> +
>>> + uint32_t image_filename_offset;
>>> + uint32_t image_filename_size;
>>> +
>>> + uint64_t features;
>>> + uint64_t optional_features;
>>> + uint32_t header_pages_size;
>>> +} QEMU_PACKED AddCowHeader;
>>
>> Why aren't backing/image_file_format part of the header here? They are
>> in the spec. It would also simplify some offset calculation code.
>>
>
> Anthony said "It's far better to shrink the size of the header and use
> an offset/len
> pointer to the backing file string. Limiting backing files to 1023 is
> unacceptable"
>
> http://lists.gnu.org/archive/html/qemu-devel/2012-05/msg04110.html
>
> So I use offset and length instead of using string directly.
I'm talking about the format, not the path.
Kevin
^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2012-09-12 7:50 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-08-10 15:39 [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 1/6] docs: document for " Dong Xu Wang
2012-09-06 17:27 ` Michael Roth
2012-09-10 1:48 ` Dong Xu Wang
2012-09-10 15:23 ` Kevin Wolf
2012-09-11 2:12 ` Dong Xu Wang
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 2/6] make path_has_protocol non-static Dong Xu Wang
2012-09-06 17:27 ` Michael Roth
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 3/6] qed_read_string to bdrv_read_string Dong Xu Wang
2012-09-06 17:32 ` Michael Roth
2012-09-10 1:49 ` Dong Xu Wang
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 4/6] rename qcow2-cache.c to block-cache.c Dong Xu Wang
2012-09-06 17:52 ` Michael Roth
2012-09-10 2:14 ` Dong Xu Wang
2012-09-11 8:41 ` Kevin Wolf
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 5/6] add-cow file format Dong Xu Wang
2012-09-06 20:19 ` Michael Roth
2012-09-10 2:25 ` Dong Xu Wang
2012-09-11 9:44 ` Kevin Wolf
2012-09-11 9:40 ` Kevin Wolf
2012-09-12 7:28 ` Dong Xu Wang
2012-09-12 7:50 ` Kevin Wolf
2012-08-10 15:39 ` [Qemu-devel] [PATCH V12 6/6] add-cow: add qemu-iotests support Dong Xu Wang
2012-09-11 9:55 ` Kevin Wolf
2012-08-23 5:34 ` [Qemu-devel] [PATCH V12 0/6] add-cow file format Dong Xu Wang
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).