qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/3] qcow2: Rewrite alloc_refcount_block
@ 2010-02-15 16:19 Kevin Wolf
  2010-02-15 16:19 ` [Qemu-devel] [PATCH 1/3] qcow2: Factor next_refcount_table_size out Kevin Wolf
                   ` (3 more replies)
  0 siblings, 4 replies; 12+ messages in thread
From: Kevin Wolf @ 2010-02-15 16:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, hch, armbru, gleb, quintela

The current implementation of alloc_refcount_block and grow_refcount_table has
fundamental problems regarding error handling. There are some places where an
I/O error means that the image is going to be corrupted. I have found that the
only way to fix this is to completely rewrite the thing.

Just sending as an RFC to the list hasn't generated a lot of comments (to be
precise, not a single one). This is a critical part of qcow2 and needs reviews.
So let's try it another way: People in CC, please give it a review. Sooner or
later some of you will need to do so anyway.

Kevin Wolf (3):
  qcow2: Factor next_refcount_table_size out
  qcow2: Rewrite alloc_refcount_block/grow_refcount_table
  qcow2: More check for qemu-img check

 block/qcow2-refcount.c |  334 +++++++++++++++++++++++++++++++++++-------------
 1 files changed, 244 insertions(+), 90 deletions(-)

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH 1/3] qcow2: Factor next_refcount_table_size out
  2010-02-15 16:19 [Qemu-devel] [PATCH 0/3] qcow2: Rewrite alloc_refcount_block Kevin Wolf
@ 2010-02-15 16:19 ` Kevin Wolf
  2010-02-18 10:40   ` [Qemu-devel] " Juan Quintela
  2010-02-15 16:19 ` [Qemu-devel] [PATCH 2/3] qcow2: Rewrite alloc_refcount_block/grow_refcount_table Kevin Wolf
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 12+ messages in thread
From: Kevin Wolf @ 2010-02-15 16:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, hch, armbru, gleb, quintela

When the refcount table grows, it doesn't only grow by one entry but reserves
some space for future refcount blocks. The algorithm to calculate the number of
entries stays the same with the fixes, so factor it out before replacing the
rest.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/qcow2-refcount.c |   34 +++++++++++++++++++++++-----------
 1 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 2fdc26b..0e2ecd7 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -123,6 +123,28 @@ static int get_refcount(BlockDriverState *bs, int64_t cluster_index)
     return be16_to_cpu(s->refcount_block_cache[block_index]);
 }
 
+/*
+ * Rounds the refcount table size up to avoid growing the table for each single
+ * refcount block that is allocated.
+ */
+static unsigned int next_refcount_table_size(BDRVQcowState *s,
+    unsigned int min_size)
+{
+    unsigned int refcount_table_clusters = 0;
+    unsigned int new_table_size = 1;
+
+    while (min_size > new_table_size) {
+        if (refcount_table_clusters == 0) {
+            refcount_table_clusters = 1;
+        } else {
+            refcount_table_clusters = (refcount_table_clusters * 3 + 1) / 2;
+        }
+        new_table_size = refcount_table_clusters << (s->cluster_bits - 3);
+    }
+
+    return new_table_size;
+}
+
 static int grow_refcount_table(BlockDriverState *bs, int min_size)
 {
     BDRVQcowState *s = bs->opaque;
@@ -136,17 +158,7 @@ static int grow_refcount_table(BlockDriverState *bs, int min_size)
     if (min_size <= s->refcount_table_size)
         return 0;
     /* compute new table size */
-    refcount_table_clusters = s->refcount_table_size >> (s->cluster_bits - 3);
-    for(;;) {
-        if (refcount_table_clusters == 0) {
-            refcount_table_clusters = 1;
-        } else {
-            refcount_table_clusters = (refcount_table_clusters * 3 + 1) / 2;
-        }
-        new_table_size = refcount_table_clusters << (s->cluster_bits - 3);
-        if (min_size <= new_table_size)
-            break;
-    }
+    new_table_size = next_refcount_table_size(s, min_size);
 #ifdef DEBUG_ALLOC2
     printf("grow_refcount_table from %d to %d\n",
            s->refcount_table_size,
-- 
1.6.6

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH 2/3] qcow2: Rewrite alloc_refcount_block/grow_refcount_table
  2010-02-15 16:19 [Qemu-devel] [PATCH 0/3] qcow2: Rewrite alloc_refcount_block Kevin Wolf
  2010-02-15 16:19 ` [Qemu-devel] [PATCH 1/3] qcow2: Factor next_refcount_table_size out Kevin Wolf
@ 2010-02-15 16:19 ` Kevin Wolf
  2010-02-18 12:02   ` [Qemu-devel] " Juan Quintela
  2010-02-15 16:19 ` [Qemu-devel] [PATCH 3/3] qcow2: More checks for qemu-img check Kevin Wolf
  2010-02-19 21:13 ` [Qemu-devel] [PATCH 0/3] qcow2: Rewrite alloc_refcount_block Anthony Liguori
  3 siblings, 1 reply; 12+ messages in thread
From: Kevin Wolf @ 2010-02-15 16:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, hch, armbru, gleb, quintela

The current implementation of alloc_refcount_block and grow_refcount_table has
fundamental problems regarding error handling. There are some places where an
I/O error means that the image is going to be corrupted. I have found that the
only way to fix this is to completely rewrite the thing.

In detail, the problem is that the refcount blocks itself are allocated using
alloc_refcount_noref (to avoid endless recursion when updating the refcount of
the new refcount block, which migh access just the same refcount block but its
allocation is not yet completed...). Only at the end of the refcount allocation
the refcount of the refcount block is increased. If an error happens in
between, the refcount block is in use, but has a refcount of zero and will
likely be overwritten later.

The new approach is explained in comments in the code. The trick is basically
to let new refcount blocks describe their own refcount, so their refcount will
be automatically changed when they are hooked up in the refcount table.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/qcow2-refcount.c |  306 ++++++++++++++++++++++++++++++++++--------------
 1 files changed, 218 insertions(+), 88 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index 0e2ecd7..e7fdf64 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -27,7 +27,7 @@
 #include "block/qcow2.h"
 
 static int64_t alloc_clusters_noref(BlockDriverState *bs, int64_t size);
-static int update_refcount(BlockDriverState *bs,
+static int QEMU_WARN_UNUSED_RESULT update_refcount(BlockDriverState *bs,
                             int64_t offset, int64_t length,
                             int addend);
 
@@ -145,114 +145,244 @@ static unsigned int next_refcount_table_size(BDRVQcowState *s,
     return new_table_size;
 }
 
-static int grow_refcount_table(BlockDriverState *bs, int min_size)
+
+/* Checks if two offsets are described by the same refcount block */
+static int in_same_refcount_block(BDRVQcowState *s, uint64_t offset_a,
+    uint64_t offset_b)
+{
+    uint64_t block_a = offset_a >> (2 * s->cluster_bits - REFCOUNT_SHIFT);
+    uint64_t block_b = offset_b >> (2 * s->cluster_bits - REFCOUNT_SHIFT);
+
+    return (block_a == block_b);
+}
+
+/*
+ * Loads a refcount block. If it doesn't exist yet, it is allocated first
+ * (including growing the refcount table if needed).
+ *
+ * Returns the offset of the refcount block on success or -errno in error case
+ */
+static int64_t alloc_refcount_block(BlockDriverState *bs, int64_t cluster_index)
 {
     BDRVQcowState *s = bs->opaque;
-    int new_table_size, new_table_size2, refcount_table_clusters, i, ret;
-    uint64_t *new_table;
-    int64_t table_offset;
-    uint8_t data[12];
-    int old_table_size;
-    int64_t old_table_offset;
+    unsigned int refcount_table_index;
+    uint64_t refcount_block_offset;
+    int ret;
+
+    /* Find the refcount block for the given cluster */
+    refcount_table_index = cluster_index >> (s->cluster_bits - REFCOUNT_SHIFT);
+    if (refcount_table_index >= s->refcount_table_size) {
+        refcount_block_offset = 0;
+    } else {
+        refcount_block_offset = s->refcount_table[refcount_table_index];
+    }
+
+    /* If it's already there, we're done */
+    if (refcount_block_offset) {
+        if (refcount_block_offset != s->refcount_block_cache_offset) {
+            ret = load_refcount_block(bs, refcount_block_offset);
+            if (ret < 0) {
+                return ret;
+            }
+        }
+        return refcount_block_offset;
+    }
+
+    /*
+     * If we came here, we need to allocate something. Something is at least
+     * a cluster for the new refcount block. It may also include a new refcount
+     * table if the old refcount table is too small.
+     *
+     * Note that allocating clusters here needs some special care:
+     *
+     * - We can't use the normal qcow2_alloc_clusters(), it would try to
+     *   increase the refcount and very likely we would end up with an endless
+     *   recursion. Instead we must place the refcount blocks in a way that
+     *   they can describe them themselves.
+     *
+     * - We need to consider that at this point we are inside update_refcounts
+     *   and doing the initial refcount increase. This means that some clusters
+     *   have already been allocated by the caller, but their refcount isn't
+     *   accurate yet. free_cluster_index tells us where this allocation ends
+     *   as long as we don't overwrite it by freeing clusters.
+     *
+     * - alloc_clusters_noref and qcow2_free_clusters may load a different
+     *   refcount block into the cache
+     */
+
+    if (cache_refcount_updates) {
+        write_refcount_block(s);
+    }
+
+    /* Allocate the refcount block itself and mark it as used */
+    uint64_t new_block = alloc_clusters_noref(bs, s->cluster_size);
+    memset(s->refcount_block_cache, 0, s->cluster_size);
+    s->refcount_block_cache_offset = new_block;
 
-    if (min_size <= s->refcount_table_size)
-        return 0;
-    /* compute new table size */
-    new_table_size = next_refcount_table_size(s, min_size);
 #ifdef DEBUG_ALLOC2
-    printf("grow_refcount_table from %d to %d\n",
-           s->refcount_table_size,
-           new_table_size);
+    fprintf(stderr, "qcow2: Allocate refcount block %d for %" PRIx64
+        " at %" PRIx64 "\n",
+        refcount_table_index, cluster_index << s->cluster_bits, new_block);
 #endif
-    new_table_size2 = new_table_size * sizeof(uint64_t);
-    new_table = qemu_mallocz(new_table_size2);
+
+    if (in_same_refcount_block(s, new_block, cluster_index << s->cluster_bits)) {
+        /* The block describes itself, need to update the cache */
+        int block_index = (new_block >> s->cluster_bits) &
+            ((1 << (s->cluster_bits - REFCOUNT_SHIFT)) - 1);
+        s->refcount_block_cache[block_index] = cpu_to_be16(1);
+    } else {
+        /* Described somewhere else. This can recurse at most twice before we
+         * arrive at a block that describes itself. */
+        ret = update_refcount(bs, new_block, s->cluster_size, 1);
+        if (ret < 0) {
+            goto fail_block;
+        }
+    }
+
+    /* Now the new refcount block needs to be written to disk */
+    ret = bdrv_pwrite(s->hd, new_block, s->refcount_block_cache,
+        s->cluster_size);
+    if (ret < 0) {
+        goto fail_block;
+    }
+
+    /* If the refcount table is big enough, just hook the block up there */
+    if (refcount_table_index < s->refcount_table_size) {
+        uint64_t data64 = cpu_to_be64(new_block);
+        ret = bdrv_pwrite(s->hd,
+            s->refcount_table_offset + refcount_table_index * sizeof(uint64_t),
+            &data64, sizeof(data64));
+        if (ret < 0) {
+            goto fail_block;
+        }
+
+        s->refcount_table[refcount_table_index] = new_block;
+        return new_block;
+    }
+
+    /*
+     * If we come here, we need to grow the refcount table. Again, a new
+     * refcount table needs some space and we can't simply allocate to avoid
+     * endless recursion.
+     *
+     * Therefore let's grab new refcount blocks at the end of the image, which
+     * will describe themselves and the new refcount table. This way we can
+     * reference them only in the new table and do the switch to the new
+     * refcount table at once without producing an inconsistent state in
+     * between.
+     */
+    /* Calculate the number of refcount blocks needed so far */
+    uint64_t refcount_block_clusters = 1 << (s->cluster_bits - REFCOUNT_SHIFT);
+    uint64_t blocks_used = (s->free_cluster_index +
+        refcount_block_clusters - 1) / refcount_block_clusters;
+
+    /* And now we need at least one block more for the new metadata */
+    uint64_t table_size = next_refcount_table_size(s, blocks_used + 1);
+    uint64_t last_table_size = table_size;
+    uint64_t blocks_clusters;
+    do {
+        uint64_t table_clusters = size_to_clusters(s, table_size);
+        blocks_clusters = 1 +
+            ((table_clusters + refcount_block_clusters - 1)
+            / refcount_block_clusters);
+        uint64_t meta_clusters = table_clusters + blocks_clusters;
+
+        table_size = next_refcount_table_size(s, blocks_used +
+            ((meta_clusters + refcount_block_clusters - 1)
+            / refcount_block_clusters));
+
+    } while (last_table_size != table_size);
+
+#ifdef DEBUG_ALLOC2
+    fprintf(stderr, "qcow2: Grow refcount table %" PRId32 " => %" PRId64 "\n",
+        s->refcount_table_size, table_size);
+#endif
+
+    /* Create the new refcount table and blocks */
+    uint64_t meta_offset = (blocks_used * refcount_block_clusters) *
+        s->cluster_size;
+    uint64_t table_offset = meta_offset + blocks_clusters * s->cluster_size;
+    uint16_t *new_blocks = qemu_mallocz(blocks_clusters * s->cluster_size);
+    uint64_t *new_table = qemu_mallocz(table_size * sizeof(uint64_t));
+
+    assert(meta_offset >= (s->free_cluster_index * s->cluster_size));
+
+    /* Fill the new refcount table */
     memcpy(new_table, s->refcount_table,
-           s->refcount_table_size * sizeof(uint64_t));
-    for(i = 0; i < s->refcount_table_size; i++)
+        s->refcount_table_size * sizeof(uint64_t));
+    new_table[refcount_table_index] = new_block;
+
+    int i;
+    for (i = 0; i < blocks_clusters; i++) {
+        new_table[blocks_used + i] = meta_offset + (i * s->cluster_size);
+    }
+
+    /* Fill the refcount blocks */
+    uint64_t table_clusters = size_to_clusters(s, table_size * sizeof(uint64_t));
+    int block = 0;
+    for (i = 0; i < table_clusters + blocks_clusters; i++) {
+        new_blocks[block++] = cpu_to_be16(1);
+    }
+
+    /* Write refcount blocks to disk */
+    ret = bdrv_pwrite(s->hd, meta_offset, new_blocks,
+        blocks_clusters * s->cluster_size);
+    qemu_free(new_blocks);
+    if (ret < 0) {
+        goto fail_table;
+    }
+
+    /* Write refcount table to disk */
+    for(i = 0; i < table_size; i++) {
         cpu_to_be64s(&new_table[i]);
-    /* Note: we cannot update the refcount now to avoid recursion */
-    table_offset = alloc_clusters_noref(bs, new_table_size2);
-    ret = bdrv_pwrite(s->hd, table_offset, new_table, new_table_size2);
-    if (ret != new_table_size2)
-        goto fail;
-    for(i = 0; i < s->refcount_table_size; i++)
-        be64_to_cpus(&new_table[i]);
+    }
+
+    ret = bdrv_pwrite(s->hd, table_offset, new_table,
+        table_size * sizeof(uint64_t));
+    if (ret < 0) {
+        goto fail_table;
+    }
 
+    for(i = 0; i < table_size; i++) {
+        cpu_to_be64s(&new_table[i]);
+    }
+
+    /* Hook up the new refcount table in the qcow2 header */
+    uint8_t data[12];
     cpu_to_be64w((uint64_t*)data, table_offset);
-    cpu_to_be32w((uint32_t*)(data + 8), refcount_table_clusters);
+    cpu_to_be32w((uint32_t*)(data + 8), table_clusters);
     ret = bdrv_pwrite(s->hd, offsetof(QCowHeader, refcount_table_offset),
-                    data, sizeof(data));
-    if (ret != sizeof(data)) {
-        goto fail;
+        data, sizeof(data));
+    if (ret < 0) {
+        goto fail_table;
     }
 
+    /* And switch it in memory */
+    uint64_t old_table_offset = s->refcount_table_offset;
+    uint64_t old_table_size = s->refcount_table_size;
+
     qemu_free(s->refcount_table);
-    old_table_offset = s->refcount_table_offset;
-    old_table_size = s->refcount_table_size;
     s->refcount_table = new_table;
-    s->refcount_table_size = new_table_size;
+    s->refcount_table_size = table_size;
     s->refcount_table_offset = table_offset;
 
-    update_refcount(bs, table_offset, new_table_size2, 1);
+    /* Free old table. Remember, we must not change free_cluster_index */
+    uint64_t old_free_cluster_index = s->free_cluster_index;
     qcow2_free_clusters(bs, old_table_offset, old_table_size * sizeof(uint64_t));
-    return 0;
- fail:
-    qemu_free(new_table);
-    return ret < 0 ? ret : -EIO;
-}
-
-
-static int64_t alloc_refcount_block(BlockDriverState *bs, int64_t cluster_index)
-{
-    BDRVQcowState *s = bs->opaque;
-    int64_t offset, refcount_block_offset;
-    unsigned int refcount_table_index;
-    int ret;
-    uint64_t data64;
-    int cache = cache_refcount_updates;
+    s->free_cluster_index = old_free_cluster_index;
 
-    /* Find L1 index and grow refcount table if needed */
-    refcount_table_index = cluster_index >> (s->cluster_bits - REFCOUNT_SHIFT);
-    if (refcount_table_index >= s->refcount_table_size) {
-        ret = grow_refcount_table(bs, refcount_table_index + 1);
-        if (ret < 0)
-            return ret;
+    ret = load_refcount_block(bs, new_block);
+    if (ret < 0) {
+        goto fail_block;
     }
 
-    /* Load or allocate the refcount block */
-    refcount_block_offset = s->refcount_table[refcount_table_index];
-    if (!refcount_block_offset) {
-        if (cache_refcount_updates) {
-            write_refcount_block(s);
-            cache_refcount_updates = 0;
-        }
-        /* create a new refcount block */
-        /* Note: we cannot update the refcount now to avoid recursion */
-        offset = alloc_clusters_noref(bs, s->cluster_size);
-        memset(s->refcount_block_cache, 0, s->cluster_size);
-        ret = bdrv_pwrite(s->hd, offset, s->refcount_block_cache, s->cluster_size);
-        if (ret != s->cluster_size)
-            return -EINVAL;
-        s->refcount_table[refcount_table_index] = offset;
-        data64 = cpu_to_be64(offset);
-        ret = bdrv_pwrite(s->hd, s->refcount_table_offset +
-                          refcount_table_index * sizeof(uint64_t),
-                          &data64, sizeof(data64));
-        if (ret != sizeof(data64))
-            return -EINVAL;
-
-        refcount_block_offset = offset;
-        s->refcount_block_cache_offset = offset;
-        update_refcount(bs, offset, s->cluster_size, 1);
-        cache_refcount_updates = cache;
-    } else {
-        if (refcount_block_offset != s->refcount_block_cache_offset) {
-            if (load_refcount_block(bs, refcount_block_offset) < 0)
-                return -EIO;
-        }
-    }
+    return new_block;
 
-    return refcount_block_offset;
+fail_table:
+    qemu_free(new_table);
+fail_block:
+    s->refcount_block_cache_offset = 0;
+    return ret;
 }
 
 #define REFCOUNTS_PER_SECTOR (512 >> REFCOUNT_SHIFT)
-- 
1.6.6

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] [PATCH 3/3] qcow2: More checks for qemu-img check
  2010-02-15 16:19 [Qemu-devel] [PATCH 0/3] qcow2: Rewrite alloc_refcount_block Kevin Wolf
  2010-02-15 16:19 ` [Qemu-devel] [PATCH 1/3] qcow2: Factor next_refcount_table_size out Kevin Wolf
  2010-02-15 16:19 ` [Qemu-devel] [PATCH 2/3] qcow2: Rewrite alloc_refcount_block/grow_refcount_table Kevin Wolf
@ 2010-02-15 16:19 ` Kevin Wolf
  2010-02-18 12:11   ` [Qemu-devel] " Juan Quintela
  2010-02-19 21:13 ` [Qemu-devel] [PATCH 0/3] qcow2: Rewrite alloc_refcount_block Anthony Liguori
  3 siblings, 1 reply; 12+ messages in thread
From: Kevin Wolf @ 2010-02-15 16:19 UTC (permalink / raw)
  To: qemu-devel; +Cc: kwolf, hch, armbru, gleb, quintela

Implement some more refcount block related checks

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 block/qcow2-refcount.c |   12 ++++++++++++
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
index e7fdf64..e50fb2a 100644
--- a/block/qcow2-refcount.c
+++ b/block/qcow2-refcount.c
@@ -1065,9 +1065,21 @@ int qcow2_check_refcounts(BlockDriverState *bs)
     for(i = 0; i < s->refcount_table_size; i++) {
         int64_t offset;
         offset = s->refcount_table[i];
+
+        /* Refcount blocks are cluster aligned */
+        if (offset & (s->cluster_size - 1)) {
+            fprintf(stderr, "ERROR refcount block %d is not "
+                "cluster aligned; refcount table entry corrupted\n", i);
+            errors++;
+        }
+
         if (offset != 0) {
             errors += inc_refcounts(bs, refcount_table, nb_clusters,
                           offset, s->cluster_size);
+            if (refcount_table[offset / s->cluster_size] != 1) {
+                fprintf(stderr, "ERROR refcount block %d refcount=%d\n",
+                    i, refcount_table[offset / s->cluster_size]);
+            }
         }
     }
 
-- 
1.6.6

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [Qemu-devel] Re: [PATCH 1/3] qcow2: Factor next_refcount_table_size out
  2010-02-15 16:19 ` [Qemu-devel] [PATCH 1/3] qcow2: Factor next_refcount_table_size out Kevin Wolf
@ 2010-02-18 10:40   ` Juan Quintela
  0 siblings, 0 replies; 12+ messages in thread
From: Juan Quintela @ 2010-02-18 10:40 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: hch, qemu-devel, gleb, armbru

Kevin Wolf <kwolf@redhat.com> wrote:
> When the refcount table grows, it doesn't only grow by one entry but reserves
> some space for future refcount blocks. The algorithm to calculate the number of
> entries stays the same with the fixes, so factor it out before replacing the
> rest.
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
> ---
>  block/qcow2-refcount.c |   34 +++++++++++++++++++++++-----------
>  1 files changed, 23 insertions(+), 11 deletions(-)
>
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 2fdc26b..0e2ecd7 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -123,6 +123,28 @@ static int get_refcount(BlockDriverState *bs, int64_t cluster_index)
>      return be16_to_cpu(s->refcount_block_cache[block_index]);
>  }
>  
> +/*
> + * Rounds the refcount table size up to avoid growing the table for each single
> + * refcount block that is allocated.
> + */
> +static unsigned int next_refcount_table_size(BDRVQcowState *s,
> +    unsigned int min_size)
> +{
> +    unsigned int refcount_table_clusters = 0;
> +    unsigned int new_table_size = 1;
> +
> +    while (min_size > new_table_size) {
> +        if (refcount_table_clusters == 0) {
> +            refcount_table_clusters = 1;
> +        } else {
> +            refcount_table_clusters = (refcount_table_clusters * 3 + 1) / 2;
> +        }
> +        new_table_size = refcount_table_clusters << (s->cluster_bits - 3);
> +    }
> +
> +    return new_table_size;
> +}
> +


couldn't something like:

static unsigned int next_refcount_table_size(BDRVQcowState *s,
                                             unsigned int min_size)
{
    unsigned int refcount_table_clusters = 1;
    unsigned int min_clusters = (min_size >> (s->clusters_bits -3)) + 1;

    while (min_clusters > refcount_table_clusters) {
            refcount_table_clusters = (refcount_table_clusters * 3 + 1) / 2;
    }

    return refconut_table_clusters << (s->cluster_bits - 3);
}

That removes the if from the inside loop, and makes the comparison in
clusters not in size.  What do you think?

>  static int grow_refcount_table(BlockDriverState *bs, int min_size)
>  {
>      BDRVQcowState *s = bs->opaque;
> @@ -136,17 +158,7 @@ static int grow_refcount_table(BlockDriverState *bs, int min_size)
>      if (min_size <= s->refcount_table_size)
>          return 0;
>      /* compute new table size */
> -    refcount_table_clusters = s->refcount_table_size >> (s->cluster_bits - 3);
> -    for(;;) {
> -        if (refcount_table_clusters == 0) {
> -            refcount_table_clusters = 1;
> -        } else {
> -            refcount_table_clusters = (refcount_table_clusters * 3 + 1) / 2;
> -        }
> -        new_table_size = refcount_table_clusters << (s->cluster_bits - 3);
> -        if (min_size <= new_table_size)
> -            break;
> -    }
> +    new_table_size = next_refcount_table_size(s, min_size);
>  #ifdef DEBUG_ALLOC2
>      printf("grow_refcount_table from %d to %d\n",
>             s->refcount_table_size,

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Qemu-devel] Re: [PATCH 2/3] qcow2: Rewrite alloc_refcount_block/grow_refcount_table
  2010-02-15 16:19 ` [Qemu-devel] [PATCH 2/3] qcow2: Rewrite alloc_refcount_block/grow_refcount_table Kevin Wolf
@ 2010-02-18 12:02   ` Juan Quintela
  0 siblings, 0 replies; 12+ messages in thread
From: Juan Quintela @ 2010-02-18 12:02 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: hch, qemu-devel, gleb, armbru

Kevin Wolf <kwolf@redhat.com> wrote:

> +    /* Find the refcount block for the given cluster */
> +    refcount_table_index = cluster_index >> (s->cluster_bits - REFCOUNT_SHIFT);
> +    if (refcount_table_index >= s->refcount_table_size) {
> +        refcount_block_offset = 0;
> +    } else {
> +        refcount_block_offset = s->refcount_table[refcount_table_index];
> +    }
> +
> +    /* If it's already there, we're done */
> +    if (refcount_block_offset) {
> +        if (refcount_block_offset != s->refcount_block_cache_offset) {
> +            ret = load_refcount_block(bs, refcount_block_offset);
> +            if (ret < 0) {
> +                return ret;
> +            }
> +        }
> +        return refcount_block_offset;
> +    }

I would merge  this if in the previous one.  as a bonus,
refcount_block_offset can be local to that if branch and no need of the
else one.

> +
> +    /*
> +     * If we came here, we need to allocate something. Something is at least
> +     * a cluster for the new refcount block. It may also include a new refcount
> +     * table if the old refcount table is too small.
> +     *
> +     * Note that allocating clusters here needs some special care:
> +     *
> +     * - We can't use the normal qcow2_alloc_clusters(), it would try to
> +     *   increase the refcount and very likely we would end up with an endless
> +     *   recursion. Instead we must place the refcount blocks in a way that
> +     *   they can describe them themselves.
> +     *
> +     * - We need to consider that at this point we are inside update_refcounts
> +     *   and doing the initial refcount increase. This means that some clusters
> +     *   have already been allocated by the caller, but their refcount isn't
> +     *   accurate yet. free_cluster_index tells us where this allocation ends
> +     *   as long as we don't overwrite it by freeing clusters.
> +     *
> +     * - alloc_clusters_noref and qcow2_free_clusters may load a different
> +     *   refcount block into the cache
> +     */
> +
> +    if (cache_refcount_updates) {
> +        write_refcount_block(s);

write_refconut_blocks() can return -EIO.  It is not handled anywhere
else either.


> +    /* Calculate the number of refcount blocks needed so far */
> +    uint64_t refcount_block_clusters = 1 << (s->cluster_bits - REFCOUNT_SHIFT);
> +    uint64_t blocks_used = (s->free_cluster_index +
> +        refcount_block_clusters - 1) / refcount_block_clusters;
> +
> +    /* And now we need at least one block more for the new metadata */
> +    uint64_t table_size = next_refcount_table_size(s, blocks_used + 1);
> +    uint64_t last_table_size = table_size;

only place where last_table_size is assigned to.

> +    uint64_t blocks_clusters;
> +    do {
> +        uint64_t table_clusters = size_to_clusters(s, table_size);
> +        blocks_clusters = 1 +
> +            ((table_clusters + refcount_block_clusters - 1)
> +            / refcount_block_clusters);
> +        uint64_t meta_clusters = table_clusters + blocks_clusters;
> +
> +        table_size = next_refcount_table_size(s, blocks_used +
> +            ((meta_clusters + refcount_block_clusters - 1)
> +            / refcount_block_clusters));

this value can be the same than previous next_refcount_table_size()
call or bigger.

> +
> +    } while (last_table_size != table_size);

how can this converge? (already discussed on irc with keving that a
last_table_size = table_size is missing somewhere in the loop.

Later, Juan.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Qemu-devel] Re: [PATCH 3/3] qcow2: More checks for qemu-img check
  2010-02-15 16:19 ` [Qemu-devel] [PATCH 3/3] qcow2: More checks for qemu-img check Kevin Wolf
@ 2010-02-18 12:11   ` Juan Quintela
  0 siblings, 0 replies; 12+ messages in thread
From: Juan Quintela @ 2010-02-18 12:11 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: hch, qemu-devel, gleb, armbru

Kevin Wolf <kwolf@redhat.com> wrote:
> Implement some more refcount block related checks
>
> Signed-off-by: Kevin Wolf <kwolf@redhat.com>

Reviewed-by: Juan Quintela <quintela@redhat.com>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] [PATCH 0/3] qcow2: Rewrite alloc_refcount_block
  2010-02-15 16:19 [Qemu-devel] [PATCH 0/3] qcow2: Rewrite alloc_refcount_block Kevin Wolf
                   ` (2 preceding siblings ...)
  2010-02-15 16:19 ` [Qemu-devel] [PATCH 3/3] qcow2: More checks for qemu-img check Kevin Wolf
@ 2010-02-19 21:13 ` Anthony Liguori
  2010-02-20  1:49   ` [Qemu-devel] " Juan Quintela
  3 siblings, 1 reply; 12+ messages in thread
From: Anthony Liguori @ 2010-02-19 21:13 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: armbru, quintela, qemu-devel, gleb, hch

On 02/15/2010 10:19 AM, Kevin Wolf wrote:
> The current implementation of alloc_refcount_block and grow_refcount_table has
> fundamental problems regarding error handling. There are some places where an
> I/O error means that the image is going to be corrupted. I have found that the
> only way to fix this is to completely rewrite the thing.
>
> Just sending as an RFC to the list hasn't generated a lot of comments (to be
> precise, not a single one). This is a critical part of qcow2 and needs reviews.
> So let's try it another way: People in CC, please give it a review. Sooner or
> later some of you will need to do so anyway.
>    

Should I apply this series?  I still don't see any review comments.

Regards,

Anthony Liguori

> Kevin Wolf (3):
>    qcow2: Factor next_refcount_table_size out
>    qcow2: Rewrite alloc_refcount_block/grow_refcount_table
>    qcow2: More check for qemu-img check
>
>   block/qcow2-refcount.c |  334 +++++++++++++++++++++++++++++++++++-------------
>   1 files changed, 244 insertions(+), 90 deletions(-)
>
>
>
>
>    

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Qemu-devel] Re: [PATCH 0/3] qcow2: Rewrite alloc_refcount_block
  2010-02-19 21:13 ` [Qemu-devel] [PATCH 0/3] qcow2: Rewrite alloc_refcount_block Anthony Liguori
@ 2010-02-20  1:49   ` Juan Quintela
  2010-02-20 17:02     ` Anthony Liguori
  2010-02-22  8:54     ` Kevin Wolf
  0 siblings, 2 replies; 12+ messages in thread
From: Juan Quintela @ 2010-02-20  1:49 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Kevin Wolf, armbru, qemu-devel, gleb, hch

Anthony Liguori <anthony@codemonkey.ws> wrote:
> On 02/15/2010 10:19 AM, Kevin Wolf wrote:
>> The current implementation of alloc_refcount_block and grow_refcount_table has
>> fundamental problems regarding error handling. There are some places where an
>> I/O error means that the image is going to be corrupted. I have found that the
>> only way to fix this is to completely rewrite the thing.
>>
>> Just sending as an RFC to the list hasn't generated a lot of comments (to be
>> precise, not a single one). This is a critical part of qcow2 and needs reviews.
>> So let's try it another way: People in CC, please give it a review. Sooner or
>> later some of you will need to do so anyway.
>>    
>
> Should I apply this series?  I still don't see any review comments.

I sent review comments on the 18th.  I expect Kevin to address them.  I
talked with him on irc while doing the review (i.e. he knows about
them).

Kevin???

Later, Juan.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 0/3] qcow2: Rewrite alloc_refcount_block
  2010-02-20  1:49   ` [Qemu-devel] " Juan Quintela
@ 2010-02-20 17:02     ` Anthony Liguori
  2010-02-22  8:54     ` Kevin Wolf
  1 sibling, 0 replies; 12+ messages in thread
From: Anthony Liguori @ 2010-02-20 17:02 UTC (permalink / raw)
  To: Juan Quintela; +Cc: Kevin Wolf, hch, armbru, gleb, qemu-devel

On 02/19/2010 07:49 PM, Juan Quintela wrote:
> Anthony Liguori<anthony@codemonkey.ws>  wrote:
>    
>> On 02/15/2010 10:19 AM, Kevin Wolf wrote:
>>      
>>> The current implementation of alloc_refcount_block and grow_refcount_table has
>>> fundamental problems regarding error handling. There are some places where an
>>> I/O error means that the image is going to be corrupted. I have found that the
>>> only way to fix this is to completely rewrite the thing.
>>>
>>> Just sending as an RFC to the list hasn't generated a lot of comments (to be
>>> precise, not a single one). This is a critical part of qcow2 and needs reviews.
>>> So let's try it another way: People in CC, please give it a review. Sooner or
>>> later some of you will need to do so anyway.
>>>
>>>        
>> Should I apply this series?  I still don't see any review comments.
>>      
> I sent review comments on the 18th.  I expect Kevin to address them.  I
> talked with him on irc while doing the review (i.e. he knows about
> them).
>    

You're right, I missed it.  Thanks.

Regards,

Anthony Liguori

> Kevin???
>
> Later, Juan.
>
>
>
>    

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [Qemu-devel] Re: [PATCH 0/3] qcow2: Rewrite alloc_refcount_block
  2010-02-20  1:49   ` [Qemu-devel] " Juan Quintela
  2010-02-20 17:02     ` Anthony Liguori
@ 2010-02-22  8:54     ` Kevin Wolf
  2010-02-22  9:55       ` Markus Armbruster
  1 sibling, 1 reply; 12+ messages in thread
From: Kevin Wolf @ 2010-02-22  8:54 UTC (permalink / raw)
  To: Juan Quintela; +Cc: armbru, gleb, qemu-devel, hch

Am 20.02.2010 02:49, schrieb Juan Quintela:
> Anthony Liguori <anthony@codemonkey.ws> wrote:
>> On 02/15/2010 10:19 AM, Kevin Wolf wrote:
>>> The current implementation of alloc_refcount_block and grow_refcount_table has
>>> fundamental problems regarding error handling. There are some places where an
>>> I/O error means that the image is going to be corrupted. I have found that the
>>> only way to fix this is to completely rewrite the thing.
>>>
>>> Just sending as an RFC to the list hasn't generated a lot of comments (to be
>>> precise, not a single one). This is a critical part of qcow2 and needs reviews.
>>> So let's try it another way: People in CC, please give it a review. Sooner or
>>> later some of you will need to do so anyway.
>>>    
>>
>> Should I apply this series?  I still don't see any review comments.
> 
> I sent review comments on the 18th.  I expect Kevin to address them.  I
> talked with him on irc while doing the review (i.e. he knows about
> them).

I was hoping for more comments as there were three more people in the CC
list. But looks like they prefer reviewing the patches downstream. I'm
going to send a v2 which addresses only your comments if I don't get any
more comments soon.

Kevin

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [Qemu-devel] Re: [PATCH 0/3] qcow2: Rewrite alloc_refcount_block
  2010-02-22  8:54     ` Kevin Wolf
@ 2010-02-22  9:55       ` Markus Armbruster
  0 siblings, 0 replies; 12+ messages in thread
From: Markus Armbruster @ 2010-02-22  9:55 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: hch, qemu-devel, gleb, Juan Quintela

Kevin Wolf <kwolf@redhat.com> writes:

> Am 20.02.2010 02:49, schrieb Juan Quintela:
>> Anthony Liguori <anthony@codemonkey.ws> wrote:
>>> On 02/15/2010 10:19 AM, Kevin Wolf wrote:
>>>> The current implementation of alloc_refcount_block and grow_refcount_table has
>>>> fundamental problems regarding error handling. There are some places where an
>>>> I/O error means that the image is going to be corrupted. I have found that the
>>>> only way to fix this is to completely rewrite the thing.
>>>>
>>>> Just sending as an RFC to the list hasn't generated a lot of comments (to be
>>>> precise, not a single one). This is a critical part of qcow2 and needs reviews.
>>>> So let's try it another way: People in CC, please give it a review. Sooner or
>>>> later some of you will need to do so anyway.
>>>>    
>>>
>>> Should I apply this series?  I still don't see any review comments.
>> 
>> I sent review comments on the 18th.  I expect Kevin to address them.  I
>> talked with him on irc while doing the review (i.e. he knows about
>> them).
>
> I was hoping for more comments as there were three more people in the CC
> list. But looks like they prefer reviewing the patches downstream. I'm
> going to send a v2 which addresses only your comments if I don't get any
> more comments soon.

I don't prefer reviewing downstream at all, I just haven't been able to
find the time for a review.  Sorry.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2010-02-22  9:59 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-02-15 16:19 [Qemu-devel] [PATCH 0/3] qcow2: Rewrite alloc_refcount_block Kevin Wolf
2010-02-15 16:19 ` [Qemu-devel] [PATCH 1/3] qcow2: Factor next_refcount_table_size out Kevin Wolf
2010-02-18 10:40   ` [Qemu-devel] " Juan Quintela
2010-02-15 16:19 ` [Qemu-devel] [PATCH 2/3] qcow2: Rewrite alloc_refcount_block/grow_refcount_table Kevin Wolf
2010-02-18 12:02   ` [Qemu-devel] " Juan Quintela
2010-02-15 16:19 ` [Qemu-devel] [PATCH 3/3] qcow2: More checks for qemu-img check Kevin Wolf
2010-02-18 12:11   ` [Qemu-devel] " Juan Quintela
2010-02-19 21:13 ` [Qemu-devel] [PATCH 0/3] qcow2: Rewrite alloc_refcount_block Anthony Liguori
2010-02-20  1:49   ` [Qemu-devel] " Juan Quintela
2010-02-20 17:02     ` Anthony Liguori
2010-02-22  8:54     ` Kevin Wolf
2010-02-22  9:55       ` Markus Armbruster

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).