Re: [Qemu-devel] [PATCH v4 07/10] qcow2: Rebuild refcount structure during check

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Max Reitz <mreitz@redhat.com>
To: "Benoît Canet" <benoit.canet@nodalink.com>
Cc: Kevin Wolf <kwolf@redhat.com>,
	qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v4 07/10] qcow2: Rebuild refcount structure during check
Date: Fri, 29 Aug 2014 21:25:18 +0200	[thread overview]
Message-ID: <5400D39E.9060107@redhat.com> (raw)
In-Reply-To: <20140828160852.GA31509@nodalink.com>

On 28.08.2014 18:08, Benoît Canet wrote:
> On Wed, Aug 27, 2014 at 10:18:23PM +0200, Max Reitz wrote:
>> The previous commit introduced the "rebuild" variable to qcow2's
>> implementation of the image consistency check. Now make use of this by
>> adding a function which creates a completely new refcount structure
>> based solely on the in-memory information gathered before.
>>
>> The old refcount structure will be leaked, however.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2-refcount.c | 287 ++++++++++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 284 insertions(+), 3 deletions(-)
>>
>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>> index 6300cec..2dca6d4 100644
>> --- a/block/qcow2-refcount.c
>> +++ b/block/qcow2-refcount.c
>> @@ -1603,6 +1603,267 @@ static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>   }
>>   
>>   /*
>> + * Allocates a cluster using an in-memory refcount table (IMRT) in contrast to
>> + * the on-disk refcount structures.
>> + *
>> + * *first_free_cluster does not necessarily point to the first free cluster, but
>> + * may point to one cluster as close as possible before it. The offset returned
>> + * will never be before that cluster.
>> + *
>> + * Note that *first_free_cluster is a cluster index whereas the return value is
>> + * an offset.
>> + */
>> +static int64_t alloc_clusters_imrt(BlockDriverState *bs,
>> +                                   int cluster_count,
>> +                                   uint16_t **refcount_table,
>> +                                   int64_t *nb_clusters,
>> +                                   int64_t *first_free_cluster)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    int64_t cluster = *first_free_cluster, i;
>> +    bool first_gap = true;
>> +    int contiguous_free_clusters;
>> +
>> +    /* Starting at *first_free_cluster, find a range of at least cluster_count
>> +     * continuously free clusters */
>> +    for (contiguous_free_clusters = 0;
>> +         cluster < *nb_clusters && contiguous_free_clusters < cluster_count;
>> +         cluster++)
>> +    {
>> +        if (!(*refcount_table)[cluster]) {
>> +            contiguous_free_clusters++;
>> +            if (first_gap) {
>> +                /* If this is the first free cluster found, update
>> +                 * *first_free_cluster accordingly */
>> +                *first_free_cluster = cluster;
>> +                first_gap = false;
>> +            }
>> +        } else if (contiguous_free_clusters) {
>> +            contiguous_free_clusters = 0;
>> +        }
>> +    }
>> +
>> +    /* If contiguous_free_clusters is greater than zero, it contains the number
>> +     * of continuously free clusters until the current cluster; the first free
>> +     * cluster in the current "gap" is therefore
>> +     * cluster - contiguous_free_clusters */
>> +
>> +    /* If no such range could be found, grow the in-memory refcount table
>> +     * accordingly to append free clusters at the end of the image */
>> +    if (contiguous_free_clusters < cluster_count) {
>> +        int64_t old_nb_clusters = *nb_clusters;
>> +
>> +        /* There already is a gap of contiguous_free_clusters; we need
>> +         * cluster_count clusters; therefore, we have to allocate
>> +         * cluster_count - contiguous_free_clusters new clusters at the end of
>> +         * the image (which is the current value of cluster; note that cluster
>> +         * may exceed old_nb_clusters if *first_free_cluster pointed beyond the
>> +         * image end) */
>> +        *nb_clusters = cluster + cluster_count - contiguous_free_clusters;
>> +        *refcount_table = g_try_realloc(*refcount_table,
>> +                                        *nb_clusters * sizeof(uint16_t));
>> +        if (!*refcount_table) {
>> +            return -ENOMEM;
>> +        }
>> +
>> +        memset(*refcount_table + old_nb_clusters, 0,
>> +               (*nb_clusters - old_nb_clusters) * sizeof(uint16_t));
>> +    }
>> +
>> +    /* Go back to the first free cluster */
>> +    cluster -= contiguous_free_clusters;
>> +    for (i = 0; i < cluster_count; i++) {
>> +        (*refcount_table)[cluster + i] = 1;
>> +    }
>> +
>> +    return cluster << s->cluster_bits;
>> +}
>> +
>> +/*
>> + * Creates a new refcount structure based solely on the in-memory information
>> + * given through *refcount_table. All necessary allocations will be reflected
>> + * in that array.
>> + *
>> + * On success, the old refcount structure is leaked (it will be covered by the
>> + * new refcount structure).
>> + */
>> +static int rebuild_refcount_structure(BlockDriverState *bs,
>> +                                      BdrvCheckResult *res,
>> +                                      uint16_t **refcount_table,
>> +                                      int64_t *nb_clusters)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    int64_t first_free_cluster = 0, rt_ofs = -1, cluster = 0;
>> +    int64_t rb_ofs, rb_start, rb_index;
>> +    uint32_t reftable_size = 0;
>> +    uint64_t *reftable = NULL;
>> +    uint16_t *on_disk_rb;
>> +    uint8_t rt_offset_and_clusters[sizeof(uint64_t) + sizeof(uint32_t)];
> Could this be a small struct ?

Yes, it most certainly could, but considering there are other places 
which use this form already (block/qcow2-refcount.c, line 406 (on 
8b30301) or block/qcow2-cluster.c, line 40), I think it's not worth it.

>> +    int i, ret = 0;
>> +
>> +    qcow2_cache_empty(bs, s->refcount_block_cache);
>> +
>> +write_refblocks:
>> +    for (; cluster < *nb_clusters; cluster++) {
>> +        if (!(*refcount_table)[cluster]) {
>> +            continue;
>> +        }
>> +
>> +        rb_index = cluster >> (s->cluster_bits - 1);
>> +        rb_start = rb_index << (s->cluster_bits - 1);
> Do we have an hardcoded hidden relationship to s->refcount_order / sizeof(uint16_t) here ?

You're right, we have. I guess I'll just add a new field to 
BDRVQcowState named "refblock_shift" or something like that.

>> +
>> +        /* Don't allocate a cluster in a refblock already written to disk */
>> +        if (first_free_cluster < rb_start) {
>> +            first_free_cluster = rb_start;
>> +        }
>> +        rb_ofs = alloc_clusters_imrt(bs, 1, refcount_table, nb_clusters,
>> +                                     &first_free_cluster);
>> +        if (rb_ofs < 0) {
>> +            fprintf(stderr, "ERROR allocating refblock: %s\n", strerror(-ret));
>> +            res->check_errors++;
>> +            ret = rb_ofs;
>> +            goto fail;
>> +        }
>> +
>> +        if (reftable_size <= rb_index) {
>> +            uint32_t old_rt_size = reftable_size;
>> +            reftable_size = ROUND_UP((rb_index + 1) * sizeof(uint64_t),
>> +                                     s->cluster_size) / sizeof(uint64_t);
>> +            reftable = g_try_realloc(reftable,
>> +                                     reftable_size * sizeof(uint64_t));
>> +            if (!reftable) {
>> +                res->check_errors++;
>> +                ret = -ENOMEM;
>> +                goto fail;
>> +            }
>> +
>> +            memset(reftable + old_rt_size, 0,
>> +                   (reftable_size - old_rt_size) * sizeof(uint64_t));
>> +
>> +            /* The offset we have for the reftable is now no longer valid;
>> +             * this will leak that range, but we can easily fix that by running
>> +             * a leak-fixing check after this rebuild operation */
>> +            rt_ofs = -1;
>> +        }
>> +        reftable[rb_index] = rb_ofs;
>> +
>> +        /* If this is apparently the last refblock (for now), try to squeeze the
>> +         * reftable in; 1 << (s->cluster_bits - MAX(0, s->refcount_order - 3))
>> +         * is the number of entries per refblock (1 << s->refcount_order is the
>> +         * number of bits per refblock entry, but we need the number of bytes);
>> +         * therefore, (*nb_clusters - 1) >> ... is the index of the last
>> +         * refblock */
>> +        if (rb_index == (*nb_clusters - 1) >>
>> +            (s->cluster_bits - MAX(0, s->refcount_order - 3)) && rt_ofs < 0)
>> +        {
>> +            rt_ofs = alloc_clusters_imrt(bs, size_to_clusters(s, reftable_size *
>> +                                                              sizeof(uint64_t)),
>> +                                         refcount_table, nb_clusters,
>> +                                         &first_free_cluster);
>> +            if (rt_ofs < 0) {
>> +                fprintf(stderr, "ERROR allocating reftable: %s\n",
>> +                        strerror(-ret));
>> +                res->check_errors++;
>> +                ret = rt_ofs;
>> +                goto fail;
>> +            }
>> +        }
>> +
>> +        ret = qcow2_pre_write_overlap_check(bs, 0, rb_ofs, s->cluster_size);
>> +        if (ret < 0) {
>> +            fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
>> +            goto fail;
>> +        }
>> +
>
>> +        on_disk_rb = g_malloc0(s->cluster_size);
> Do we want g_try_malloc0 here ?

There is no real reason not to, but there is also no real reason to. 
s->cluster_size should be reasonably small so this should never fail. On 
the other hand, if I'm respinning this anyway, I might as well.

>> +        for (i = 0; i < s->cluster_size / sizeof(uint16_t) &&
>> +                    rb_start + i < *nb_clusters; i++)
>> +        {
>> +            on_disk_rb[i] = cpu_to_be16((*refcount_table)[rb_start + i]);
>> +        }
>> +
>> +        ret = bdrv_write(bs->file, rb_ofs / BDRV_SECTOR_SIZE,
>> +                         (void *)on_disk_rb, s->cluster_sectors);
>> +        g_free(on_disk_rb);
>> +        if (ret < 0) {
>> +            fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
>> +            goto fail;
>> +        }
>> +
>> +        /* Go to the end of this refblock */
>> +        cluster = rb_start + s->cluster_size / sizeof(uint16_t) - 1;
>> +    }
>> +
>> +    if (rt_ofs < 0) {
>> +        int64_t post_rb_start = ROUND_UP(*nb_clusters,
>> +                                         s->cluster_size / sizeof(uint16_t));
>> +
>> +        /* Not pretty but simple */
>> +        if (first_free_cluster < post_rb_start) {
>> +            first_free_cluster = post_rb_start;
>> +        }
>> +        rt_ofs = alloc_clusters_imrt(bs, size_to_clusters(s, reftable_size *
>> +                                                          sizeof(uint64_t)),
>> +                                     refcount_table, nb_clusters,
>> +                                     &first_free_cluster);
>> +        if (rt_ofs < 0) {
>> +            fprintf(stderr, "ERROR allocating reftable: %s\n", strerror(-ret));
>> +            res->check_errors++;
>> +            ret = rt_ofs;
>> +            goto fail;
>> +        }
>> +
>> +        goto write_refblocks;
>> +    }
>> +
>> +    assert(reftable);
>> +
>> +    for (rb_index = 0; rb_index < reftable_size; rb_index++) {
>> +        cpu_to_be64s(&reftable[rb_index]);
>> +    }
>> +
>> +    ret = qcow2_pre_write_overlap_check(bs, 0, rt_ofs,
>> +                                        reftable_size * sizeof(uint64_t));
>> +    if (ret < 0) {
>> +        fprintf(stderr, "ERROR writing reftable: %s\n", strerror(-ret));
>> +        goto fail;
>> +    }
>> +
>> +    ret = bdrv_write(bs->file, rt_ofs / BDRV_SECTOR_SIZE, (void *)reftable,
>> +                     reftable_size * sizeof(uint64_t) / BDRV_SECTOR_SIZE);
>> +    if (ret < 0) {
>> +        fprintf(stderr, "ERROR writing reftable: %s\n", strerror(-ret));
>> +        goto fail;
>> +    }
>> +
>> +    /* Enter new reftable into the image header */
>> +    cpu_to_be64w((uint64_t *)&rt_offset_and_clusters[0], rt_ofs);
>> +    cpu_to_be32w((uint32_t *)&rt_offset_and_clusters[sizeof(uint64_t)],
>> +                 size_to_clusters(s, reftable_size * sizeof(uint64_t)));
> Manipulating an utility struct would problably some casts.

Hm... Well, I probably shouldn't use pre-existing bad code as an excuse. 
Okay, I'll use an anonymous struct.

>> +    ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader,
>> +                                              refcount_table_offset),
>> +                           rt_offset_and_clusters,
>> +                           sizeof(rt_offset_and_clusters));
>> +    if (ret < 0) {
>> +        fprintf(stderr, "ERROR setting reftable: %s\n", strerror(-ret));
>> +        goto fail;
>> +    }
>> +
>> +    for (rb_index = 0; rb_index < reftable_size; rb_index++) {
>> +        be64_to_cpus(&reftable[rb_index]);
>> +    }
>> +    s->refcount_table = reftable;
>> +    s->refcount_table_offset = rt_ofs;
>> +    s->refcount_table_size = reftable_size;
>> +
>> +    return 0;
>> +
>> +fail:
>> +    g_free(reftable);
>> +    return ret;
>> +}
>> +
>> +/*
>>    * Checks an image for refcount consistency.
>>    *
>>    * Returns 0 if no errors are found, the number of errors in case the image is
>> @@ -1612,6 +1873,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>                             BdrvCheckMode fix)
>>   {
>>       BDRVQcowState *s = bs->opaque;
>> +    BdrvCheckResult pre_compare_res;
>>       int64_t size, highest_cluster, nb_clusters;
>>       uint16_t *refcount_table = NULL;
>>       bool rebuild = false;
>> @@ -1638,11 +1900,30 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>           goto fail;
>>       }
>>   
>> -    compare_refcounts(bs, res, fix, &rebuild, &highest_cluster, refcount_table,
>> +    /* In case we don't need to rebuild the refcount structure (but want to fix
>> +     * something), this function is immediately called again, in which case the
>> +     * result should be ignored */
>> +    pre_compare_res = *res;
>> +    compare_refcounts(bs, res, 0, &rebuild, &highest_cluster, refcount_table,
>>                         nb_clusters);
>>   
>> -    if (rebuild) {
>> -        fprintf(stderr, "ERROR need to rebuild refcount structures\n");
>> +    if (rebuild && (fix & BDRV_FIX_ERRORS)) {
>> +        fprintf(stderr, "Rebuilding refcount structure\n");
>> +        ret = rebuild_refcount_structure(bs, res, &refcount_table,
>> +                                         &nb_clusters);
>> +        if (ret < 0) {
>> +            goto fail;
>> +        }
>> +    } else if (fix) {
>> +        if (rebuild) {
>> +            fprintf(stderr, "ERROR need to rebuild refcount structures\n");
>> +        }
>> +
>> +        if (res->leaks || res->corruptions) {
>> +            *res = pre_compare_res;
>> +            compare_refcounts(bs, res, fix, &rebuild, &highest_cluster,
>> +                              refcount_table, nb_clusters);
>> +        }
>>       }
>>   
>>       /* check OFLAG_COPIED */
>> -- 
>> 2.1.0
>>
> I am nitpicking because I have trouble really understanding the code ;)

Nitpicking is far better than a Reviewed-by without much meaning behind 
it, so I'm glad about it. :-)

Max

next prev parent reply	other threads:[~2014-08-29 19:25 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-27 20:18 [Qemu-devel] [PATCH v4 00/10] qcow2: Fix image repairing Max Reitz
2014-08-27 20:18 ` [Qemu-devel] [PATCH v4 01/10] qcow2: Fix leaks in dirty images Max Reitz
2014-08-27 20:18 ` [Qemu-devel] [PATCH v4 02/10] qcow2: Split qcow2_check_refcounts() Max Reitz
2014-08-27 20:18 ` [Qemu-devel] [PATCH v4 03/10] qcow2: Pull check_refblocks() up Max Reitz
2014-08-27 20:18 ` [Qemu-devel] [PATCH v4 04/10] qcow2: Reuse refcount table in calculate_refcounts() Max Reitz
2014-08-27 20:18 ` [Qemu-devel] [PATCH v4 05/10] qcow2: Fix refcount blocks beyond image end Max Reitz
2014-08-29 19:50   ` Eric Blake
2014-08-29 19:56     ` Max Reitz
2014-08-27 20:18 ` [Qemu-devel] [PATCH v4 06/10] qcow2: Do not perform potentially damaging repairs Max Reitz
2014-08-27 20:18 ` [Qemu-devel] [PATCH v4 07/10] qcow2: Rebuild refcount structure during check Max Reitz
2014-08-28 16:08   ` Benoît Canet
2014-08-29 19:25     ` Max Reitz [this message]
2014-08-27 20:18 ` [Qemu-devel] [PATCH v4 08/10] qcow2: Clean up after refcount rebuild Max Reitz
2014-08-27 20:18 ` [Qemu-devel] [PATCH v4 09/10] iotests: Fix test outputs Max Reitz
2014-08-27 20:18 ` [Qemu-devel] [PATCH v4 10/10] iotests: Add test for potentially damaging repairs Max Reitz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5400D39E.9060107@redhat.com \
    --to=mreitz@redhat.com \
    --cc=benoit.canet@nodalink.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.