All of lore.kernel.org
 help / color / mirror / Atom feed
From: Max Reitz <mreitz@redhat.com>
To: "Benoît Canet" <benoit.canet@irqsave.net>
Cc: Kevin Wolf <kwolf@redhat.com>,
	qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 5/8] qcow2: Rebuild refcount structure during check
Date: Fri, 15 Aug 2014 14:49:07 +0200	[thread overview]
Message-ID: <53EE01C3.8090501@redhat.com> (raw)
In-Reply-To: <20140814125858.GJ2009@irqsave.net>

On 14.08.2014 14:58, Benoît Canet wrote:
> The Wednesday 13 Aug 2014 à 23:01:47 (+0200), Max Reitz wrote :
>> The previous commit introduced the "rebuild" variable to qcow2's
>> implementation of the image consistency check. Now make use of this by
>> adding a function which creates a completely new refcount structure
>> based solely on the in-memory information gathered before.
>>
>> The old refcount structure will be leaked, however.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2-refcount.c | 262 ++++++++++++++++++++++++++++++++++++++++++++++++-
>>   1 file changed, 259 insertions(+), 3 deletions(-)
>>
>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>> index 6400840..e3ca03a 100644
>> --- a/block/qcow2-refcount.c
>> +++ b/block/qcow2-refcount.c
>> @@ -1590,6 +1590,242 @@ static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>   }
>>   
>>   /*
>> + * Allocates a cluster using an in-memory refcount table (IMRT) in contrast to
>> + * the on-disk refcount structures.
>> + *
>> + * *first_free_cluster does not necessarily point to the first free cluster, but
>> + * may point to one cluster as close as possible before it. The offset returned
>> + * will never be before that cluster.
>> + */
>> +static int64_t alloc_clusters_imrt(BlockDriverState *bs,
>> +                                   int cluster_count,
>> +                                   uint16_t **refcount_table,
>> +                                   int64_t *nb_clusters,
> I don't understand this parameters name.
> nb_ and the plural seems to imply that it's a cluster count.
> While the int64_t imply it's large.

It certainly is. It's the total cluster count of the file (which is what 
"nb_clusters" is always used for in the check code).

>> +                                   int64_t *first_free_cluster)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    int64_t cluster = *first_free_cluster, i;
> Here cluster is a cluster offset.

A cluster index.


>> +    bool first_gap = true;
>> +    int contiguous_clusters;
>> +
>> +    for (contiguous_clusters = 0;
>> +         cluster < *nb_clusters && contiguous_clusters < cluster_count;
> And here we compare a cluster offset (cluster) with something named like a
> cluster count (nb_clusters).

Because cluster is an index.

>> +         cluster++)
>> +    {
>> +        if (!(*refcount_table)[cluster]) {
>> +            contiguous_clusters++;
>> +            if (first_gap) {
>> +                *first_free_cluster = cluster;
>> +                first_gap = false;
>> +            }
>> +        } else if (contiguous_clusters) {
>> +            contiguous_clusters = 0;
>> +        }
>> +    }
>> +
>> +    if (contiguous_clusters < cluster_count) {
>> +        int64_t old_nb_clusters = *nb_clusters;
>> +
>> +        *nb_clusters = cluster + cluster_count - contiguous_clusters;
> Here we compute a cluster offset (nb_cluster) by adding and removing
> clusters counts (cluster_count, continuous_cluster) to a cluster offset (cluster)
> while (nb_clusters) is named like a cluster count.

*nb_clusters is a cluster count. cluster is a cluster index.

>
>> +        *refcount_table = g_try_realloc(*refcount_table,
>> +                                        *nb_clusters * sizeof(uint16_t));
> This confuse me further.

*nb_clusters is the number of clusters in the file. The refcount table 
has one uint16_t entry per cluster.


I'll state more clearly in the comment above this function that 
"first_free_cluster" is a cluster index albeit the function returns an 
offset. Without further protest, I won't change this difference, though; 
it's much more convenient to work with first_free_cluster being an index 
and the returned value being an offset than both having the same unit.

Max

>> +        if (!*refcount_table) {
>> +            return -ENOMEM;
>> +        }
>> +
>> +        memset(*refcount_table + old_nb_clusters, 0,
>> +               (*nb_clusters - old_nb_clusters) * sizeof(uint16_t));
>> +    }
>> +
>> +    cluster -= contiguous_clusters;
>> +    for (i = 0; i < cluster_count; i++) {
>> +        (*refcount_table)[cluster + i] = 1;
>> +    }
>> +
>> +    return cluster << s->cluster_bits;
>> +}
>> +
>> +/*
>> + * Creates a new refcount structure based solely on the in-memory information
>> + * given through *refcount_table. All necessary allocations will be reflected
>> + * in that array.
>> + *
>> + * On success, the old refcount structure is leaked (it will be covered by the
>> + * new refcount structure).
>> + */
>> +static int rebuild_refcount_structure(BlockDriverState *bs,
>> +                                      BdrvCheckResult *res,
>> +                                      uint16_t **refcount_table,
>> +                                      int64_t *nb_clusters)
>> +{
>> +    BDRVQcowState *s = bs->opaque;
>> +    int64_t first_free_cluster = 0, rt_ofs = -1, cluster = 0;
>> +    int64_t rb_ofs, rb_start, rb_index;
>> +    uint32_t reftable_size = 0;
>> +    uint64_t *reftable = NULL;
>> +    uint16_t *on_disk_rb;
>> +    uint8_t rt_offset_and_clusters[sizeof(uint64_t) + sizeof(uint32_t)];
>> +    int i, ret = 0;
>> +
>> +    qcow2_cache_empty(bs, s->refcount_block_cache);
>> +
>> +write_refblocks:
>> +    for (; cluster < *nb_clusters; cluster++) {
>> +        if (!(*refcount_table)[cluster]) {
>> +            continue;
>> +        }
>> +
>> +        rb_index = cluster >> (s->cluster_bits - 1);
>> +        rb_start = rb_index << (s->cluster_bits - 1);
>> +
>> +        /* Don't allocate a cluster in a refblock already written to disk */
>> +        if (first_free_cluster < rb_start) {
>> +            first_free_cluster = rb_start;
>> +        }
>> +        rb_ofs = alloc_clusters_imrt(bs, 1, refcount_table, nb_clusters,
>> +                                     &first_free_cluster);
>> +        if (rb_ofs < 0) {
>> +            fprintf(stderr, "ERROR allocating refblock: %s\n", strerror(-ret));
>> +            res->check_errors++;
>> +            ret = rb_ofs;
>> +            goto fail;
>> +        }
>> +
>> +        if (reftable_size <= rb_index) {
>> +            uint32_t old_rt_size = reftable_size;
>> +            reftable_size = ROUND_UP((rb_index + 1) * sizeof(uint64_t),
>> +                                     s->cluster_size) / sizeof(uint64_t);
>> +            reftable = g_try_realloc(reftable,
>> +                                     reftable_size * sizeof(uint64_t));
>> +            if (!reftable) {
>> +                res->check_errors++;
>> +                ret = -ENOMEM;
>> +                goto fail;
>> +            }
>> +
>> +            memset(reftable + old_rt_size, 0,
>> +                   (reftable_size - old_rt_size) * sizeof(uint64_t));
>> +
>> +            /* The offset we have for the reftable is now no longer valid;
>> +             * this will leak that range, but we can easily fix that by running
>> +             * a leak-fixing check after this rebuild operation */
>> +            rt_ofs = -1;
>> +        }
>> +        reftable[rb_index] = rb_ofs;
>> +
>> +        /* If this is apparently the last refblock (for now), try to squeeze the
>> +         * reftable in */
>> +        if (rb_index == (*nb_clusters - 1) >> (s->cluster_bits - 1) &&
>> +            rt_ofs < 0)
>> +        {
>> +            rt_ofs = alloc_clusters_imrt(bs, size_to_clusters(s, reftable_size *
>> +                                                              sizeof(uint64_t)),
>> +                                         refcount_table, nb_clusters,
>> +                                         &first_free_cluster);
>> +            if (rt_ofs < 0) {
>> +                fprintf(stderr, "ERROR allocating reftable: %s\n",
>> +                        strerror(-ret));
>> +                res->check_errors++;
>> +                ret = rt_ofs;
>> +                goto fail;
>> +            }
>> +        }
>> +
>> +        ret = qcow2_pre_write_overlap_check(bs, 0, rb_ofs, s->cluster_size);
>> +        if (ret < 0) {
>> +            fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
>> +            goto fail;
>> +        }
>> +
>> +        on_disk_rb = g_malloc0(s->cluster_size);
>> +        for (i = 0; i < s->cluster_size / sizeof(uint16_t) &&
>> +                    rb_start + i < *nb_clusters; i++)
>> +        {
>> +            on_disk_rb[i] = cpu_to_be16((*refcount_table)[rb_start + i]);
>> +        }
>> +
>> +        ret = bdrv_write(bs->file, rb_ofs / BDRV_SECTOR_SIZE,
>> +                         (void *)on_disk_rb, s->cluster_sectors);
>> +        g_free(on_disk_rb);
>> +        if (ret < 0) {
>> +            fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
>> +            goto fail;
>> +        }
>> +
>> +        /* Go to the end of this refblock */
>> +        cluster = rb_start + s->cluster_size / sizeof(uint16_t) - 1;
>> +    }
>> +
>> +    if (rt_ofs < 0) {
>> +        int64_t post_rb_start = ROUND_UP(*nb_clusters,
>> +                                         s->cluster_size / sizeof(uint16_t));
>> +
>> +        /* Not pretty but simple */
>> +        if (first_free_cluster < post_rb_start) {
>> +            first_free_cluster = post_rb_start;
>> +        }
>> +        rt_ofs = alloc_clusters_imrt(bs, size_to_clusters(s, reftable_size *
>> +                                                          sizeof(uint64_t)),
>> +                                     refcount_table, nb_clusters,
>> +                                     &first_free_cluster);
>> +        if (rt_ofs < 0) {
>> +            fprintf(stderr, "ERROR allocating reftable: %s\n", strerror(-ret));
>> +            res->check_errors++;
>> +            ret = rt_ofs;
>> +            goto fail;
>> +        }
>> +
>> +        goto write_refblocks;
>> +    }
>> +
>> +    assert(reftable);
>> +
>> +    for (rb_index = 0; rb_index < reftable_size; rb_index++) {
>> +        cpu_to_be64s(&reftable[rb_index]);
>> +    }
>> +
>> +    ret = qcow2_pre_write_overlap_check(bs, 0, rt_ofs,
>> +                                        reftable_size * sizeof(uint64_t));
>> +    if (ret < 0) {
>> +        fprintf(stderr, "ERROR writing reftable: %s\n", strerror(-ret));
>> +        goto fail;
>> +    }
>> +
>> +    ret = bdrv_write(bs->file, rt_ofs / BDRV_SECTOR_SIZE, (void *)reftable,
>> +                     reftable_size * sizeof(uint64_t) / BDRV_SECTOR_SIZE);
>> +    if (ret < 0) {
>> +        fprintf(stderr, "ERROR writing reftable: %s\n", strerror(-ret));
>> +        goto fail;
>> +    }
>> +
>> +    /* Enter new reftable into the image header */
>> +    cpu_to_be64w((uint64_t *)&rt_offset_and_clusters[0], rt_ofs);
>> +    cpu_to_be32w((uint32_t *)&rt_offset_and_clusters[sizeof(uint64_t)],
>> +                 size_to_clusters(s, reftable_size * sizeof(uint64_t)));
>> +    ret = bdrv_pwrite_sync(bs->file, offsetof(QCowHeader,
>> +                                              refcount_table_offset),
>> +                           rt_offset_and_clusters,
>> +                           sizeof(rt_offset_and_clusters));
>> +    if (ret < 0) {
>> +        fprintf(stderr, "ERROR setting reftable: %s\n", strerror(-ret));
>> +        goto fail;
>> +    }
>> +
>> +    for (rb_index = 0; rb_index < reftable_size; rb_index++) {
>> +        be64_to_cpus(&reftable[rb_index]);
>> +    }
>> +    s->refcount_table = reftable;
>> +    s->refcount_table_offset = rt_ofs;
>> +    s->refcount_table_size = reftable_size;
>> +
>> +    return 0;
>> +
>> +fail:
>> +    g_free(reftable);
>> +    return ret;
>> +}
>> +
>> +/*
>>    * Checks an image for refcount consistency.
>>    *
>>    * Returns 0 if no errors are found, the number of errors in case the image is
>> @@ -1599,6 +1835,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>                             BdrvCheckMode fix)
>>   {
>>       BDRVQcowState *s = bs->opaque;
>> +    BdrvCheckResult pre_compare_res;
>>       int64_t size, highest_cluster, nb_clusters;
>>       uint16_t *refcount_table = NULL;
>>       bool rebuild = false;
>> @@ -1625,11 +1862,30 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>           goto fail;
>>       }
>>   
>> -    compare_refcounts(bs, res, fix, &rebuild, &highest_cluster, refcount_table,
>> +    /* In case we don't need to rebuild the refcount structure (but want to fix
>> +     * something), this function is immediately called again, in which case the
>> +     * result should be ignored */
>> +    pre_compare_res = *res;
>> +    compare_refcounts(bs, res, 0, &rebuild, &highest_cluster, refcount_table,
>>                         nb_clusters);
>>   
>> -    if (rebuild) {
>> -        fprintf(stderr, "ERROR need to rebuild refcount structures\n");
>> +    if (rebuild && (fix & BDRV_FIX_ERRORS)) {
>> +        fprintf(stderr, "Rebuilding refcount structure\n");
>> +        ret = rebuild_refcount_structure(bs, res, &refcount_table,
>> +                                         &nb_clusters);
>> +        if (ret < 0) {
>> +            goto fail;
>> +        }
>> +    } else if (fix) {
>> +        if (rebuild) {
>> +            fprintf(stderr, "ERROR need to rebuild refcount structures\n");
>> +        }
>> +
>> +        if (res->leaks || res->corruptions) {
>> +            *res = pre_compare_res;
>> +            compare_refcounts(bs, res, fix, &rebuild, &highest_cluster,
>> +                              refcount_table, nb_clusters);
>> +        }
>>       }
>>   
>>       /* check OFLAG_COPIED */
>> -- 
>> 2.0.3
>>
>>

  reply	other threads:[~2014-08-15 12:49 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-13 21:01 [Qemu-devel] [PATCH 0/8] qcow2: Fix image repairing Max Reitz
2014-08-13 21:01 ` [Qemu-devel] [PATCH 1/8] qcow2: Factor out refcount accounting for check Max Reitz
2014-08-14 11:56   ` Benoît Canet
2014-08-13 21:01 ` [Qemu-devel] [PATCH 2/8] qcow2: Factor out refcount comparison " Max Reitz
2014-08-14 12:02   ` Benoît Canet
2014-08-15 12:31     ` Max Reitz
2014-08-15 13:47       ` Max Reitz
2014-08-13 21:01 ` [Qemu-devel] [PATCH 3/8] qcow2: Fix refcount blocks beyond image end Max Reitz
2014-08-14 12:11   ` Benoît Canet
2014-08-15 12:36     ` Max Reitz
2014-08-13 21:01 ` [Qemu-devel] [PATCH 4/8] qcow2: Do not perform potentially damaging repairs Max Reitz
2014-08-14 12:33   ` Benoît Canet
2014-08-15 12:42     ` Max Reitz
2014-08-13 21:01 ` [Qemu-devel] [PATCH 5/8] qcow2: Rebuild refcount structure during check Max Reitz
2014-08-14 12:58   ` Benoît Canet
2014-08-15 12:49     ` Max Reitz [this message]
2014-08-13 21:01 ` [Qemu-devel] [PATCH 6/8] qcow2: Clean up after refcount rebuild Max Reitz
2014-08-13 21:01 ` [Qemu-devel] [PATCH 7/8] iotests: Fix test outputs Max Reitz
2014-08-13 21:01 ` [Qemu-devel] [PATCH 8/8] iotests: Add test for potentially damaging repairs Max Reitz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53EE01C3.8090501@redhat.com \
    --to=mreitz@redhat.com \
    --cc=benoit.canet@irqsave.net \
    --cc=kwolf@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.