From: Eric Blake <eblake@redhat.com>
To: Max Reitz <mreitz@redhat.com>, qemu-devel@nongnu.org
Cc: "Kevin Wolf" <kwolf@redhat.com>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Benoît Canet" <benoit.canet@nodalink.com>
Subject: Re: [Qemu-devel] [PATCH v5 08/11] qcow2: Rebuild refcount structure during check
Date: Wed, 08 Oct 2014 17:09:08 -0600
Message-ID: <5435C414.5030203@redhat.com>
In-Reply-To: <1409348463-16627-9-git-send-email-mreitz@redhat.com>

On 08/29/2014 03:41 PM, Max Reitz wrote:
> The previous commit introduced the "rebuild" variable to qcow2's
> implementation of the image consistency check. Now make use of this by
> adding a function which creates a completely new refcount structure
> based solely on the in-memory information gathered before.
> 
> The old refcount structure will be leaked, however.

Might be worth mentioning in the commit message that a later commit will
deal with the leak.

> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> ---
>  block/qcow2-refcount.c | 286 ++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 283 insertions(+), 3 deletions(-)
> 
> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
> index 6300cec..318c152 100644
> --- a/block/qcow2-refcount.c
> +++ b/block/qcow2-refcount.c
> @@ -1603,6 +1603,266 @@ static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>  }
>  
>  /*
> + * Allocates a cluster using an in-memory refcount table (IMRT) in contrast to
> + * the on-disk refcount structures.
> + *
> + * *first_free_cluster does not necessarily point to the first free cluster, but
> + * may point to one cluster as close as possible before it. The offset returned
> + * will never be before that cluster.

Took me a couple reads of the comment and code to understand that.  If
I'm correct, this alternative wording may be better:

On input, *first_free_cluster tells where to start looking, and need not
actually be a free cluster; the returned offset will not be before that
cluster.  On output, *first_free_cluster points to the actual first free
cluster found.

Or, depending on the semantics you intended [1]:

On input, *first_free_cluster tells where to start looking, and need not
actually be a free cluster; the returned offset will not be before that
cluster.  On output, *first_free_cluster points to the first gap found,
even if that gap was too small to be used as the returned offset.

> + *
> + * Note that *first_free_cluster is a cluster index whereas the return value is
> + * an offset.
> + */
> +static int64_t alloc_clusters_imrt(BlockDriverState *bs,
> +                                   int cluster_count,
> +                                   uint16_t **refcount_table,
> +                                   int64_t *nb_clusters,
> +                                   int64_t *first_free_cluster)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int64_t cluster = *first_free_cluster, i;
> +    bool first_gap = true;
> +    int contiguous_free_clusters;
> +
> +    /* Starting at *first_free_cluster, find a range of at least cluster_count
> +     * continuously free clusters */
> +    for (contiguous_free_clusters = 0;
> +         cluster < *nb_clusters && contiguous_free_clusters < cluster_count;
> +         cluster++)
> +    {
> +        if (!(*refcount_table)[cluster]) {
> +            contiguous_free_clusters++;
> +            if (first_gap) {
> +                /* If this is the first free cluster found, update
> +                 * *first_free_cluster accordingly */
> +                *first_free_cluster = cluster;
> +                first_gap = false;
> +            }
> +        } else if (contiguous_free_clusters) {
> +            contiguous_free_clusters = 0;
> +        }

[1] Should you be resetting first_gap in the 'else'?  If you don't, then
*first_free_cluster is NOT the start of the range just allocated, but
the first free cluster encountered on the way to the eventual
allocation.  I guess it depends on how the callers are using the
information; since the function is static, I'll find out later
in my review.
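
To make the question concrete, here is a minimal standalone sketch of the
scan loop, operating on a plain array.  The name find_free_run and the
lack of table growth are assumptions for illustration; this is not the
QEMU code itself.  It keeps the patch's behavior of recording the first
free cluster once and never resetting it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the scan in alloc_clusters_imrt(): returns the
 * index of the first run of 'count' free entries at or after *first_free,
 * or -1 if no such run exists. Unlike the patch, it never grows the table. */
static int64_t find_free_run(const uint16_t *table, int64_t nb, int count,
                             int64_t *first_free)
{
    int64_t cluster = *first_free;
    bool first_gap = true;
    int run = 0;

    assert(count > 0 && cluster >= 0);

    for (; cluster < nb && run < count; cluster++) {
        if (!table[cluster]) {
            run++;
            if (first_gap) {
                /* Recorded once; not reset when the gap proves too small */
                *first_free = cluster;
                first_gap = false;
            }
        } else {
            run = 0;
        }
    }
    return run == count ? cluster - run : -1;
}
```

With the table { 1, 0, 1, 0, 0, 1 }, a request for 2 clusters and a start
hint of 0, the run found starts at index 3, while *first_free is left at
index 1, the one-cluster gap that was too small.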

> +    }
> +
> +    /* If contiguous_free_clusters is greater than zero, it contains the number
> +     * of continuously free clusters until the current cluster; the first free
> +     * cluster in the current "gap" is therefore
> +     * cluster - contiguous_free_clusters */
> +
> +    /* If no such range could be found, grow the in-memory refcount table
> +     * accordingly to append free clusters at the end of the image */
> +    if (contiguous_free_clusters < cluster_count) {
> +        int64_t old_nb_clusters = *nb_clusters;
> +
> +        /* There already is a gap of contiguous_free_clusters; we need

s/gap/tail/, since we are at the end of the table?

> +         * cluster_count clusters; therefore, we have to allocate
> +         * cluster_count - contiguous_free_clusters new clusters at the end of
> +         * the image (which is the current value of cluster; note that cluster
> +         * may exceed old_nb_clusters if *first_free_cluster pointed beyond the
> +         * image end) */
> +        *nb_clusters = cluster + cluster_count - contiguous_free_clusters;
> +        *refcount_table = g_try_realloc(*refcount_table,
> +                                        *nb_clusters * sizeof(uint16_t));
> +        if (!*refcount_table) {
> +            return -ENOMEM;
> +        }
> +
> +        memset(*refcount_table + old_nb_clusters, 0,
> +               (*nb_clusters - old_nb_clusters) * sizeof(uint16_t));

Is this calculation unnecessarily hard-coded to refcount_order==4?
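
For reference, sizeof(uint16_t) corresponds to 16-bit entries, i.e.
refcount_order == 4.  A rough sketch of a width-independent size
calculation could look like the following; the helper name
imrt_size_bytes and its parameters are hypothetical and not part of this
series:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes needed to hold nb_clusters refcount entries of
 * (1 << refcount_order) bits each, rounded up to whole bytes.
 * Overflow of the shift for very large nb_clusters is not handled here. */
static uint64_t imrt_size_bytes(uint64_t nb_clusters, unsigned refcount_order)
{
    uint64_t bits;

    assert(refcount_order <= 6);    /* the qcow2 format caps entries at 64 bits */
    bits = nb_clusters << refcount_order;
    return (bits + 7) / 8;
}
```

For refcount_order == 4 this yields nb_clusters * 2, the same value as the
nb_clusters * sizeof(uint16_t) expression in the patch.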

> +    }
> +
> +    /* Go back to the first free cluster */
> +    cluster -= contiguous_free_clusters;
> +    for (i = 0; i < cluster_count; i++) {
> +        (*refcount_table)[cluster + i] = 1;
> +    }
> +
> +    return cluster << s->cluster_bits;
> +}
> +
> +/*
> + * Creates a new refcount structure based solely on the in-memory information
> + * given through *refcount_table. All necessary allocations will be reflected
> + * in that array.
> + *
> + * On success, the old refcount structure is leaked (it will be covered by the
> + * new refcount structure).
> + */
> +static int rebuild_refcount_structure(BlockDriverState *bs,
> +                                      BdrvCheckResult *res,
> +                                      uint16_t **refcount_table,
> +                                      int64_t *nb_clusters)
> +{
> +    BDRVQcowState *s = bs->opaque;
> +    int64_t first_free_cluster = 0, rt_ofs = -1, cluster = 0;
> +    int64_t rb_ofs, rb_start, rb_index;
> +    uint32_t reftable_size = 0;
> +    uint64_t *reftable = NULL;
> +    uint16_t *on_disk_rb;
> +    int i, ret = 0;

ret is 0...

> +    struct {
> +        uint64_t rt_offset;
> +        uint32_t rt_clusters;
> +    } QEMU_PACKED rt_offset_and_clusters;
> +
> +    qcow2_cache_empty(bs, s->refcount_block_cache);
> +
> +write_refblocks:
> +    for (; cluster < *nb_clusters; cluster++) {
> +        if (!(*refcount_table)[cluster]) {
> +            continue;
> +        }
> +
> +        rb_index = cluster >> s->refcount_block_bits;
> +        rb_start = rb_index << s->refcount_block_bits;
> +
> +        /* Don't allocate a cluster in a refblock already written to disk */
> +        if (first_free_cluster < rb_start) {
> +            first_free_cluster = rb_start;
> +        }
> +        rb_ofs = alloc_clusters_imrt(bs, 1, refcount_table, nb_clusters,
> +                                     &first_free_cluster);

[1] looking back at my earlier question, you are starting each iteration
no earlier than the current rb_start.  But if you end up jumping back to
write_refblocks, are you guaranteed that rb_start is safely far enough
into the file, even if first_free_cluster is pointing to a gap that was
too small for an allocation?

> +        if (rb_ofs < 0) {
> +            fprintf(stderr, "ERROR allocating refblock: %s\n", strerror(-ret));

...but if we hit this error on the first time through the for loop,
strerror(0) is NOT what you meant to print.  Did you mean
strerror(-rb_ofs) here?
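
A small standalone sketch of the difference, under the assumption that
the allocator fails with -ENOMEM as alloc_clusters_imrt() does above
(fake_alloc and describe_failure are made-up names for illustration):

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical allocator stub that always fails with -ENOMEM */
static int64_t fake_alloc(void)
{
    return -ENOMEM;
}

/* Message for a negative-errno value returned by the allocator; the code is
 * taken from the returned offset, not from an unrelated 'ret' variable. */
static const char *describe_failure(int64_t ofs)
{
    assert(ofs < 0 && ofs > INT_MIN);
    return strerror((int)-ofs);
}
```

Here describe_failure(fake_alloc()) reports the out-of-memory condition,
whereas strerror(-ret) with ret still 0 would describe errno 0 instead.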

> +            res->check_errors++;
> +            ret = rb_ofs;

Narrowing from int64_t to int; but I guess we know that if rb_ofs < 0,
it is only -1, and not something weird like -0x100000000.  Is the goal
that ret is -1/0, or are you trying to encode negative errno values in
the return?
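
If negative errno values are the intent, one way to avoid depending on
the narrowing conversion is sketched below.  The helper name
offset_to_errno, the 4095 bound and the -EIO fallback are all assumptions
for illustration, not something the patch does:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical helper: turn a negative int64_t result into an int error
 * code. Values outside the small negative-errno range are mapped to -EIO
 * rather than being truncated by an implicit conversion to int. */
static int offset_to_errno(int64_t ofs)
{
    assert(ofs < 0);
    if (ofs < -4095) {
        return -EIO;
    }
    return (int)ofs;
}
```

A plain "ret = rb_ofs" with a value such as -0x100000000 has an
implementation-defined result, and on common platforms it would typically
become 0, which the caller would read as success.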

> +            goto fail;
> +        }
> +
> +        if (reftable_size <= rb_index) {
> +            uint32_t old_rt_size = reftable_size;
> +            reftable_size = ROUND_UP((rb_index + 1) * sizeof(uint64_t),
> +                                     s->cluster_size) / sizeof(uint64_t);
> +            reftable = g_try_realloc(reftable,
> +                                     reftable_size * sizeof(uint64_t));
> +            if (!reftable) {
> +                res->check_errors++;
> +                ret = -ENOMEM;
> +                goto fail;
> +            }
> +
> +            memset(reftable + old_rt_size, 0,
> +                   (reftable_size - old_rt_size) * sizeof(uint64_t));
> +
> +            /* The offset we have for the reftable is now no longer valid;
> +             * this will leak that range, but we can easily fix that by running
> +             * a leak-fixing check after this rebuild operation */
> +            rt_ofs = -1;
> +        }
> +        reftable[rb_index] = rb_ofs;
> +
> +        /* If this is apparently the last refblock (for now), try to squeeze the
> +         * reftable in */
> +        if (rb_index == (*nb_clusters - 1) >> s->refcount_block_bits &&
> +            rt_ofs < 0)
> +        {
> +            rt_ofs = alloc_clusters_imrt(bs, size_to_clusters(s, reftable_size *
> +                                                              sizeof(uint64_t)),
> +                                         refcount_table, nb_clusters,
> +                                         &first_free_cluster);
> +            if (rt_ofs < 0) {
> +                fprintf(stderr, "ERROR allocating reftable: %s\n",
> +                        strerror(-ret));

Again, -ret looks wrong here.

> +                res->check_errors++;
> +                ret = rt_ofs;
> +                goto fail;
> +            }
> +        }
> +
> +        ret = qcow2_pre_write_overlap_check(bs, 0, rb_ofs, s->cluster_size);
> +        if (ret < 0) {
> +            fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
> +            goto fail;
> +        }
> +
> +        on_disk_rb = g_malloc0(s->cluster_size);

Why g_try_realloc() earlier, but an abort()ing g_malloc0() here?
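
A sketch of the non-aborting variant, written against the C standard
library so that it stands alone: calloc() is used here as a stand-in for
g_try_malloc0(), and build_refblock is a hypothetical helper that also
covers the big-endian conversion loop that follows in the patch:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: serialize one refblock worth of 16-bit refcounts,
 * starting at rb_start, into a freshly allocated big-endian buffer.
 * calloc() stands in for g_try_malloc0(); the caller frees *out. */
static int build_refblock(const uint16_t *table, int64_t nb_clusters,
                          int64_t rb_start, size_t cluster_size,
                          uint8_t **out)
{
    size_t entries = cluster_size / sizeof(uint16_t);
    uint8_t *buf;
    size_t i;

    assert(cluster_size > 0 && cluster_size % sizeof(uint16_t) == 0);

    buf = calloc(1, cluster_size);
    if (!buf) {
        return -ENOMEM;     /* report the failure instead of aborting */
    }

    for (i = 0; i < entries && rb_start + (int64_t)i < nb_clusters; i++) {
        uint16_t v = table[rb_start + (int64_t)i];
        buf[2 * i] = (uint8_t)(v >> 8);     /* big-endian, like cpu_to_be16() */
        buf[2 * i + 1] = (uint8_t)(v & 0xff);
    }

    *out = buf;
    return 0;
}
```

Entries past nb_clusters stay zero because the buffer is zero-initialized,
matching the effect of g_malloc0() in the patch.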

> +        for (i = 0; i < s->cluster_size / sizeof(uint16_t) &&
> +                    rb_start + i < *nb_clusters; i++)
> +        {
> +            on_disk_rb[i] = cpu_to_be16((*refcount_table)[rb_start + i]);
> +        }
> +
> +        ret = bdrv_write(bs->file, rb_ofs / BDRV_SECTOR_SIZE,
> +                         (void *)on_disk_rb, s->cluster_sectors);
> +        g_free(on_disk_rb);
> +        if (ret < 0) {
> +            fprintf(stderr, "ERROR writing refblock: %s\n", strerror(-ret));
> +            goto fail;
> +        }
> +
> +        /* Go to the end of this refblock */
> +        cluster = rb_start + s->cluster_size / sizeof(uint16_t) - 1;
> +    }
> +
> +    if (rt_ofs < 0) {
> +        int64_t post_rb_start = ROUND_UP(*nb_clusters,
> +                                         s->cluster_size / sizeof(uint16_t));
> +
> +        /* Not pretty but simple */
> +        if (first_free_cluster < post_rb_start) {
> +            first_free_cluster = post_rb_start;
> +        }
> +        rt_ofs = alloc_clusters_imrt(bs, size_to_clusters(s, reftable_size *
> +                                                          sizeof(uint64_t)),
> +                                     refcount_table, nb_clusters,
> +                                     &first_free_cluster);
> +        if (rt_ofs < 0) {
> +            fprintf(stderr, "ERROR allocating reftable: %s\n", strerror(-ret));

Another wrong -ret?

-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org

