qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Max Reitz <mreitz@redhat.com>
To: "Benoît Canet" <benoit.canet@irqsave.net>
Cc: Kevin Wolf <kwolf@redhat.com>,
	qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 4/8] qcow2: Do not perform potentially damaging repairs
Date: Fri, 15 Aug 2014 14:42:28 +0200	[thread overview]
Message-ID: <53EE0034.2050901@redhat.com> (raw)
In-Reply-To: <20140814123313.GI2009@irqsave.net>

On 14.08.2014 14:33, Benoît Canet wrote:
> The Wednesday 13 Aug 2014 à 23:01:46 (+0200), Max Reitz wrote :
>> If a referenced cluster has a refcount of 0, increasing its refcount may
>> result in clusters being allocated for the refcount structures. This may
>> overwrite the referenced cluster, therefore we cannot simply increase
>> the refcount then.
>>
>> In such cases, we can either try to replicate all the refcount
>> operations solely for the check operation, basing the allocations on the
>> in-memory refcount table; or we can simply rebuild the whole refcount
>> structure based on the in-memory refcount table. Since the latter will
>> be much easier, do that.
>>
>> To prepare for this, introduce a "rebuild" boolean which should be set
>> to true whenever a fix is rather dangerous or too complicated using the
>> current refcount structures. Another example for this is refcount blocks
>> being referenced more than once.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2-refcount.c | 191 +++++++------------------------------------------
>>   1 file changed, 27 insertions(+), 164 deletions(-)
>>
>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>> index a1d93e5..6400840 100644
>> --- a/block/qcow2-refcount.c
>> +++ b/block/qcow2-refcount.c
>> @@ -1381,127 +1381,12 @@ fail:
>>   }
>>   
>>   /*
>> - * Writes one sector of the refcount table to the disk
>> - */
>> -#define RT_ENTRIES_PER_SECTOR (512 / sizeof(uint64_t))
>> -static int write_reftable_entry(BlockDriverState *bs, int rt_index)
>> -{
>> -    BDRVQcowState *s = bs->opaque;
>> -    uint64_t buf[RT_ENTRIES_PER_SECTOR];
>> -    int rt_start_index;
>> -    int i, ret;
>> -
>> -    rt_start_index = rt_index & ~(RT_ENTRIES_PER_SECTOR - 1);
>> -    for (i = 0; i < RT_ENTRIES_PER_SECTOR; i++) {
>> -        buf[i] = cpu_to_be64(s->refcount_table[rt_start_index + i]);
>> -    }
>> -
>> -    ret = qcow2_pre_write_overlap_check(bs, QCOW2_OL_REFCOUNT_TABLE,
>> -            s->refcount_table_offset + rt_start_index * sizeof(uint64_t),
>> -            sizeof(buf));
>> -    if (ret < 0) {
>> -        return ret;
>> -    }
>> -
>> -    BLKDBG_EVENT(bs->file, BLKDBG_REFTABLE_UPDATE);
>> -    ret = bdrv_pwrite_sync(bs->file, s->refcount_table_offset +
>> -            rt_start_index * sizeof(uint64_t), buf, sizeof(buf));
>> -    if (ret < 0) {
>> -        return ret;
>> -    }
>> -
>> -    return 0;
>> -}
>> -
>> -/*
>> - * Allocates a new cluster for the given refcount block (represented by its
>> - * offset in the image file) and copies the current content there. This function
>> - * does _not_ decrement the reference count for the currently occupied cluster.
>> - *
>> - * This function prints an informative message to stderr on error (and returns
>> - * -errno); on success, the offset of the newly allocated cluster is returned.
>> - */
>> -static int64_t realloc_refcount_block(BlockDriverState *bs, int reftable_index,
>> -                                      uint64_t offset)
>> -{
>> -    BDRVQcowState *s = bs->opaque;
>> -    int64_t new_offset = 0;
>> -    void *refcount_block = NULL;
>> -    int ret;
>> -
>> -    /* allocate new refcount block */
>> -    new_offset = qcow2_alloc_clusters(bs, s->cluster_size);
>> -    if (new_offset < 0) {
>> -        fprintf(stderr, "Could not allocate new cluster: %s\n",
>> -                strerror(-new_offset));
>> -        ret = new_offset;
>> -        goto done;
>> -    }
>> -
>> -    /* fetch current refcount block content */
>> -    ret = qcow2_cache_get(bs, s->refcount_block_cache, offset, &refcount_block);
>> -    if (ret < 0) {
>> -        fprintf(stderr, "Could not fetch refcount block: %s\n", strerror(-ret));
>> -        goto fail_free_cluster;
>> -    }
>> -
>> -    /* new block has not yet been entered into refcount table, therefore it is
>> -     * no refcount block yet (regarding this check) */
>> -    ret = qcow2_pre_write_overlap_check(bs, 0, new_offset, s->cluster_size);
>> -    if (ret < 0) {
>> -        fprintf(stderr, "Could not write refcount block; metadata overlap "
>> -                "check failed: %s\n", strerror(-ret));
>> -        /* the image will be marked corrupt, so don't even attempt on freeing
>> -         * the cluster */
>> -        goto done;
>> -    }
>> -
>> -    /* write to new block */
>> -    ret = bdrv_write(bs->file, new_offset / BDRV_SECTOR_SIZE, refcount_block,
>> -            s->cluster_sectors);
>> -    if (ret < 0) {
>> -        fprintf(stderr, "Could not write refcount block: %s\n", strerror(-ret));
>> -        goto fail_free_cluster;
>> -    }
>> -
>> -    /* update refcount table */
>> -    assert(!offset_into_cluster(s, new_offset));
>> -    s->refcount_table[reftable_index] = new_offset;
>> -    ret = write_reftable_entry(bs, reftable_index);
>> -    if (ret < 0) {
>> -        fprintf(stderr, "Could not update refcount table: %s\n",
>> -                strerror(-ret));
>> -        goto fail_free_cluster;
>> -    }
>> -
>> -    goto done;
>> -
>> -fail_free_cluster:
>> -    qcow2_free_clusters(bs, new_offset, s->cluster_size, QCOW2_DISCARD_OTHER);
>> -
>> -done:
>> -    if (refcount_block) {
>> -        /* This should never fail, as it would only do so if the given refcount
>> -         * block cannot be found in the cache. As this is impossible as long as
>> -         * there are no bugs, assert the success. */
>> -        int tmp = qcow2_cache_put(bs, s->refcount_block_cache, &refcount_block);
>> -        assert(tmp == 0);
>> -    }
>> -
>> -    if (ret < 0) {
>> -        return ret;
>> -    }
>> -
>> -    return new_offset;
>> -}
>> -
>> -/*
>>    * Checks consistency of refblocks and accounts for each refblock in
>>    * *refcount_table.
>>    */
>>   static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
>> -                           BdrvCheckMode fix, uint16_t **refcount_table,
>> -                           int64_t *nb_clusters)
>> +                           BdrvCheckMode fix, bool *rebuild,
>> +                           uint16_t **refcount_table, int64_t *nb_clusters)
>>   {
>>       BDRVQcowState *s = bs->opaque;
>>       int64_t i, size;
>> @@ -1517,6 +1402,7 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
>>               fprintf(stderr, "ERROR refcount block %" PRId64 " is not "
>>                       "cluster aligned; refcount table entry corrupted\n", i);
>>               res->corruptions++;
>> +            *rebuild = true;
>>               continue;
>>           }
>>   
>> @@ -1571,47 +1457,12 @@ resize_fail:
>>           if (offset != 0) {
>>               inc_refcounts(bs, res, *refcount_table, *nb_clusters,
>>                             offset, s->cluster_size);
>> -            if ((*refcount_table)[cluster] != 1) {
>> -                fprintf(stderr, "%s refcount block %" PRId64
>> -                        " refcount=%d\n",
>> -                        fix & BDRV_FIX_ERRORS ? "Repairing" :
>> -                                                "ERROR",
>> -                        i, (*refcount_table)[cluster]);
>> -
>> -                if (fix & BDRV_FIX_ERRORS) {
>> -                    int64_t new_offset;
>> -
>> -                    new_offset = realloc_refcount_block(bs, i, offset);
>> -                    if (new_offset < 0) {
>> -                        res->corruptions++;
>> -                        continue;
>> -                    }
>> -
>> -                    /* update refcounts */
>> -                    if ((new_offset >> s->cluster_bits) >= *nb_clusters) {
>> -                        /* increase refcount_table size if necessary */
>> -                        int old_nb_clusters = *nb_clusters;
>> -                        *nb_clusters = (new_offset >> s->cluster_bits) + 1;
>> -
>> -                        *refcount_table = g_try_realloc(*refcount_table,
>> -                                *nb_clusters * sizeof(uint16_t));
>> -                        if (!*refcount_table) {
>> -                            res->check_errors++;
>> -                            return -ENOMEM;
>> -                        }
>> -
>> -                        memset(&(*refcount_table)[old_nb_clusters], 0,
>> -                               (*nb_clusters - old_nb_clusters) *
>> -                               sizeof(uint16_t));
>> -                    }
>> -                    (*refcount_table)[cluster]--;
>> -                    inc_refcounts(bs, res, *refcount_table, *nb_clusters,
>> -                                  new_offset, s->cluster_size);
>>   
>> -                    res->corruptions_fixed++;
>> -                } else {
>> -                    res->corruptions++;
>> -                }
>> +            if ((*refcount_table)[cluster] != 1) {
>> +                fprintf(stderr, "ERROR refcount block %" PRId64
>> +                        " refcount=%d\n", i, (*refcount_table)[cluster]);
>> +                res->corruptions++;
>> +                *rebuild = true;
>>               }
>>           }
>>       }
>> @@ -1623,8 +1474,8 @@ resize_fail:
>>    * Calculates an in-memory refcount table.
>>    */
>>   static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>> -                               BdrvCheckMode fix, uint16_t **refcount_table,
>> -                               int64_t *nb_clusters)
>> +                               BdrvCheckMode fix, bool *rebuild,
>> +                               uint16_t **refcount_table, int64_t *nb_clusters)
>>   {
>>       BDRVQcowState *s = bs->opaque;
>>       QCowSnapshot *sn;
>> @@ -1667,7 +1518,7 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>                     s->refcount_table_offset,
>>                     s->refcount_table_size * sizeof(uint64_t));
>>   
>> -    return check_refblocks(bs, res, fix, refcount_table, nb_clusters);
>> +    return check_refblocks(bs, res, fix, rebuild, refcount_table, nb_clusters);
>>   }
>>   
>>   /*
>> @@ -1675,7 +1526,8 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>    * refcount as reported by the refcount structures on-disk.
>>    */
>>   static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>> -                              BdrvCheckMode fix, int64_t *highest_cluster,
>> +                              BdrvCheckMode fix, bool *rebuild,
>> +                              int64_t *highest_cluster,
>>                                 uint16_t *refcount_table, int64_t nb_clusters)
>>   {
>>       BDRVQcowState *s = bs->opaque;
>> @@ -1702,10 +1554,15 @@ static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>               int *num_fixed = NULL;
>>               if (refcount1 > refcount2 && (fix & BDRV_FIX_LEAKS)) {
>>                   num_fixed = &res->leaks_fixed;
>> -            } else if (refcount1 < refcount2 && (fix & BDRV_FIX_ERRORS)) {
>> +            } else if (refcount1 > 0 && refcount1 < refcount2 &&
>> +                       (fix & BDRV_FIX_ERRORS)) {
>>                   num_fixed = &res->corruptions_fixed;
>>               }
>>   
>> +            if (refcount1 == 0) {
>> +                *rebuild = true;
>> +            }
>> +
>>               fprintf(stderr, "%s cluster %" PRId64 " refcount=%d reference=%d\n",
>>                       num_fixed != NULL     ? "Repairing" :
>>                       refcount1 < refcount2 ? "ERROR" :
>> @@ -1744,6 +1601,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>       BDRVQcowState *s = bs->opaque;
>>       int64_t size, highest_cluster, nb_clusters;
>>       uint16_t *refcount_table = NULL;
>> +    bool rebuild = false;
>>       int ret;
>>   
>>       size = bdrv_getlength(bs->file);
>> @@ -1761,14 +1619,19 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>       res->bfi.total_clusters =
>>           size_to_clusters(s, bs->total_sectors * BDRV_SECTOR_SIZE);
>>   
>> -    ret = calculate_refcounts(bs, res, fix, &refcount_table, &nb_clusters);
>> +    ret = calculate_refcounts(bs, res, fix, &rebuild, &refcount_table,
>> +                              &nb_clusters);
>>       if (ret < 0) {
>>           goto fail;
>>       }
>>   
>> -    compare_refcounts(bs, res, fix, &highest_cluster, refcount_table,
>> +    compare_refcounts(bs, res, fix, &rebuild, &highest_cluster, refcount_table,
>>                         nb_clusters);
>>   
>> +    if (rebuild) {
>> +        fprintf(stderr, "ERROR need to rebuild refcount structures\n");
>> +    }
> Aren't you afraid that introducing rebuild in this way could
> break git bisect ?
>
> Why not introducing the new rebuild code in the previous patch and
> use it in this one ?

Because it's a static function and an unused static function really 
breaks git bisect.

The worst this patch does is that it prevents qemu from repairing files 
in a way which could actually break the file. It does not fix the repair 
operation (that's the next patch), but it at least prevents it, which is 
in my opinion not really breaking things (because things are already 
broken).

If one runs some tests against some expected output (like the 
qemu-iotests do), bisect is broken anyway because the output from after 
this series differs from before (that's why patch 7 exists).

The best I could probably do is to pull patch 5 before this one (like 
you suggest) while creating the rebuild function as a non-static one in 
that patch and making it static here so that gcc does not complain.

Max

> Best regards
>
> Benoît
>
>> +
>>       /* check OFLAG_COPIED */
>>       ret = check_oflag_copied(bs, res, fix);
>>       if (ret < 0) {
>> -- 
>> 2.0.3
>>
>>

  reply	other threads:[~2014-08-15 12:42 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-08-13 21:01 [Qemu-devel] [PATCH 0/8] qcow2: Fix image repairing Max Reitz
2014-08-13 21:01 ` [Qemu-devel] [PATCH 1/8] qcow2: Factor out refcount accounting for check Max Reitz
2014-08-14 11:56   ` Benoît Canet
2014-08-13 21:01 ` [Qemu-devel] [PATCH 2/8] qcow2: Factor out refcount comparison " Max Reitz
2014-08-14 12:02   ` Benoît Canet
2014-08-15 12:31     ` Max Reitz
2014-08-15 13:47       ` Max Reitz
2014-08-13 21:01 ` [Qemu-devel] [PATCH 3/8] qcow2: Fix refcount blocks beyond image end Max Reitz
2014-08-14 12:11   ` Benoît Canet
2014-08-15 12:36     ` Max Reitz
2014-08-13 21:01 ` [Qemu-devel] [PATCH 4/8] qcow2: Do not perform potentially damaging repairs Max Reitz
2014-08-14 12:33   ` Benoît Canet
2014-08-15 12:42     ` Max Reitz [this message]
2014-08-13 21:01 ` [Qemu-devel] [PATCH 5/8] qcow2: Rebuild refcount structure during check Max Reitz
2014-08-14 12:58   ` Benoît Canet
2014-08-15 12:49     ` Max Reitz
2014-08-13 21:01 ` [Qemu-devel] [PATCH 6/8] qcow2: Clean up after refcount rebuild Max Reitz
2014-08-13 21:01 ` [Qemu-devel] [PATCH 7/8] iotests: Fix test outputs Max Reitz
2014-08-13 21:01 ` [Qemu-devel] [PATCH 8/8] iotests: Add test for potentially damaging repairs Max Reitz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=53EE0034.2050901@redhat.com \
    --to=mreitz@redhat.com \
    --cc=benoit.canet@irqsave.net \
    --cc=kwolf@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).