Re: [Qemu-devel] [PATCH v2 4/5] qcow2: More complete consistency check

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Max Reitz <mreitz@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v2 4/5] qcow2: More complete consistency check
Date: Thu, 29 Aug 2013 14:09:43 +0200	[thread overview]
Message-ID: <521F3A07.2000002@redhat.com> (raw)
In-Reply-To: <20130829113654.GG2961@dhcp-200-207.str.redhat.com>

Am 29.08.2013 13:36, schrieb Kevin Wolf:
> Am 28.08.2013 um 16:55 hat Max Reitz geschrieben:
>> The qcow2_check_refcounts function has been extended to be able to fix
>> OFLAG_COPIED errors and multiple references on refcount blocks.
>>
>> If no corruptions remain after an image repair (and no errors have been
>> encountered), clear the corrupt flag in qcow2_check.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> This would be easier to review if you kept the code changes of the
> actual check (mostly code movement, I guess) and the repair of each type
> of errors in separate patches.
>
>>   block/qcow2-cluster.c  |   4 +-
>>   block/qcow2-refcount.c | 249 ++++++++++++++++++++++++++++++++++++++++++-------
>>   block/qcow2.c          |   6 +-
>>   block/qcow2.h          |   1 +
>>   include/block/block.h  |   1 +
>>   5 files changed, 222 insertions(+), 39 deletions(-)
>> @@ -1240,7 +1243,8 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>   {
>>       BDRVQcowState *s = bs->opaque;
>>       int64_t size, i, highest_cluster;
>> -    int nb_clusters, refcount1, refcount2;
>> +    uint64_t *l2_table = NULL;
>> +    int nb_clusters, refcount1, refcount2, j;
>>       QCowSnapshot *sn;
>>       uint16_t *refcount_table;
>>       int ret;
>> @@ -1305,10 +1309,85 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>               inc_refcounts(bs, res, refcount_table, nb_clusters,
>>                   offset, s->cluster_size);
>>               if (refcount_table[cluster] != 1) {
>> -                fprintf(stderr, "ERROR refcount block %" PRId64
>> +                fprintf(stderr, "%s refcount block %" PRId64
>>                       " refcount=%d\n",
>> +                    fix & BDRV_FIX_ERRORS ? "Repairing" :
>> +                                            "ERROR",
>>                       i, refcount_table[cluster]);
>> -                res->corruptions++;
>> +                if (fix & BDRV_FIX_ERRORS) {
> This is a quite long if block. Probably worth splitting into its own
> function. (The res->corruptions++ part could be in a common fail: part
> then.)
Seems reasonable.

>> +                    int64_t new_offset;
>> +                    void *refcount_block;
>> +
>> +                    /* allocate new refcount block */
>> +                    new_offset = qcow2_alloc_clusters(bs, s->cluster_size);
> How trustworthy is qcow2_alloc_clusters() when we know that our refcount
> information is corrupted? Couldn't we end up accidentally overwriting
> things?
I personally don't really see any other way to do this than to just fail 
without any attempt on repairing. If the refcounts are completely 
broken, I don't see a reasonable way of allocating a free cluster. 
Furthermore, the following pre-write check should at least prevent this 
from overwriting any metadata.

>> +                    if (new_offset < 0) {
>> +                        fprintf(stderr, "Could not allocate new cluster\n");
>> +                        res->corruptions++;
>> +                        continue;
>> +                    }
>> +                    /* fetch current content */
>> +                    ret = qcow2_cache_get(bs, s->refcount_block_cache, offset,
>> +                            &refcount_block);
>> +                    if (ret < 0) {
>> +                        fprintf(stderr, "Could not fetch refcount block\n");
> strerror(-ret) would be useful information in almost all of the error
> cases.
Yes, right.

>> +                        qcow2_free_clusters(bs, new_offset, s->cluster_size,
>> +                                QCOW2_DISCARD_ALWAYS);
>> +                        res->corruptions++;
>> +                        continue;
>> +                    }
>> +                    /* new block has not yet been entered into refcount table,
>> +                     * therefore it is no refcount block yet (regarding this
>> +                     * check) */
>> +                    ret = qcow2_pre_write_overlap_check(bs, QCOW2_OL_DEFAULT,
>> +                            new_offset, s->cluster_sectors * BDRV_SECTOR_SIZE);
>> +                    if (ret < 0) {
>> +                        fprintf(stderr, "Could not write refcount block (would "
>> +                                "overlap with existing metadata)\n");
>> +                        /* the image will be marked corrupt here, so don't even
>> +                         * attempt on freeing the cluster */
>> +                        res->corruptions++;
>> +                        goto fail;
>> +                    }
>> +                    /* write to new block */
>> +                    ret = bdrv_write(bs->file, new_offset >> BDRV_SECTOR_BITS,
>> +                            refcount_block, s->cluster_sectors);
>> +                    if (ret < 0) {
>> +                        fprintf(stderr, "Could not write refcount block\n");
>> +                        qcow2_free_clusters(bs, new_offset, s->cluster_size,
>> +                                QCOW2_DISCARD_ALWAYS);
>> +                        res->corruptions++;
>> +                        continue;
>> +                    }
>> +                    /* update refcount table */
>> +                    assert(!(new_offset & (s->cluster_size - 1)));
>> +                    s->refcount_table[i] = new_offset;
>> +                    ret = write_reftable_entry(bs, i);
>> +                    if (ret < 0) {
>> +                        fprintf(stderr, "Could not update refcount table\n");
>> +                        s->refcount_table[i] = offset;
>> +                        qcow2_free_clusters(bs, new_offset, s->cluster_size,
>> +                                QCOW2_DISCARD_ALWAYS);
>> +                        res->corruptions++;
>> +                        continue;
>> +                    }
>> +                    qcow2_cache_put(bs, s->refcount_block_cache,
>> +                            &refcount_block);
>> +                    /* update refcounts */
>> +                    if ((new_offset >> s->cluster_bits) >= nb_clusters) {
>> +                        /* increase refcount_table size if necessary */
>> +                        int old_nb_clusters = nb_clusters;
>> +                        nb_clusters = (new_offset >> s->cluster_bits) + 1;
>> +                        refcount_table = g_realloc(refcount_table,
>> +                                nb_clusters * sizeof(uint16_t));
>> +                        memset(&refcount_table[old_nb_clusters], 0, (nb_clusters
>> +                                - old_nb_clusters) * sizeof(uint16_t));
>> +                    }
>> +                    refcount_table[cluster]--;
>> +                    inc_refcounts(bs, res, refcount_table, nb_clusters,
>> +                            new_offset, s->cluster_size);
> res->corruptions_fixed++?
Oh, yes, I forgot that.

>> +                } else {
>> +                    res->corruptions++;
>> +                }
>>               }
>>           }
>>       }
>> @@ -1364,10 +1443,108 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>>           }
>>       }
>>   
>> +    l2_table = g_malloc(s->l2_size * sizeof(uint64_t));
> qemu_blockalign() is better than g_malloc() because it avoids using a
> bounce buffer for O_DIRECT images.
Ah, okay. I'll include that in my other series as well.

>> +
>> +    /* check OFLAG_COPIED */
>> +    for (i = 0; i < s->l1_size; i++) {
>> +        uint64_t l1_entry = s->l1_table[i];
>> +        uint64_t l2_offset = l1_entry & L1E_OFFSET_MASK;
>> +        bool l2_dirty = false;
>> +        int refcount;
>> +
>> +        if (!l2_offset) {
>> +            continue;
>> +        }
>> +
>> +        refcount = get_refcount(bs, l2_offset >> s->cluster_bits);
>> +        if (refcount < 0) {
>> +            /* don't print message nor increment check_errors, since the above
>> +             * loop will have done this already */
>> +            continue;
>> +        }
>> +        if ((refcount == 1) != ((l1_entry & QCOW_OFLAG_COPIED) != 0)) {
>> +            fprintf(stderr, "%s OFLAG_COPIED L2 cluster: l1_entry=%" PRIx64
>> +                    " refcount=%d\n",
>> +                    fix & BDRV_FIX_ERRORS ? "Repairing" :
>> +                                            "ERROR",
>> +                    l1_entry, refcount);
>> +            if (fix & BDRV_FIX_ERRORS) {
>> +                s->l1_table[i] = refcount == 1
>> +                               ? l1_entry |  QCOW_OFLAG_COPIED
>> +                               : l1_entry & ~QCOW_OFLAG_COPIED;
>> +                ret = qcow2_write_l1_entry(bs, i);
>> +                if (ret < 0) {
>> +                    res->check_errors++;
>> +                    goto fail;
>> +                }
> else res->corruptions_fixed++?
>
>> +            } else {
>> +                res->corruptions++;
>> +            }
>> +        }
>> +
>> +        ret = bdrv_pread(bs->file, l2_offset, l2_table,
>> +                s->l2_size * sizeof(uint64_t));
>> +        if (ret != s->l2_size * sizeof(uint64_t)) {
>> +            fprintf(stderr, "ERROR: Could not read L2 table\n");
>> +            res->check_errors++;
>> +            if (ret >= 0) {
>> +                ret = -EIO;
>> +            }
> Doesn't happen. You can just check ret < 0 instead of ret != ... in the
> first place, bdrv_pread() never returns short reads.
Okay, I think I just saw it that way in some other code.

>> +            goto fail;
>> +        }
>> +
>> +        for (j = 0; j < s->l2_size; j++) {
>> +            uint64_t l2_entry = be64_to_cpu(l2_table[j]);
>> +            uint64_t data_offset;
>> +
>> +            if (qcow2_get_cluster_type(l2_entry) != QCOW2_CLUSTER_NORMAL) {
>> +                continue;
>> +            }
>> +
>> +            data_offset = l2_entry & L2E_OFFSET_MASK;
>> +
>> +            refcount = get_refcount(bs, data_offset >> s->cluster_bits);
>> +            if (refcount < 0) {
>> +                /* don't print message nor increment check_errors */
> Why?
I didn't want to duplicate the whole comment from above, but maybe it 
makes sense if it's not obvious:

/* don't print message nor increment check_errors, since the above
  * loop will have done this already */

>> +                continue;
>> +            }
>> +            if ((refcount == 1) != ((l2_entry & QCOW_OFLAG_COPIED) != 0)) {
>> +                fprintf(stderr, "%s OFLAG_COPIED data cluster: l2_entry=%"
>> +                        PRIx64 " refcount=%d\n",
>> +                        fix & BDRV_FIX_ERRORS ? "Repairing" :
>> +                                                "ERROR",
>> +                        l2_entry, refcount);
>> +                if (fix & BDRV_FIX_ERRORS) {
>> +                    l2_table[j] = cpu_to_be64(refcount == 1
>> +                                ? l2_entry |  QCOW_OFLAG_COPIED
>> +                                : l2_entry & ~QCOW_OFLAG_COPIED);
>> +                    l2_dirty = true;
> res->corruptions_fixed++
>
>> +                } else {
>> +                    res->corruptions++;
>> +                }
>> +            }
>> +        }
>> +
>> +        if (l2_dirty) {
>> +            ret = bdrv_pwrite(bs->file, l2_offset, l2_table,
>> +                    s->l2_size * sizeof(uint64_t));
>> +            if (ret != s->l2_size * sizeof(uint64_t)) {
>> +                fprintf(stderr, "ERROR: Could not write L2 table\n");
>> +                res->check_errors++;
>> +                if (ret >= 0) {
>> +                    ret = -EIO;
>> +                }
> bdrv_pwrite() also doesn't return short writes.
>
>> +                goto fail;
>> +            }
>> +        }
>> +    }
>> +
>> +
>>       res->image_end_offset = (highest_cluster + 1) * s->cluster_size;
>>       ret = 0;
>>   
>>   fail:
>> +    g_free(l2_table);
>>       g_free(refcount_table);
>>   
>>       return ret;
> Kevin
Max

next prev parent reply	other threads:[~2013-08-29 12:10 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-08-28 14:55 [Qemu-devel] [PATCH v2 0/5] qcow2: Add metadata overlap checks Max Reitz
2013-08-28 14:55 ` [Qemu-devel] [PATCH v2 1/5] qcow2: Add corrupt bit Max Reitz
2013-08-29  8:23   ` Kevin Wolf
2013-08-29  8:27     ` Max Reitz
2013-08-29  8:57       ` Kevin Wolf
2013-08-28 14:55 ` [Qemu-devel] [PATCH v2 2/5] qcow2: Metadata overlap checks Max Reitz
2013-08-29  8:51   ` Kevin Wolf
2013-08-29  8:57     ` Max Reitz
2013-08-28 14:55 ` [Qemu-devel] [PATCH v2 3/5] qcow2: Employ metadata " Max Reitz
2013-08-29  9:18   ` Kevin Wolf
2013-08-29  9:20     ` Max Reitz
2013-08-28 14:55 ` [Qemu-devel] [PATCH v2 4/5] qcow2: More complete consistency check Max Reitz
2013-08-29 11:36   ` Kevin Wolf
2013-08-29 12:09     ` Max Reitz [this message]
2013-08-28 14:55 ` [Qemu-devel] [PATCH v2 5/5] qemu-iotests: Overlapping cluster allocations Max Reitz

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=521F3A07.2000002@redhat.com \
    --to=mreitz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.