From: Max Reitz <mreitz@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v2 4/5] qcow2: More complete consistency check
Date: Thu, 29 Aug 2013 14:09:43 +0200 [thread overview]
Message-ID: <521F3A07.2000002@redhat.com> (raw)
In-Reply-To: <20130829113654.GG2961@dhcp-200-207.str.redhat.com>
Am 29.08.2013 13:36, schrieb Kevin Wolf:
> Am 28.08.2013 um 16:55 hat Max Reitz geschrieben:
>> The qcow2_check_refcounts function has been extended to be able to fix
>> OFLAG_COPIED errors and multiple references on refcount blocks.
>>
>> If no corruptions remain after an image repair (and no errors have been
>> encountered), clear the corrupt flag in qcow2_check.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
> This would be easier to review if you kept the code changes of the
> actual check (mostly code movement, I guess) and the repair of each type
> of errors in separate patches.
>
>> block/qcow2-cluster.c | 4 +-
>> block/qcow2-refcount.c | 249 ++++++++++++++++++++++++++++++++++++++++++-------
>> block/qcow2.c | 6 +-
>> block/qcow2.h | 1 +
>> include/block/block.h | 1 +
>> 5 files changed, 222 insertions(+), 39 deletions(-)
>> @@ -1240,7 +1243,8 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>> {
>> BDRVQcowState *s = bs->opaque;
>> int64_t size, i, highest_cluster;
>> - int nb_clusters, refcount1, refcount2;
>> + uint64_t *l2_table = NULL;
>> + int nb_clusters, refcount1, refcount2, j;
>> QCowSnapshot *sn;
>> uint16_t *refcount_table;
>> int ret;
>> @@ -1305,10 +1309,85 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>> inc_refcounts(bs, res, refcount_table, nb_clusters,
>> offset, s->cluster_size);
>> if (refcount_table[cluster] != 1) {
>> - fprintf(stderr, "ERROR refcount block %" PRId64
>> + fprintf(stderr, "%s refcount block %" PRId64
>> " refcount=%d\n",
>> + fix & BDRV_FIX_ERRORS ? "Repairing" :
>> + "ERROR",
>> i, refcount_table[cluster]);
>> - res->corruptions++;
>> + if (fix & BDRV_FIX_ERRORS) {
> This is a quite long if block. Probably worth splitting into its own
> function. (The res->corruptions++ part could be in a common fail: part
> then.)
Seems reasonable.
>> + int64_t new_offset;
>> + void *refcount_block;
>> +
>> + /* allocate new refcount block */
>> + new_offset = qcow2_alloc_clusters(bs, s->cluster_size);
> How trustworthy is qcow2_alloc_clusters() when we know that our refcount
> information is corrupted? Couldn't we end up accidentally overwriting
> things?
I personally don't really see any other way to do this than to just fail
without any attempt on repairing. If the refcounts are completely
broken, I don't see a reasonable way of allocating a free cluster.
Furthermore, the following pre-write check should at least prevent this
from overwriting any metadata.
>> + if (new_offset < 0) {
>> + fprintf(stderr, "Could not allocate new cluster\n");
>> + res->corruptions++;
>> + continue;
>> + }
>> + /* fetch current content */
>> + ret = qcow2_cache_get(bs, s->refcount_block_cache, offset,
>> + &refcount_block);
>> + if (ret < 0) {
>> + fprintf(stderr, "Could not fetch refcount block\n");
> strerror(-ret) would be useful information in almost all of the error
> cases.
Yes, right.
>> + qcow2_free_clusters(bs, new_offset, s->cluster_size,
>> + QCOW2_DISCARD_ALWAYS);
>> + res->corruptions++;
>> + continue;
>> + }
>> + /* new block has not yet been entered into refcount table,
>> + * therefore it is no refcount block yet (regarding this
>> + * check) */
>> + ret = qcow2_pre_write_overlap_check(bs, QCOW2_OL_DEFAULT,
>> + new_offset, s->cluster_sectors * BDRV_SECTOR_SIZE);
>> + if (ret < 0) {
>> + fprintf(stderr, "Could not write refcount block (would "
>> + "overlap with existing metadata)\n");
>> + /* the image will be marked corrupt here, so don't even
>> + * attempt on freeing the cluster */
>> + res->corruptions++;
>> + goto fail;
>> + }
>> + /* write to new block */
>> + ret = bdrv_write(bs->file, new_offset >> BDRV_SECTOR_BITS,
>> + refcount_block, s->cluster_sectors);
>> + if (ret < 0) {
>> + fprintf(stderr, "Could not write refcount block\n");
>> + qcow2_free_clusters(bs, new_offset, s->cluster_size,
>> + QCOW2_DISCARD_ALWAYS);
>> + res->corruptions++;
>> + continue;
>> + }
>> + /* update refcount table */
>> + assert(!(new_offset & (s->cluster_size - 1)));
>> + s->refcount_table[i] = new_offset;
>> + ret = write_reftable_entry(bs, i);
>> + if (ret < 0) {
>> + fprintf(stderr, "Could not update refcount table\n");
>> + s->refcount_table[i] = offset;
>> + qcow2_free_clusters(bs, new_offset, s->cluster_size,
>> + QCOW2_DISCARD_ALWAYS);
>> + res->corruptions++;
>> + continue;
>> + }
>> + qcow2_cache_put(bs, s->refcount_block_cache,
>> + &refcount_block);
>> + /* update refcounts */
>> + if ((new_offset >> s->cluster_bits) >= nb_clusters) {
>> + /* increase refcount_table size if necessary */
>> + int old_nb_clusters = nb_clusters;
>> + nb_clusters = (new_offset >> s->cluster_bits) + 1;
>> + refcount_table = g_realloc(refcount_table,
>> + nb_clusters * sizeof(uint16_t));
>> + memset(&refcount_table[old_nb_clusters], 0, (nb_clusters
>> + - old_nb_clusters) * sizeof(uint16_t));
>> + }
>> + refcount_table[cluster]--;
>> + inc_refcounts(bs, res, refcount_table, nb_clusters,
>> + new_offset, s->cluster_size);
> res->corruptions_fixed++?
Oh, yes, I forgot that.
>> + } else {
>> + res->corruptions++;
>> + }
>> }
>> }
>> }
>> @@ -1364,10 +1443,108 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>> }
>> }
>>
>> + l2_table = g_malloc(s->l2_size * sizeof(uint64_t));
> qemu_blockalign() is better than g_malloc() because it avoids using a
> bounce buffer for O_DIRECT images.
Ah, okay. I'll include that in my other series as well.
>> +
>> + /* check OFLAG_COPIED */
>> + for (i = 0; i < s->l1_size; i++) {
>> + uint64_t l1_entry = s->l1_table[i];
>> + uint64_t l2_offset = l1_entry & L1E_OFFSET_MASK;
>> + bool l2_dirty = false;
>> + int refcount;
>> +
>> + if (!l2_offset) {
>> + continue;
>> + }
>> +
>> + refcount = get_refcount(bs, l2_offset >> s->cluster_bits);
>> + if (refcount < 0) {
>> + /* don't print message nor increment check_errors, since the above
>> + * loop will have done this already */
>> + continue;
>> + }
>> + if ((refcount == 1) != ((l1_entry & QCOW_OFLAG_COPIED) != 0)) {
>> + fprintf(stderr, "%s OFLAG_COPIED L2 cluster: l1_entry=%" PRIx64
>> + " refcount=%d\n",
>> + fix & BDRV_FIX_ERRORS ? "Repairing" :
>> + "ERROR",
>> + l1_entry, refcount);
>> + if (fix & BDRV_FIX_ERRORS) {
>> + s->l1_table[i] = refcount == 1
>> + ? l1_entry | QCOW_OFLAG_COPIED
>> + : l1_entry & ~QCOW_OFLAG_COPIED;
>> + ret = qcow2_write_l1_entry(bs, i);
>> + if (ret < 0) {
>> + res->check_errors++;
>> + goto fail;
>> + }
> else res->corruptions_fixed++?
>
>> + } else {
>> + res->corruptions++;
>> + }
>> + }
>> +
>> + ret = bdrv_pread(bs->file, l2_offset, l2_table,
>> + s->l2_size * sizeof(uint64_t));
>> + if (ret != s->l2_size * sizeof(uint64_t)) {
>> + fprintf(stderr, "ERROR: Could not read L2 table\n");
>> + res->check_errors++;
>> + if (ret >= 0) {
>> + ret = -EIO;
>> + }
> Doesn't happen. You can just check ret < 0 instead of ret != ... in the
> first place, bdrv_pread() never returns short reads.
Okay, I think I just saw it that way in some other code.
>> + goto fail;
>> + }
>> +
>> + for (j = 0; j < s->l2_size; j++) {
>> + uint64_t l2_entry = be64_to_cpu(l2_table[j]);
>> + uint64_t data_offset;
>> +
>> + if (qcow2_get_cluster_type(l2_entry) != QCOW2_CLUSTER_NORMAL) {
>> + continue;
>> + }
>> +
>> + data_offset = l2_entry & L2E_OFFSET_MASK;
>> +
>> + refcount = get_refcount(bs, data_offset >> s->cluster_bits);
>> + if (refcount < 0) {
>> + /* don't print message nor increment check_errors */
> Why?
I didn't want to duplicate the whole comment from above, but maybe it
makes sense if it's not obvious:
/* don't print message nor increment check_errors, since the above
* loop will have done this already */
>> + continue;
>> + }
>> + if ((refcount == 1) != ((l2_entry & QCOW_OFLAG_COPIED) != 0)) {
>> + fprintf(stderr, "%s OFLAG_COPIED data cluster: l2_entry=%"
>> + PRIx64 " refcount=%d\n",
>> + fix & BDRV_FIX_ERRORS ? "Repairing" :
>> + "ERROR",
>> + l2_entry, refcount);
>> + if (fix & BDRV_FIX_ERRORS) {
>> + l2_table[j] = cpu_to_be64(refcount == 1
>> + ? l2_entry | QCOW_OFLAG_COPIED
>> + : l2_entry & ~QCOW_OFLAG_COPIED);
>> + l2_dirty = true;
> res->corruptions_fixed++
>
>> + } else {
>> + res->corruptions++;
>> + }
>> + }
>> + }
>> +
>> + if (l2_dirty) {
>> + ret = bdrv_pwrite(bs->file, l2_offset, l2_table,
>> + s->l2_size * sizeof(uint64_t));
>> + if (ret != s->l2_size * sizeof(uint64_t)) {
>> + fprintf(stderr, "ERROR: Could not write L2 table\n");
>> + res->check_errors++;
>> + if (ret >= 0) {
>> + ret = -EIO;
>> + }
> bdrv_pwrite() also doesn't return short writes.
>
>> + goto fail;
>> + }
>> + }
>> + }
>> +
>> +
>> res->image_end_offset = (highest_cluster + 1) * s->cluster_size;
>> ret = 0;
>>
>> fail:
>> + g_free(l2_table);
>> g_free(refcount_table);
>>
>> return ret;
> Kevin
Max
next prev parent reply other threads:[~2013-08-29 12:10 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-08-28 14:55 [Qemu-devel] [PATCH v2 0/5] qcow2: Add metadata overlap checks Max Reitz
2013-08-28 14:55 ` [Qemu-devel] [PATCH v2 1/5] qcow2: Add corrupt bit Max Reitz
2013-08-29 8:23 ` Kevin Wolf
2013-08-29 8:27 ` Max Reitz
2013-08-29 8:57 ` Kevin Wolf
2013-08-28 14:55 ` [Qemu-devel] [PATCH v2 2/5] qcow2: Metadata overlap checks Max Reitz
2013-08-29 8:51 ` Kevin Wolf
2013-08-29 8:57 ` Max Reitz
2013-08-28 14:55 ` [Qemu-devel] [PATCH v2 3/5] qcow2: Employ metadata " Max Reitz
2013-08-29 9:18 ` Kevin Wolf
2013-08-29 9:20 ` Max Reitz
2013-08-28 14:55 ` [Qemu-devel] [PATCH v2 4/5] qcow2: More complete consistency check Max Reitz
2013-08-29 11:36 ` Kevin Wolf
2013-08-29 12:09 ` Max Reitz [this message]
2013-08-28 14:55 ` [Qemu-devel] [PATCH v2 5/5] qemu-iotests: Overlapping cluster allocations Max Reitz
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=521F3A07.2000002@redhat.com \
--to=mreitz@redhat.com \
--cc=kwolf@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=stefanha@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).