Re: [Qemu-devel] [PATCH v6 07/11] qcow2: Do not perform potentially damaging repairs

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Kevin Wolf <kwolf@redhat.com>
To: Max Reitz <mreitz@redhat.com>
Cc: qemu-devel@nongnu.org, "Stefan Hajnoczi" <stefanha@redhat.com>,
	"Benoît Canet" <benoit.canet@nodalink.com>
Subject: Re: [Qemu-devel] [PATCH v6 07/11] qcow2: Do not perform potentially damaging repairs
Date: Tue, 21 Oct 2014 09:52:30 +0200	[thread overview]
Message-ID: <20141021075230.GB4409@noname.redhat.com> (raw)
In-Reply-To: <1413815733-22829-8-git-send-email-mreitz@redhat.com>

Am 20.10.2014 um 16:35 hat Max Reitz geschrieben:
> If a referenced cluster has a refcount of 0, increasing its refcount may
> result in clusters being allocated for the refcount structures. This may
> overwrite the referenced cluster, therefore we cannot simply increase
> the refcount then.
> 
> In such cases, we can either try to replicate all the refcount
> operations solely for the check operation, basing the allocations on the
> in-memory refcount table; or we can simply rebuild the whole refcount
> structure based on the in-memory refcount table. Since the latter will
> be much easier, do that.
> 
> To prepare for this, introduce a "rebuild" boolean which should be set
> to true whenever a fix is rather dangerous or too complicated using the
> current refcount structures. Another example for this is refcount blocks
> being referenced more than once.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>

> @@ -1557,6 +1442,7 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
>              fprintf(stderr, "ERROR refcount block %" PRId64 " is not "
>                  "cluster aligned; refcount table entry corrupted\n", i);
>              res->corruptions++;
> +            *rebuild = true;
>              continue;
>          }
>  
> @@ -1612,6 +1498,7 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
>  
>  resize_fail:
>                  res->corruptions++;
> +                *rebuild = true;
>                  fprintf(stderr, "ERROR could not resize image: %s\n",
>                          strerror(-ret));
>              } else {

Increasing res->corruptions in this patch looks right because you don't
actually rebuild anything yet. However, at the end of the series this
code still looks the same - shouldn't we somehow convert those the
corruptions caused by wrong refcounts into res->corruptions_fixed?

Hm... It seems that you do this by storing intermediate results in
qcow2_check_refcounts(), which assumes that compare_refcounts() finds
only refcount problems, and no other place can find any. This may be
true, but wouldn't it be more elegant to have a separate counter for
corruptions that will be fixed with a rebuild? We can use some
Qcow2CheckResult instead of BdrvCheckResult internally and add more
fields there.

Also, I don't think all "ERROR" messages correctly say "Repairing" again
at the end of the series.

> @@ -1624,40 +1511,10 @@ resize_fail:
>              inc_refcounts(bs, res, *refcount_table, *nb_clusters,
>                  offset, s->cluster_size);
>              if ((*refcount_table)[cluster] != 1) {
> -                fprintf(stderr, "%s refcount block %" PRId64
> -                    " refcount=%d\n",
> -                    fix & BDRV_FIX_ERRORS ? "Repairing" :
> -                                            "ERROR",
> -                    i, (*refcount_table)[cluster]);
> -
> -                if (fix & BDRV_FIX_ERRORS) {
> -                    int64_t new_offset;
> -
> -                    new_offset = realloc_refcount_block(bs, i, offset);
> -                    if (new_offset < 0) {
> -                        res->corruptions++;
> -                        continue;
> -                    }
> -
> -                    /* update refcounts */
> -                    if ((new_offset >> s->cluster_bits) >= *nb_clusters) {
> -                        /* increase refcount_table size if necessary */
> -                        int old_nb_clusters = *nb_clusters;
> -                        *nb_clusters = (new_offset >> s->cluster_bits) + 1;
> -                        *refcount_table = g_renew(uint16_t, *refcount_table,
> -                                                  *nb_clusters);
> -                        memset(&(*refcount_table)[old_nb_clusters], 0,
> -                               (*nb_clusters - old_nb_clusters) *
> -                               sizeof(uint16_t));
> -                    }
> -                    (*refcount_table)[cluster]--;
> -                    inc_refcounts(bs, res, *refcount_table, *nb_clusters,
> -                            new_offset, s->cluster_size);
> -
> -                    res->corruptions_fixed++;
> -                } else {
> -                    res->corruptions++;
> -                }
> +                fprintf(stderr, "ERROR refcount block %" PRId64
> +                        " refcount=%d\n", i, (*refcount_table)[cluster]);
> +                res->corruptions++;
> +                *rebuild = true;
>              }
>          }
>      }
> @@ -1669,8 +1526,8 @@ resize_fail:
>   * Calculates an in-memory refcount table.
>   */
>  static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
> -                               BdrvCheckMode fix, uint16_t **refcount_table,
> -                               int64_t *nb_clusters)
> +                               BdrvCheckMode fix, bool *rebuild,
> +                               uint16_t **refcount_table, int64_t *nb_clusters)
>  {
>      BDRVQcowState *s = bs->opaque;
>      int64_t i;
> @@ -1713,7 +1570,7 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>          s->refcount_table_offset,
>          s->refcount_table_size * sizeof(uint64_t));
>  
> -    return check_refblocks(bs, res, fix, refcount_table, nb_clusters);
> +    return check_refblocks(bs, res, fix, rebuild, refcount_table, nb_clusters);
>  }
>  
>  /*
> @@ -1721,7 +1578,8 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>   * refcount as reported by the refcount structures on-disk.
>   */
>  static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
> -                              BdrvCheckMode fix, int64_t *highest_cluster,
> +                              BdrvCheckMode fix, bool *rebuild,
> +                              int64_t *highest_cluster,
>                                uint16_t *refcount_table, int64_t nb_clusters)
>  {
>      BDRVQcowState *s = bs->opaque;
> @@ -1748,10 +1606,15 @@ static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>              int *num_fixed = NULL;
>              if (refcount1 > refcount2 && (fix & BDRV_FIX_LEAKS)) {
>                  num_fixed = &res->leaks_fixed;
> -            } else if (refcount1 < refcount2 && (fix & BDRV_FIX_ERRORS)) {
> +            } else if (refcount1 > 0 && refcount1 < refcount2 &&
> +                       (fix & BDRV_FIX_ERRORS)) {
>                  num_fixed = &res->corruptions_fixed;
>              }
>  
> +            if (refcount1 == 0) {
> +                *rebuild = true;
> +            }

Why isn't this just another else if branch? Possibly even as the first
or second condition, so that you don't have to add 'refcount1 > 0' for
the other branch.

> +
>              fprintf(stderr, "%s cluster %" PRId64 " refcount=%d reference=%d\n",
>                     num_fixed != NULL     ? "Repairing" :
>                     refcount1 < refcount2 ? "ERROR" :
> @@ -1790,6 +1653,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>      BDRVQcowState *s = bs->opaque;
>      int64_t size, highest_cluster, nb_clusters;
>      uint16_t *refcount_table = NULL;
> +    bool rebuild = false;
>      int ret;
>  
>      size = bdrv_getlength(bs->file);
> @@ -1807,14 +1671,19 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>      res->bfi.total_clusters =
>          size_to_clusters(s, bs->total_sectors * BDRV_SECTOR_SIZE);
>  
> -    ret = calculate_refcounts(bs, res, fix, &refcount_table, &nb_clusters);
> +    ret = calculate_refcounts(bs, res, fix, &rebuild, &refcount_table,
> +                              &nb_clusters);
>      if (ret < 0) {
>          goto fail;
>      }
>  
> -    compare_refcounts(bs, res, fix, &highest_cluster, refcount_table,
> +    compare_refcounts(bs, res, fix, &rebuild, &highest_cluster, refcount_table,
>                        nb_clusters);
>  
> +    if (rebuild) {
> +        fprintf(stderr, "ERROR need to rebuild refcount structures\n");
> +    }

Should this increate res->check_errors? (It doesn't survive until the
end of the series anyway, but more places seem to be added later that
print an error message without increasing the error count.)

Kevin

next prev parent reply	other threads:[~2014-10-21  7:52 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-10-20 14:35 [Qemu-devel] [PATCH v6 00/11] qcow2: Fix image repairing Max Reitz
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 01/11] qcow2: Calculate refcount block entry count Max Reitz
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 02/11] qcow2: Fix leaks in dirty images Max Reitz
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 03/11] qcow2: Split qcow2_check_refcounts() Max Reitz
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 04/11] qcow2: Pull check_refblocks() up Max Reitz
2014-10-20 15:07   ` Kevin Wolf
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 05/11] qcow2: Reuse refcount table in calculate_refcounts() Max Reitz
2014-10-20 15:09   ` Kevin Wolf
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 06/11] qcow2: Fix refcount blocks beyond image end Max Reitz
2014-10-20 16:44   ` Kevin Wolf
2014-10-21  7:14     ` Max Reitz
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 07/11] qcow2: Do not perform potentially damaging repairs Max Reitz
2014-10-21  7:52   ` Kevin Wolf [this message]
2014-10-21 10:10     ` Max Reitz
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 08/11] qcow2: Rebuild refcount structure during check Max Reitz
2014-10-21  9:31   ` Kevin Wolf
2014-10-21  9:52     ` Max Reitz
2014-10-21 10:12       ` Kevin Wolf
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 09/11] qcow2: Clean up after refcount rebuild Max Reitz
2014-10-21  9:59   ` Kevin Wolf
2014-10-21 10:16     ` Max Reitz
2014-10-21 14:55       ` Max Reitz
2014-10-21 15:11         ` Kevin Wolf
2014-10-21 15:17           ` Max Reitz
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 10/11] iotests: Fix test outputs Max Reitz
2014-10-21 13:42   ` Kevin Wolf
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 11/11] iotests: Add test for potentially damaging repairs Max Reitz
2014-10-21 14:12   ` Kevin Wolf
2014-10-21 14:20     ` Max Reitz
2014-10-21 15:16       ` Kevin Wolf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141021075230.GB4409@noname.redhat.com \
    --to=kwolf@redhat.com \
    --cc=benoit.canet@nodalink.com \
    --cc=mreitz@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).