qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Kevin Wolf <kwolf@redhat.com>
To: Max Reitz <mreitz@redhat.com>
Cc: qemu-devel@nongnu.org, "Stefan Hajnoczi" <stefanha@redhat.com>,
	"Benoît Canet" <benoit.canet@nodalink.com>
Subject: Re: [Qemu-devel] [PATCH v6 07/11] qcow2: Do not perform potentially damaging repairs
Date: Tue, 21 Oct 2014 09:52:30 +0200	[thread overview]
Message-ID: <20141021075230.GB4409@noname.redhat.com> (raw)
In-Reply-To: <1413815733-22829-8-git-send-email-mreitz@redhat.com>

Am 20.10.2014 um 16:35 hat Max Reitz geschrieben:
> If a referenced cluster has a refcount of 0, increasing its refcount may
> result in clusters being allocated for the refcount structures. This may
> overwrite the referenced cluster, therefore we cannot simply increase
> the refcount then.
> 
> In such cases, we can either try to replicate all the refcount
> operations solely for the check operation, basing the allocations on the
> in-memory refcount table; or we can simply rebuild the whole refcount
> structure based on the in-memory refcount table. Since the latter will
> be much easier, do that.
> 
> To prepare for this, introduce a "rebuild" boolean which should be set
> to true whenever a fix is rather dangerous or too complicated using the
> current refcount structures. Another example for this is refcount blocks
> being referenced more than once.
> 
> Signed-off-by: Max Reitz <mreitz@redhat.com>
> Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>

> @@ -1557,6 +1442,7 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
>              fprintf(stderr, "ERROR refcount block %" PRId64 " is not "
>                  "cluster aligned; refcount table entry corrupted\n", i);
>              res->corruptions++;
> +            *rebuild = true;
>              continue;
>          }
>  
> @@ -1612,6 +1498,7 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
>  
>  resize_fail:
>                  res->corruptions++;
> +                *rebuild = true;
>                  fprintf(stderr, "ERROR could not resize image: %s\n",
>                          strerror(-ret));
>              } else {

Increasing res->corruptions in this patch looks right because you don't
actually rebuild anything yet. However, at the end of the series this
code still looks the same - shouldn't we somehow convert those the
corruptions caused by wrong refcounts into res->corruptions_fixed?

Hm... It seems that you do this by storing intermediate results in
qcow2_check_refcounts(), which assumes that compare_refcounts() finds
only refcount problems, and no other place can find any. This may be
true, but wouldn't it be more elegant to have a separate counter for
corruptions that will be fixed with a rebuild? We can use some
Qcow2CheckResult instead of BdrvCheckResult internally and add more
fields there.

Also, I don't think all "ERROR" messages correctly say "Repairing" again
at the end of the series.

> @@ -1624,40 +1511,10 @@ resize_fail:
>              inc_refcounts(bs, res, *refcount_table, *nb_clusters,
>                  offset, s->cluster_size);
>              if ((*refcount_table)[cluster] != 1) {
> -                fprintf(stderr, "%s refcount block %" PRId64
> -                    " refcount=%d\n",
> -                    fix & BDRV_FIX_ERRORS ? "Repairing" :
> -                                            "ERROR",
> -                    i, (*refcount_table)[cluster]);
> -
> -                if (fix & BDRV_FIX_ERRORS) {
> -                    int64_t new_offset;
> -
> -                    new_offset = realloc_refcount_block(bs, i, offset);
> -                    if (new_offset < 0) {
> -                        res->corruptions++;
> -                        continue;
> -                    }
> -
> -                    /* update refcounts */
> -                    if ((new_offset >> s->cluster_bits) >= *nb_clusters) {
> -                        /* increase refcount_table size if necessary */
> -                        int old_nb_clusters = *nb_clusters;
> -                        *nb_clusters = (new_offset >> s->cluster_bits) + 1;
> -                        *refcount_table = g_renew(uint16_t, *refcount_table,
> -                                                  *nb_clusters);
> -                        memset(&(*refcount_table)[old_nb_clusters], 0,
> -                               (*nb_clusters - old_nb_clusters) *
> -                               sizeof(uint16_t));
> -                    }
> -                    (*refcount_table)[cluster]--;
> -                    inc_refcounts(bs, res, *refcount_table, *nb_clusters,
> -                            new_offset, s->cluster_size);
> -
> -                    res->corruptions_fixed++;
> -                } else {
> -                    res->corruptions++;
> -                }
> +                fprintf(stderr, "ERROR refcount block %" PRId64
> +                        " refcount=%d\n", i, (*refcount_table)[cluster]);
> +                res->corruptions++;
> +                *rebuild = true;
>              }
>          }
>      }
> @@ -1669,8 +1526,8 @@ resize_fail:
>   * Calculates an in-memory refcount table.
>   */
>  static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
> -                               BdrvCheckMode fix, uint16_t **refcount_table,
> -                               int64_t *nb_clusters)
> +                               BdrvCheckMode fix, bool *rebuild,
> +                               uint16_t **refcount_table, int64_t *nb_clusters)
>  {
>      BDRVQcowState *s = bs->opaque;
>      int64_t i;
> @@ -1713,7 +1570,7 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>          s->refcount_table_offset,
>          s->refcount_table_size * sizeof(uint64_t));
>  
> -    return check_refblocks(bs, res, fix, refcount_table, nb_clusters);
> +    return check_refblocks(bs, res, fix, rebuild, refcount_table, nb_clusters);
>  }
>  
>  /*
> @@ -1721,7 +1578,8 @@ static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>   * refcount as reported by the refcount structures on-disk.
>   */
>  static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
> -                              BdrvCheckMode fix, int64_t *highest_cluster,
> +                              BdrvCheckMode fix, bool *rebuild,
> +                              int64_t *highest_cluster,
>                                uint16_t *refcount_table, int64_t nb_clusters)
>  {
>      BDRVQcowState *s = bs->opaque;
> @@ -1748,10 +1606,15 @@ static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>              int *num_fixed = NULL;
>              if (refcount1 > refcount2 && (fix & BDRV_FIX_LEAKS)) {
>                  num_fixed = &res->leaks_fixed;
> -            } else if (refcount1 < refcount2 && (fix & BDRV_FIX_ERRORS)) {
> +            } else if (refcount1 > 0 && refcount1 < refcount2 &&
> +                       (fix & BDRV_FIX_ERRORS)) {
>                  num_fixed = &res->corruptions_fixed;
>              }
>  
> +            if (refcount1 == 0) {
> +                *rebuild = true;
> +            }

Why isn't this just another else if branch? Possibly even as the first
or second condition, so that you don't have to add 'refcount1 > 0' for
the other branch.

> +
>              fprintf(stderr, "%s cluster %" PRId64 " refcount=%d reference=%d\n",
>                     num_fixed != NULL     ? "Repairing" :
>                     refcount1 < refcount2 ? "ERROR" :
> @@ -1790,6 +1653,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>      BDRVQcowState *s = bs->opaque;
>      int64_t size, highest_cluster, nb_clusters;
>      uint16_t *refcount_table = NULL;
> +    bool rebuild = false;
>      int ret;
>  
>      size = bdrv_getlength(bs->file);
> @@ -1807,14 +1671,19 @@ int qcow2_check_refcounts(BlockDriverState *bs, BdrvCheckResult *res,
>      res->bfi.total_clusters =
>          size_to_clusters(s, bs->total_sectors * BDRV_SECTOR_SIZE);
>  
> -    ret = calculate_refcounts(bs, res, fix, &refcount_table, &nb_clusters);
> +    ret = calculate_refcounts(bs, res, fix, &rebuild, &refcount_table,
> +                              &nb_clusters);
>      if (ret < 0) {
>          goto fail;
>      }
>  
> -    compare_refcounts(bs, res, fix, &highest_cluster, refcount_table,
> +    compare_refcounts(bs, res, fix, &rebuild, &highest_cluster, refcount_table,
>                        nb_clusters);
>  
> +    if (rebuild) {
> +        fprintf(stderr, "ERROR need to rebuild refcount structures\n");
> +    }

Should this increate res->check_errors? (It doesn't survive until the
end of the series anyway, but more places seem to be added later that
print an error message without increasing the error count.)

Kevin

  reply	other threads:[~2014-10-21  7:52 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-10-20 14:35 [Qemu-devel] [PATCH v6 00/11] qcow2: Fix image repairing Max Reitz
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 01/11] qcow2: Calculate refcount block entry count Max Reitz
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 02/11] qcow2: Fix leaks in dirty images Max Reitz
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 03/11] qcow2: Split qcow2_check_refcounts() Max Reitz
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 04/11] qcow2: Pull check_refblocks() up Max Reitz
2014-10-20 15:07   ` Kevin Wolf
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 05/11] qcow2: Reuse refcount table in calculate_refcounts() Max Reitz
2014-10-20 15:09   ` Kevin Wolf
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 06/11] qcow2: Fix refcount blocks beyond image end Max Reitz
2014-10-20 16:44   ` Kevin Wolf
2014-10-21  7:14     ` Max Reitz
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 07/11] qcow2: Do not perform potentially damaging repairs Max Reitz
2014-10-21  7:52   ` Kevin Wolf [this message]
2014-10-21 10:10     ` Max Reitz
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 08/11] qcow2: Rebuild refcount structure during check Max Reitz
2014-10-21  9:31   ` Kevin Wolf
2014-10-21  9:52     ` Max Reitz
2014-10-21 10:12       ` Kevin Wolf
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 09/11] qcow2: Clean up after refcount rebuild Max Reitz
2014-10-21  9:59   ` Kevin Wolf
2014-10-21 10:16     ` Max Reitz
2014-10-21 14:55       ` Max Reitz
2014-10-21 15:11         ` Kevin Wolf
2014-10-21 15:17           ` Max Reitz
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 10/11] iotests: Fix test outputs Max Reitz
2014-10-21 13:42   ` Kevin Wolf
2014-10-20 14:35 ` [Qemu-devel] [PATCH v6 11/11] iotests: Add test for potentially damaging repairs Max Reitz
2014-10-21 14:12   ` Kevin Wolf
2014-10-21 14:20     ` Max Reitz
2014-10-21 15:16       ` Kevin Wolf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20141021075230.GB4409@noname.redhat.com \
    --to=kwolf@redhat.com \
    --cc=benoit.canet@nodalink.com \
    --cc=mreitz@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).