From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:43749) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XgUFH-0003lb-7f for qemu-devel@nongnu.org; Tue, 21 Oct 2014 03:52:48 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XgUFC-0003Yu-2c for qemu-devel@nongnu.org; Tue, 21 Oct 2014 03:52:43 -0400 Received: from mx1.redhat.com ([209.132.183.28]:26885) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XgUFB-0003Ym-Qf for qemu-devel@nongnu.org; Tue, 21 Oct 2014 03:52:37 -0400 Date: Tue, 21 Oct 2014 09:52:30 +0200 From: Kevin Wolf Message-ID: <20141021075230.GB4409@noname.redhat.com> References: <1413815733-22829-1-git-send-email-mreitz@redhat.com> <1413815733-22829-8-git-send-email-mreitz@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: <1413815733-22829-8-git-send-email-mreitz@redhat.com> Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH v6 07/11] qcow2: Do not perform potentially damaging repairs List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Max Reitz Cc: qemu-devel@nongnu.org, Stefan Hajnoczi , =?iso-8859-1?Q?Beno=EEt?= Canet Am 20.10.2014 um 16:35 hat Max Reitz geschrieben: > If a referenced cluster has a refcount of 0, increasing its refcount ma= y > result in clusters being allocated for the refcount structures. This ma= y > overwrite the referenced cluster, therefore we cannot simply increase > the refcount then. >=20 > In such cases, we can either try to replicate all the refcount > operations solely for the check operation, basing the allocations on th= e > in-memory refcount table; or we can simply rebuild the whole refcount > structure based on the in-memory refcount table. Since the latter will > be much easier, do that. >=20 > To prepare for this, introduce a "rebuild" boolean which should be set > to true whenever a fix is rather dangerous or too complicated using the > current refcount structures. Another example for this is refcount block= s > being referenced more than once. >=20 > Signed-off-by: Max Reitz > Reviewed-by: Beno=EEt Canet > @@ -1557,6 +1442,7 @@ static int check_refblocks(BlockDriverState *bs, = BdrvCheckResult *res, > fprintf(stderr, "ERROR refcount block %" PRId64 " is not " > "cluster aligned; refcount table entry corrupted\n", i= ); > res->corruptions++; > + *rebuild =3D true; > continue; > } > =20 > @@ -1612,6 +1498,7 @@ static int check_refblocks(BlockDriverState *bs, = BdrvCheckResult *res, > =20 > resize_fail: > res->corruptions++; > + *rebuild =3D true; > fprintf(stderr, "ERROR could not resize image: %s\n", > strerror(-ret)); > } else { Increasing res->corruptions in this patch looks right because you don't actually rebuild anything yet. However, at the end of the series this code still looks the same - shouldn't we somehow convert those the corruptions caused by wrong refcounts into res->corruptions_fixed? Hm... It seems that you do this by storing intermediate results in qcow2_check_refcounts(), which assumes that compare_refcounts() finds only refcount problems, and no other place can find any. This may be true, but wouldn't it be more elegant to have a separate counter for corruptions that will be fixed with a rebuild? We can use some Qcow2CheckResult instead of BdrvCheckResult internally and add more fields there. Also, I don't think all "ERROR" messages correctly say "Repairing" again at the end of the series. > @@ -1624,40 +1511,10 @@ resize_fail: > inc_refcounts(bs, res, *refcount_table, *nb_clusters, > offset, s->cluster_size); > if ((*refcount_table)[cluster] !=3D 1) { > - fprintf(stderr, "%s refcount block %" PRId64 > - " refcount=3D%d\n", > - fix & BDRV_FIX_ERRORS ? "Repairing" : > - "ERROR", > - i, (*refcount_table)[cluster]); > - > - if (fix & BDRV_FIX_ERRORS) { > - int64_t new_offset; > - > - new_offset =3D realloc_refcount_block(bs, i, offse= t); > - if (new_offset < 0) { > - res->corruptions++; > - continue; > - } > - > - /* update refcounts */ > - if ((new_offset >> s->cluster_bits) >=3D *nb_clust= ers) { > - /* increase refcount_table size if necessary *= / > - int old_nb_clusters =3D *nb_clusters; > - *nb_clusters =3D (new_offset >> s->cluster_bit= s) + 1; > - *refcount_table =3D g_renew(uint16_t, *refcoun= t_table, > - *nb_clusters); > - memset(&(*refcount_table)[old_nb_clusters], 0, > - (*nb_clusters - old_nb_clusters) * > - sizeof(uint16_t)); > - } > - (*refcount_table)[cluster]--; > - inc_refcounts(bs, res, *refcount_table, *nb_cluste= rs, > - new_offset, s->cluster_size); > - > - res->corruptions_fixed++; > - } else { > - res->corruptions++; > - } > + fprintf(stderr, "ERROR refcount block %" PRId64 > + " refcount=3D%d\n", i, (*refcount_table)[clust= er]); > + res->corruptions++; > + *rebuild =3D true; > } > } > } > @@ -1669,8 +1526,8 @@ resize_fail: > * Calculates an in-memory refcount table. > */ > static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult *= res, > - BdrvCheckMode fix, uint16_t **refcount_= table, > - int64_t *nb_clusters) > + BdrvCheckMode fix, bool *rebuild, > + uint16_t **refcount_table, int64_t *nb_= clusters) > { > BDRVQcowState *s =3D bs->opaque; > int64_t i; > @@ -1713,7 +1570,7 @@ static int calculate_refcounts(BlockDriverState *= bs, BdrvCheckResult *res, > s->refcount_table_offset, > s->refcount_table_size * sizeof(uint64_t)); > =20 > - return check_refblocks(bs, res, fix, refcount_table, nb_clusters); > + return check_refblocks(bs, res, fix, rebuild, refcount_table, nb_c= lusters); > } > =20 > /* > @@ -1721,7 +1578,8 @@ static int calculate_refcounts(BlockDriverState *= bs, BdrvCheckResult *res, > * refcount as reported by the refcount structures on-disk. > */ > static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult *r= es, > - BdrvCheckMode fix, int64_t *highest_clus= ter, > + BdrvCheckMode fix, bool *rebuild, > + int64_t *highest_cluster, > uint16_t *refcount_table, int64_t nb_clu= sters) > { > BDRVQcowState *s =3D bs->opaque; > @@ -1748,10 +1606,15 @@ static void compare_refcounts(BlockDriverState = *bs, BdrvCheckResult *res, > int *num_fixed =3D NULL; > if (refcount1 > refcount2 && (fix & BDRV_FIX_LEAKS)) { > num_fixed =3D &res->leaks_fixed; > - } else if (refcount1 < refcount2 && (fix & BDRV_FIX_ERRORS= )) { > + } else if (refcount1 > 0 && refcount1 < refcount2 && > + (fix & BDRV_FIX_ERRORS)) { > num_fixed =3D &res->corruptions_fixed; > } > =20 > + if (refcount1 =3D=3D 0) { > + *rebuild =3D true; > + } Why isn't this just another else if branch? Possibly even as the first or second condition, so that you don't have to add 'refcount1 > 0' for the other branch. > + > fprintf(stderr, "%s cluster %" PRId64 " refcount=3D%d refe= rence=3D%d\n", > num_fixed !=3D NULL ? "Repairing" : > refcount1 < refcount2 ? "ERROR" : > @@ -1790,6 +1653,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, B= drvCheckResult *res, > BDRVQcowState *s =3D bs->opaque; > int64_t size, highest_cluster, nb_clusters; > uint16_t *refcount_table =3D NULL; > + bool rebuild =3D false; > int ret; > =20 > size =3D bdrv_getlength(bs->file); > @@ -1807,14 +1671,19 @@ int qcow2_check_refcounts(BlockDriverState *bs,= BdrvCheckResult *res, > res->bfi.total_clusters =3D > size_to_clusters(s, bs->total_sectors * BDRV_SECTOR_SIZE); > =20 > - ret =3D calculate_refcounts(bs, res, fix, &refcount_table, &nb_clu= sters); > + ret =3D calculate_refcounts(bs, res, fix, &rebuild, &refcount_tabl= e, > + &nb_clusters); > if (ret < 0) { > goto fail; > } > =20 > - compare_refcounts(bs, res, fix, &highest_cluster, refcount_table, > + compare_refcounts(bs, res, fix, &rebuild, &highest_cluster, refcou= nt_table, > nb_clusters); > =20 > + if (rebuild) { > + fprintf(stderr, "ERROR need to rebuild refcount structures\n")= ; > + } Should this increate res->check_errors? (It doesn't survive until the end of the series anyway, but more places seem to be added later that print an error message without increasing the error count.) Kevin