From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:44388) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XgWPI-0003Fs-Rb for qemu-devel@nongnu.org; Tue, 21 Oct 2014 06:11:17 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1XgWP7-0000dv-DX for qemu-devel@nongnu.org; Tue, 21 Oct 2014 06:11:12 -0400 Received: from mx1.redhat.com ([209.132.183.28]:21482) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1XgWP7-0000dc-3g for qemu-devel@nongnu.org; Tue, 21 Oct 2014 06:11:01 -0400 Message-ID: <5446312E.9030204@redhat.com> Date: Tue, 21 Oct 2014 12:10:54 +0200 From: Max Reitz MIME-Version: 1.0 References: <1413815733-22829-1-git-send-email-mreitz@redhat.com> <1413815733-22829-8-git-send-email-mreitz@redhat.com> <20141021075230.GB4409@noname.redhat.com> In-Reply-To: <20141021075230.GB4409@noname.redhat.com> Content-Type: text/plain; charset=windows-1252; format=flowed Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH v6 07/11] qcow2: Do not perform potentially damaging repairs List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Kevin Wolf Cc: qemu-devel@nongnu.org, Stefan Hajnoczi , =?windows-1252?Q?Beno=EEt_Canet?= On 2014-10-21 at 09:52, Kevin Wolf wrote: > Am 20.10.2014 um 16:35 hat Max Reitz geschrieben: >> If a referenced cluster has a refcount of 0, increasing its refcount m= ay >> result in clusters being allocated for the refcount structures. This m= ay >> overwrite the referenced cluster, therefore we cannot simply increase >> the refcount then. >> >> In such cases, we can either try to replicate all the refcount >> operations solely for the check operation, basing the allocations on t= he >> in-memory refcount table; or we can simply rebuild the whole refcount >> structure based on the in-memory refcount table. Since the latter will >> be much easier, do that. >> >> To prepare for this, introduce a "rebuild" boolean which should be set >> to true whenever a fix is rather dangerous or too complicated using th= e >> current refcount structures. Another example for this is refcount bloc= ks >> being referenced more than once. >> >> Signed-off-by: Max Reitz >> Reviewed-by: Beno=EEt Canet >> @@ -1557,6 +1442,7 @@ static int check_refblocks(BlockDriverState *bs,= BdrvCheckResult *res, >> fprintf(stderr, "ERROR refcount block %" PRId64 " is not= " >> "cluster aligned; refcount table entry corrupted\n",= i); >> res->corruptions++; >> + *rebuild =3D true; >> continue; >> } >> =20 >> @@ -1612,6 +1498,7 @@ static int check_refblocks(BlockDriverState *bs,= BdrvCheckResult *res, >> =20 >> resize_fail: >> res->corruptions++; >> + *rebuild =3D true; >> fprintf(stderr, "ERROR could not resize image: %s\n"= , >> strerror(-ret)); >> } else { > Increasing res->corruptions in this patch looks right because you don't > actually rebuild anything yet. However, at the end of the series this > code still looks the same - shouldn't we somehow convert those the > corruptions caused by wrong refcounts into res->corruptions_fixed? > > Hm... It seems that you do this by storing intermediate results in > qcow2_check_refcounts(), which assumes that compare_refcounts() finds > only refcount problems, and no other place can find any. This may be > true, but wouldn't it be more elegant to have a separate counter for > corruptions that will be fixed with a rebuild? We can use some > Qcow2CheckResult instead of BdrvCheckResult internally and add more > fields there. > > Also, I don't think all "ERROR" messages correctly say "Repairing" agai= n > at the end of the series. I'll have a look on this, but this hunk for instance should not say=20 "Repairing". It's an error and it is not directly repaired. I think the=20 output "Rebuilding refcount structure" should be sufficient. >> @@ -1624,40 +1511,10 @@ resize_fail: >> inc_refcounts(bs, res, *refcount_table, *nb_clusters, >> offset, s->cluster_size); >> if ((*refcount_table)[cluster] !=3D 1) { >> - fprintf(stderr, "%s refcount block %" PRId64 >> - " refcount=3D%d\n", >> - fix & BDRV_FIX_ERRORS ? "Repairing" : >> - "ERROR", >> - i, (*refcount_table)[cluster]); >> - >> - if (fix & BDRV_FIX_ERRORS) { >> - int64_t new_offset; >> - >> - new_offset =3D realloc_refcount_block(bs, i, offs= et); >> - if (new_offset < 0) { >> - res->corruptions++; >> - continue; >> - } >> - >> - /* update refcounts */ >> - if ((new_offset >> s->cluster_bits) >=3D *nb_clus= ters) { >> - /* increase refcount_table size if necessary = */ >> - int old_nb_clusters =3D *nb_clusters; >> - *nb_clusters =3D (new_offset >> s->cluster_bi= ts) + 1; >> - *refcount_table =3D g_renew(uint16_t, *refcou= nt_table, >> - *nb_clusters); >> - memset(&(*refcount_table)[old_nb_clusters], 0= , >> - (*nb_clusters - old_nb_clusters) * >> - sizeof(uint16_t)); >> - } >> - (*refcount_table)[cluster]--; >> - inc_refcounts(bs, res, *refcount_table, *nb_clust= ers, >> - new_offset, s->cluster_size); >> - >> - res->corruptions_fixed++; >> - } else { >> - res->corruptions++; >> - } >> + fprintf(stderr, "ERROR refcount block %" PRId64 >> + " refcount=3D%d\n", i, (*refcount_table)[clus= ter]); >> + res->corruptions++; >> + *rebuild =3D true; >> } >> } >> } >> @@ -1669,8 +1526,8 @@ resize_fail: >> * Calculates an in-memory refcount table. >> */ >> static int calculate_refcounts(BlockDriverState *bs, BdrvCheckResult= *res, >> - BdrvCheckMode fix, uint16_t **refcount= _table, >> - int64_t *nb_clusters) >> + BdrvCheckMode fix, bool *rebuild, >> + uint16_t **refcount_table, int64_t *nb= _clusters) >> { >> BDRVQcowState *s =3D bs->opaque; >> int64_t i; >> @@ -1713,7 +1570,7 @@ static int calculate_refcounts(BlockDriverState = *bs, BdrvCheckResult *res, >> s->refcount_table_offset, >> s->refcount_table_size * sizeof(uint64_t)); >> =20 >> - return check_refblocks(bs, res, fix, refcount_table, nb_clusters)= ; >> + return check_refblocks(bs, res, fix, rebuild, refcount_table, nb_= clusters); >> } >> =20 >> /* >> @@ -1721,7 +1578,8 @@ static int calculate_refcounts(BlockDriverState = *bs, BdrvCheckResult *res, >> * refcount as reported by the refcount structures on-disk. >> */ >> static void compare_refcounts(BlockDriverState *bs, BdrvCheckResult = *res, >> - BdrvCheckMode fix, int64_t *highest_clu= ster, >> + BdrvCheckMode fix, bool *rebuild, >> + int64_t *highest_cluster, >> uint16_t *refcount_table, int64_t nb_c= lusters) >> { >> BDRVQcowState *s =3D bs->opaque; >> @@ -1748,10 +1606,15 @@ static void compare_refcounts(BlockDriverState= *bs, BdrvCheckResult *res, >> int *num_fixed =3D NULL; >> if (refcount1 > refcount2 && (fix & BDRV_FIX_LEAKS)) { >> num_fixed =3D &res->leaks_fixed; >> - } else if (refcount1 < refcount2 && (fix & BDRV_FIX_ERROR= S)) { >> + } else if (refcount1 > 0 && refcount1 < refcount2 && >> + (fix & BDRV_FIX_ERRORS)) { >> num_fixed =3D &res->corruptions_fixed; >> } >> =20 >> + if (refcount1 =3D=3D 0) { >> + *rebuild =3D true; >> + } > Why isn't this just another else if branch? Possibly even as the first > or second condition, so that you don't have to add 'refcount1 > 0' for > the other branch. Because it's semantically different. The if-else sets the num_fixed=20 pointer, whereas this conditional block forces a rebuild. I can include=20 it as the second condition, just at the time I liked it more to have it=20 externally. Now that I'm reading once again over it... Probably using it as the=20 second condition is better. I'll see to it. >> + >> fprintf(stderr, "%s cluster %" PRId64 " refcount=3D%d re= ference=3D%d\n", >> num_fixed !=3D NULL ? "Repairing" : >> refcount1 < refcount2 ? "ERROR" : >> @@ -1790,6 +1653,7 @@ int qcow2_check_refcounts(BlockDriverState *bs, = BdrvCheckResult *res, >> BDRVQcowState *s =3D bs->opaque; >> int64_t size, highest_cluster, nb_clusters; >> uint16_t *refcount_table =3D NULL; >> + bool rebuild =3D false; >> int ret; >> =20 >> size =3D bdrv_getlength(bs->file); >> @@ -1807,14 +1671,19 @@ int qcow2_check_refcounts(BlockDriverState *bs= , BdrvCheckResult *res, >> res->bfi.total_clusters =3D >> size_to_clusters(s, bs->total_sectors * BDRV_SECTOR_SIZE); >> =20 >> - ret =3D calculate_refcounts(bs, res, fix, &refcount_table, &nb_cl= usters); >> + ret =3D calculate_refcounts(bs, res, fix, &rebuild, &refcount_tab= le, >> + &nb_clusters); >> if (ret < 0) { >> goto fail; >> } >> =20 >> - compare_refcounts(bs, res, fix, &highest_cluster, refcount_table, >> + compare_refcounts(bs, res, fix, &rebuild, &highest_cluster, refco= unt_table, >> nb_clusters); >> =20 >> + if (rebuild) { >> + fprintf(stderr, "ERROR need to rebuild refcount structures\n"= ); >> + } > Should this increate res->check_errors? (It doesn't survive until the > end of the series anyway, but more places seem to be added later that > print an error message without increasing the error count.) Yes, it should. We could even bail out, but it doesn't make much sense=20 here (it makes more sense in patch 8). Thanks, Max