From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:55977)
    by lists.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1XIGk5-0000Om-0n
    for qemu-devel@nongnu.org; Fri, 15 Aug 2014 08:36:31 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
    (envelope-from ) id 1XIGjx-0004F9-Uw
    for qemu-devel@nongnu.org; Fri, 15 Aug 2014 08:36:24 -0400
Received: from mx1.redhat.com ([209.132.183.28]:14028)
    by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from )
    id 1XIGjx-0004F3-Mj
    for qemu-devel@nongnu.org; Fri, 15 Aug 2014 08:36:17 -0400
Message-ID: <53EDFEBA.10601@redhat.com>
Date: Fri, 15 Aug 2014 14:36:10 +0200
From: Max Reitz
MIME-Version: 1.0
References: <1407963710-4942-1-git-send-email-mreitz@redhat.com>
    <1407963710-4942-4-git-send-email-mreitz@redhat.com>
    <20140814121120.GH2009@irqsave.net>
In-Reply-To: <20140814121120.GH2009@irqsave.net>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] [PATCH 3/8] qcow2: Fix refcount blocks beyond image end
To: Benoît Canet
Cc: Kevin Wolf, qemu-devel@nongnu.org, Stefan Hajnoczi

On 14.08.2014 14:11, Benoît Canet wrote:
> The Wednesday 13 Aug 2014 à 23:01:45 (+0200), Max Reitz wrote :
>> If the qcow2 check function detects a refcount block located beyond the
>> image end, grow the image appropriately. This cannot break anything and
>> is the logical fix for such a case.
>>
>> Signed-off-by: Max Reitz
>> ---
>>  block/qcow2-refcount.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++----
>>  1 file changed, 46 insertions(+), 4 deletions(-)
>>
>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>> index d1da8d5..a1d93e5 100644
>> --- a/block/qcow2-refcount.c
>> +++ b/block/qcow2-refcount.c
>> @@ -1504,7 +1504,8 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
>>                             int64_t *nb_clusters)
>>  {
>>      BDRVQcowState *s = bs->opaque;
>> -    int64_t i;
>> +    int64_t i, size;
>> +    int ret;
>>  
>>      for (i = 0; i < s->refcount_table_size; i++) {
>>          uint64_t offset, cluster;
>> @@ -1520,9 +1521,50 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
>>          }
>>  
>>          if (cluster >= *nb_clusters) {
>> -            fprintf(stderr, "ERROR refcount block %" PRId64
>> -                    " is outside image\n", i);
>> -            res->corruptions++;
>> +            fprintf(stderr, "%s refcount block %" PRId64 " is outside image\n",
>> +                    fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i);
>> +
>> +            if (fix & BDRV_FIX_ERRORS) {
>> +                int64_t old_nb_clusters = *nb_clusters;
>> +
>> +                ret = bdrv_truncate(bs->file, offset + s->cluster_size);
>> +                if (ret < 0) {
>> +                    goto resize_fail;
>> +                }
>> +                size = bdrv_getlength(bs->file);
>> +                if (size < 0) {
>> +                    ret = size;
>> +                    goto resize_fail;
>> +                }
>> +
>> +                *nb_clusters = size_to_clusters(s, size);
>> +                assert(*nb_clusters >= old_nb_clusters);
>> +
>> +                *refcount_table = g_try_realloc(*refcount_table,
>> +                        *nb_clusters * sizeof(uint16_t));
>> +                if (!*refcount_table) {
>> +                    res->check_errors++;
>> +                    return -ENOMEM;
> So you really want to make sure the code is not trying anything more
> by directly returning -ENOMEM and not doing goto resize_fail.
>
> This makes sense though.
>
>> +                }
>> +
>> +                memset(*refcount_table + old_nb_clusters, 0,
>> +                       (*nb_clusters - old_nb_clusters) * sizeof(uint16_t));
>> +
>> +                if (cluster >= *nb_clusters) {
>> +                    ret = -EINVAL;
>> +                    goto resize_fail;
>> +                }
>> +
>> +                res->corruptions_fixed++;
>> +                continue;
>> +
>> +resize_fail:
>> +                res->corruptions++;
>> +                fprintf(stderr, "ERROR could not resize image: %s\n",
>> +                        strerror(-ret));
> Isn't a "return ret;" missing here ?
> the code will fall in the continue statement without it.

And that it should. A corruption is reported to stderr, res->corruptions
is incremented and that's it - just as it was without this patch. The
only reason I can see for aborting completely here is that resizing the
file should always work; if it doesn't, something may be completely
wrong. But even that is no real reason to bail out; we can still
continue with the check, and if everything is indeed completely broken,
we'll receive EIOs soon enough.

Perhaps I should add a *rebuild = true; here and in the else branch in
the next patch, though.

Max

>> +            } else {
>> +                res->corruptions++;
>> +            }
>>              continue;
>>          }
>>  
>> -- 
>> 2.0.3
>>
>>
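
For readers of the archive, the control flow discussed above can be modelled outside QEMU. The following is only a rough sketch under stated assumptions, not the code from the patch: CheckResult, Image, grow_image(), grow_table() and check_blocks() are hypothetical stand-ins invented for illustration, and plain realloc() takes the place of g_try_realloc(). It shows the two failure classes being argued about: a failed resize is counted as a corruption and the loop continues, while a failed allocation is a check error and returns immediately.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for BdrvCheckResult; only the counters used here. */
typedef struct {
    int corruptions;
    int corruptions_fixed;
    int check_errors;
} CheckResult;

/* Hypothetical in-memory "image": a cluster count and a resize switch. */
typedef struct {
    int64_t nb_clusters;
    int resize_works;   /* 0 simulates a failing truncate */
} Image;

/* Grow the image so that `cluster` lies inside it; returns 0 or -errno. */
static int grow_image(Image *img, int64_t cluster)
{
    if (!img->resize_works) {
        return -EIO;
    }
    img->nb_clusters = cluster + 1;
    return 0;
}

/*
 * Grow the table and zero the new tail. Using a temporary so the old
 * table stays reachable if realloc() fails is a choice of this sketch.
 */
static int grow_table(uint16_t **table, int64_t old_n, int64_t new_n)
{
    uint16_t *tmp;

    assert(new_n >= old_n);
    tmp = realloc(*table, (size_t)new_n * sizeof(uint16_t));
    if (!tmp) {
        return -ENOMEM;
    }
    memset(tmp + old_n, 0, (size_t)(new_n - old_n) * sizeof(uint16_t));
    *table = tmp;
    return 0;
}

/* Check every block location; repair out-of-range ones if `fix` is set. */
static int check_blocks(Image *img, const int64_t *clusters, int n, int fix,
                        CheckResult *res, uint16_t **table)
{
    for (int i = 0; i < n; i++) {
        if (clusters[i] < img->nb_clusters) {
            continue;               /* inside the image, nothing to do */
        }
        if (!fix) {
            res->corruptions++;
            continue;
        }

        int64_t old_n = img->nb_clusters;
        int ret = grow_image(img, clusters[i]);
        if (ret < 0) {
            /* Recoverable for the checker: report, count, keep going. */
            fprintf(stderr, "ERROR could not resize image: %s\n",
                    strerror(-ret));
            res->corruptions++;
            continue;
        }

        ret = grow_table(table, old_n, img->nb_clusters);
        if (ret < 0) {
            /* The checker itself cannot go on without the table. */
            res->check_errors++;
            return ret;
        }
        res->corruptions_fixed++;
    }
    return 0;
}
```

With a resizable image, calling check_blocks() on a list that contains one out-of-range cluster should leave corruptions_fixed at 1 and the table extended with zeroed entries; with resize_works set to 0, each out-of-range cluster is counted in corruptions and the function still returns 0, which is the "continue with the check" behaviour described in the reply.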