From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:55977)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mreitz@redhat.com>) id 1XIGk5-0000Om-0n
	for qemu-devel@nongnu.org; Fri, 15 Aug 2014 08:36:31 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <mreitz@redhat.com>) id 1XIGjx-0004F9-Uw
	for qemu-devel@nongnu.org; Fri, 15 Aug 2014 08:36:24 -0400
Received: from mx1.redhat.com ([209.132.183.28]:14028)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mreitz@redhat.com>) id 1XIGjx-0004F3-Mj
	for qemu-devel@nongnu.org; Fri, 15 Aug 2014 08:36:17 -0400
Message-ID: <53EDFEBA.10601@redhat.com>
Date: Fri, 15 Aug 2014 14:36:10 +0200
From: Max Reitz <mreitz@redhat.com>
MIME-Version: 1.0
References: <1407963710-4942-1-git-send-email-mreitz@redhat.com>
	<1407963710-4942-4-git-send-email-mreitz@redhat.com>
	<20140814121120.GH2009@irqsave.net>
In-Reply-To: <20140814121120.GH2009@irqsave.net>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: quoted-printable
Subject: Re: [Qemu-devel] [PATCH 3/8] qcow2: Fix refcount blocks beyond
 image end
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: =?windows-1252?Q?Beno=EEt_Canet?= <benoit.canet@irqsave.net>
Cc: Kevin Wolf <kwolf@redhat.com>, qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>

On 14.08.2014 14:11, Benoît Canet wrote:
> The Wednesday 13 Aug 2014 à 23:01:45 (+0200), Max Reitz wrote :
>> If the qcow2 check function detects a refcount block located beyond the
>> image end, grow the image appropriately. This cannot break anything and
>> is the logical fix for such a case.
>>
>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>> ---
>>   block/qcow2-refcount.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++----
>>   1 file changed, 46 insertions(+), 4 deletions(-)
>>
>> diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c
>> index d1da8d5..a1d93e5 100644
>> --- a/block/qcow2-refcount.c
>> +++ b/block/qcow2-refcount.c
>> @@ -1504,7 +1504,8 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
>>                              int64_t *nb_clusters)
>>   {
>>       BDRVQcowState *s = bs->opaque;
>> -    int64_t i;
>> +    int64_t i, size;
>> +    int ret;
>>
>>       for (i = 0; i < s->refcount_table_size; i++) {
>>           uint64_t offset, cluster;
>> @@ -1520,9 +1521,50 @@ static int check_refblocks(BlockDriverState *bs, BdrvCheckResult *res,
>>           }
>>
>>           if (cluster >= *nb_clusters) {
>> -            fprintf(stderr, "ERROR refcount block %" PRId64
>> -                    " is outside image\n", i);
>> -            res->corruptions++;
>> +            fprintf(stderr, "%s refcount block %" PRId64 " is outside image\n",
>> +                    fix & BDRV_FIX_ERRORS ? "Repairing" : "ERROR", i);
>> +
>> +            if (fix & BDRV_FIX_ERRORS) {
>> +                int64_t old_nb_clusters = *nb_clusters;
>> +
>> +                ret = bdrv_truncate(bs->file, offset + s->cluster_size);
>> +                if (ret < 0) {
>> +                    goto resize_fail;
>> +                }
>> +                size = bdrv_getlength(bs->file);
>> +                if (size < 0) {
>> +                    ret = size;
>> +                    goto resize_fail;
>> +                }
>> +
>> +                *nb_clusters = size_to_clusters(s, size);
>> +                assert(*nb_clusters >= old_nb_clusters);
>> +
>> +                *refcount_table =3D g_try_realloc(*refcount_table,
>> +                        *nb_clusters * sizeof(uint16_t));
>> +                if (!*refcount_table) {
>> +                    res->check_errors++;
>> +                    return -ENOMEM;
> So you really want to make sure the code is not trying anything more
> by directly returning -ENOMEM and not doing goto resize_fail.
>
> This makes sense though.
>
>> +                }
>> +
>> +                memset(*refcount_table + old_nb_clusters, 0,
>> +                       (*nb_clusters - old_nb_clusters) * sizeof(uint16_t));
>> +
>> +                if (cluster >= *nb_clusters) {
>> +                    ret = -EINVAL;
>> +                    goto resize_fail;
>> +                }
>> +
>> +                res->corruptions_fixed++;
>> +                continue;
>> +
>> +resize_fail:
>> +                res->corruptions++;
>> +                fprintf(stderr, "ERROR could not resize image: %s\n",
>> +                        strerror(-ret));
> Isn't a "return ret;" missing here?
> Without it, the code falls through to the continue statement.

And that it should. A corruption is reported to stderr, res->corruptions
is incremented and that's it, just as it was without this patch. The
only reason I can see for aborting completely here is that resizing the
file should always work; if it doesn't, something may be completely
wrong. But even that is no real reason to bail out; we can still
continue with the check, and if everything is indeed completely broken,
we'll receive EIOs soon enough.
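
As a rough illustration of that control flow, here is a standalone sketch.
It is not the QEMU code: struct check_result, repair_fn, fail_repair and
check_entries are hypothetical stand-ins for BdrvCheckResult and the loop
in check_refblocks(), and the repair step is reduced to a callback.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the counters in BdrvCheckResult. */
struct check_result {
    int corruptions;
    int corruptions_fixed;
};

/* Hypothetical repair hook: 0 on success, negative errno on failure. */
typedef int (*repair_fn)(uint64_t cluster);

/* Example hook that always fails, as a failed resize would. */
int fail_repair(uint64_t cluster)
{
    (void)cluster;
    return -EIO;
}

/*
 * Walk all entries. An entry outside the image is either repaired or
 * counted as a corruption; in both cases the loop moves on to the next
 * entry instead of returning early.
 */
void check_entries(const uint64_t *clusters, size_t n, uint64_t nb_clusters,
                   repair_fn repair, struct check_result *res)
{
    assert(clusters != NULL && res != NULL);

    for (size_t i = 0; i < n; i++) {
        if (clusters[i] < nb_clusters) {
            continue; /* entry is inside the image */
        }
        if (repair != NULL && repair(clusters[i]) == 0) {
            res->corruptions_fixed++;
            continue;
        }
        /* Repair failed or was not requested: report it and carry on. */
        fprintf(stderr, "ERROR entry %zu is outside image\n", i);
        res->corruptions++;
    }
}
```

With a hook that always fails, every out-of-range entry ends up in
corruptions and the remaining entries are still visited.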

Perhaps I should add a *rebuild = true; here and in the else branch in
the next patch, though.

Max

>> +            } else {
>> +                res->corruptions++;
>> +            }
>>               continue;
>>           }
>>
>> -- 
>> 2.0.3
>>
>>