From: Max Reitz
Date: Thu, 10 Oct 2013 14:54:34 +0200
Subject: Re: [Qemu-devel] [PATCH 1/2] qcow2: Undo leaked allocations in co_writev
To: Kevin Wolf
Cc: qemu-devel@nongnu.org, Stefan Hajnoczi

On 2013-10-10 14:26, Kevin Wolf wrote:
> On 2013-10-10 at 10:52, Max Reitz wrote:
>> If the write request spans more than one L2 table,
>> qcow2_alloc_cluster_offset cannot handle the required allocations
>> atomically. This results in leaks if it allocated new clusters in any
>> but the last L2 table touched and an error occurs in qcow2_co_writev
>> before having established the L2 link. These non-atomic allocations
>> were, however, indeed successful and are therefore given to the caller
>> in the L2Meta list.
>>
>> If an error occurs in qcow2_co_writev and the L2Meta list is unwound,
>> all its remaining entries are clusters whose L2 links were not yet
>> established. Thus, all allocations in that list should be undone.
>>
>> Signed-off-by: Max Reitz
>> ---
>>  block/qcow2.c | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/block/qcow2.c b/block/qcow2.c
>> index b2489fb..6bedd5d 100644
>> --- a/block/qcow2.c
>> +++ b/block/qcow2.c
>> @@ -1017,6 +1017,13 @@ fail:
>>      while (l2meta != NULL) {
>>          QCowL2Meta *next;
>>
>> +        /* Undo all leaked allocations */
>> +        if (l2meta->nb_clusters != 0) {
>> +            qcow2_free_clusters(bs, l2meta->alloc_offset,
>> +                                l2meta->nb_clusters << s->cluster_bits,
>> +                                QCOW2_DISCARD_ALWAYS);
>> +        }
>> +
>>          if (l2meta->nb_clusters != 0) {
>>              QLIST_REMOVE(l2meta, next_in_flight);
>>          }
> This feels a bit risky.
>
> I think currently it does work, because qcow2_alloc_cluster_link_l2()
> can only return an error when it didn't update the L2 entry in the
> cache yet, but adding any error condition between that point and the
> L2Meta unwinding would result in corruption. I'm unsure, but perhaps a
> cluster leak is the lesser evil. Did you consider this? Do other
> people have an opinion on it?

Which error conditions can occur between qcow2_alloc_cluster_link_l2
and the L2Meta unwinding?
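
To make the control flow explicit, here is an abbreviated sketch of the
two loops with this patch applied (simplified from block/qcow2.c for
illustration only; the coroutine locking, the data write itself and the
dependent-request wakeup are omitted, so don't take it verbatim):

    /* Success path: link each allocation into its L2 table. */
    while (l2meta != NULL) {
        QCowL2Meta *next;

        ret = qcow2_alloc_cluster_link_l2(bs, l2meta);
        if (ret < 0) {
            goto fail;  /* l2meta still heads the not-yet-linked entries */
        }

        /* The L2 link is established, so this request must never be
         * unwound again: take it off the in-flight list right away. */
        if (l2meta->nb_clusters != 0) {
            QLIST_REMOVE(l2meta, next_in_flight);
        }

        next = l2meta->next;
        g_free(l2meta);
        l2meta = next;  /* successfully linked entries are consumed here */
    }

    fail:
    while (l2meta != NULL) {
        QCowL2Meta *next;

        /* Undo all leaked allocations: every entry still in the list
         * has no L2 link yet, so freeing its clusters is safe. */
        if (l2meta->nb_clusters != 0) {
            qcow2_free_clusters(bs, l2meta->alloc_offset,
                                l2meta->nb_clusters << s->cluster_bits,
                                QCOW2_DISCARD_ALWAYS);
            QLIST_REMOVE(l2meta, next_in_flight);
        }

        next = l2meta->next;
        g_free(l2meta);
        l2meta = next;
    }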
If all qcow2_alloc_cluster_link_l2 calls are successful, the list is
empty and the while loop either goes into another iteration or the
function returns successfully, without any further need to unwind the
list. If some call fails, all previous (successful) calls have already
been removed from the list; therefore, the unwinding only affects
L2Meta requests whose call to qcow2_alloc_cluster_link_l2 failed (or
for which that function was never called at all).

If by "currently" you meant that this will turn out bad as soon as a
new error condition is introduced between a successful call to
qcow2_alloc_cluster_link_l2 and the removal of the L2Meta request from
the list: yes, that's true, of course. However, as you've said, there
currently is no such condition, and I don't see why one should be
introduced. The sole purpose of the list seems (to me) to be to execute
qcow2_alloc_cluster_link_l2 on each of its elements. Thus, as soon as
qcow2_alloc_cluster_link_l2 has succeeded, the corresponding request
should be removed from the list.

So, if you agree that it currently works fine, I would not consider
this patch risky. If it is applied and anything in the future
introduces a "goto fail" between qcow2_alloc_cluster_link_l2 and
"l2meta = next", that change would simply have to make sure that
qcow2_free_clusters is not called in this case. In the (probably very
unlikely) case that all my previous assumptions and conclusions are
true, I'd just add a comment to the qcow2_alloc_cluster_link_l2 loop
pointing out this trap ("If you add a goto fail here, make sure to pay
attention" or something along these lines).

> Also, shouldn't it be QCOW2_DISCARD_OTHER?

I'm always unsure about the discard flags. ;-)

I try to follow the rule of "use the specific type (or 'other') for
freeing 'out of the blue', but use 'always' if it's just a very recent
allocation that is being undone again". I'd gladly accept better
recommendations. ;-)

Max