From mboxrd@z Thu Jan  1 00:00:00 1970
Received: from eggs.gnu.org ([2001:4830:134:3::10]:38447)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mreitz@redhat.com>) id 1WNRGr-0008Mo-Rn
	for qemu-devel@nongnu.org; Tue, 11 Mar 2014 14:19:28 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <mreitz@redhat.com>) id 1WNRGl-0000wE-Qt
	for qemu-devel@nongnu.org; Tue, 11 Mar 2014 14:19:21 -0400
Received: from mx1.redhat.com ([209.132.183.28]:32945)
	by eggs.gnu.org with esmtp (Exim 4.71)
	(envelope-from <mreitz@redhat.com>) id 1WNRGl-0000vp-JG
	for qemu-devel@nongnu.org; Tue, 11 Mar 2014 14:19:15 -0400
Received: from int-mx09.intmail.prod.int.phx2.redhat.com
	(int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22])
	by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id s2BIJDie009798
	(version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK)
	for <qemu-devel@nongnu.org>; Tue, 11 Mar 2014 14:19:14 -0400
Message-ID: <531F539C.2080800@redhat.com>
Date: Tue, 11 Mar 2014 19:19:08 +0100
From: Max Reitz <mreitz@redhat.com>
MIME-Version: 1.0
References: <1394491449-10897-1-git-send-email-mreitz@redhat.com>	<1394491449-10897-2-git-send-email-mreitz@redhat.com>	<531E47BF.8080709@redhat.com>
	<20140311101604.GB3215@dhcp-200-207.str.redhat.com>
	<531F17EF.9000500@redhat.com>
In-Reply-To: <531F17EF.9000500@redhat.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] [PATCH 1/3] qcow2: Check bs->drv in copy_sectors()
List-Id: <qemu-devel.nongnu.org>
List-Unsubscribe: <https://lists.nongnu.org/mailman/options/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=unsubscribe>
List-Archive: <http://lists.nongnu.org/archive/html/qemu-devel>
List-Post: <mailto:qemu-devel@nongnu.org>
List-Help: <mailto:qemu-devel-request@nongnu.org?subject=help>
List-Subscribe: <https://lists.nongnu.org/mailman/listinfo/qemu-devel>,
	<mailto:qemu-devel-request@nongnu.org?subject=subscribe>
To: Paolo Bonzini <pbonzini@redhat.com>, Kevin Wolf <kwolf@redhat.com>, Laszlo Ersek <lersek@redhat.com>
Cc: qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>

On 11.03.2014 15:04, Paolo Bonzini wrote:
> Il 11/03/2014 11:16, Kevin Wolf ha scritto:
>> Am 11.03.2014 um 00:16 hat Laszlo Ersek geschrieben:
>>> On 03/10/14 23:44, Max Reitz wrote:
>>>> Before dereferencing bs->drv for a call to its member bdrv_co_readv(),
>>>> copy_sectors() should check whether that pointer is indeed valid, 
>>>> since
>>>> it may have been set to NULL by e.g. a concurrent write triggering the
>>>> corruption prevention mechanism.
>>>>
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> ---
>>>> To be precise, this still is a race condition. If bs->drv is set to 
>>>> NULL
>>>> after the check and before the call to bdrv_co_readv(), QEMU will
>>>> obviously still crash. However, in order to circumvent this 
>>>> behavior, we
>>>> would probably have to re-lock s->lock, check bs->drv, take the 
>>>> function
>>>> pointer to bdrv_co_readv() and then unlock s->lock before the function
>>>> is called. I found this rather ugly and therefore this still has a 
>>>> very
>>>> small chance of running into a race condition.
>>>> Therefore, I'm asking for your opinion on this, whether we can really
>>>> take this chance or should rather "do it right". In fact, if I were a
>>>> reviewer, I'd probably reject this patch and request the solution with
>>>> the function pointer (if there is no better solution), but I was 
>>>> afraid
>>>> to send such an ugly patch.
>>
>> No, the code is fine. Remember that qcow2 is not threaded, we're talking
>> about coroutines here. There is no way for the code to yield between
>> your check and the protected place.
>>
>>>>  block/qcow2-cluster.c | 4 ++++
>>>>  1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>>>> index 36c1bed..9499df9 100644
>>>> --- a/block/qcow2-cluster.c
>>>> +++ b/block/qcow2-cluster.c
>>>> @@ -380,6 +380,10 @@ static int coroutine_fn 
>>>> copy_sectors(BlockDriverState *bs,
>>>>
>>>>      BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
>>>>
>>>> +    if (!bs->drv) {
>>>> +        return -ENOMEDIUM;
>>>> +    }
>>>> +
>>>>      /* Call .bdrv_co_readv() directly instead of using the public 
>>>> block-layer
>>>>       * interface.  This avoids double I/O throttling and request 
>>>> tracking,
>>>>       * which can lead to deadlock when block layer copy-on-read is 
>>>> enabled.
>>>>
>>>
>>> I can't answer your question nor review this patch -- instead, I have a
>>> question of my own: when you say "set to NULL by [...] the corruption
>>> prevention mechanism", do you mean qcow2_pre_write_overlap_check():
>>>
>>>         bs->drv = NULL; /* make BDS unusable */
>>
>> Yes, this is the place.
>>
>>> If so: I thought that it was quite a bold move, but also that we'd find
>>> the SIGSEGVs sooner or later... :)
>>
>> In fact, if you use the block layer API, most functions check for
>> bs->drv and return -ENOMEDIUM if it is NULL. The problem here is that we
>> directly dereference the pointer without going through block.c (there's
>> a good reason for this, see the comment, but it still makes it somewhat
>> special).
>
> But why not call qcow2_co_readv directly?

qcow2_co_readv() would probably actually perform the operation, as far 
as I can see, since it does not check whether the medium is corrupt 
(this is only checked when opening the block device). This is the reason 
why bs->drv is set to NULL, so noone is able to read from the block 
device anymore, even without checking for corruption (additionally, on 
version 2 images, there is not even a way to indicate corruption other 
than setting bs->drv to NULL).
Normally, every read operation would pass through bdrv_co_readv(), which 
will (have to) check whether bs->drv is valid or NULL - since this code 
decides to skip that and call the driver function directly, we therefore 
have to check bs->drv anyway, since it is the only way of telling us 
whether the image may actually be accessed right here. We could change 
the bs->drv->bdrv_co_readv() call to qcow2_co_readv(), but then people 
will probably ask why there is a bs->drv check right before.

Max