From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:38447) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WNRGr-0008Mo-Rn for qemu-devel@nongnu.org; Tue, 11 Mar 2014 14:19:28 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1WNRGl-0000wE-Qt for qemu-devel@nongnu.org; Tue, 11 Mar 2014 14:19:21 -0400 Received: from mx1.redhat.com ([209.132.183.28]:32945) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1WNRGl-0000vp-JG for qemu-devel@nongnu.org; Tue, 11 Mar 2014 14:19:15 -0400 Received: from int-mx09.intmail.prod.int.phx2.redhat.com (int-mx09.intmail.prod.int.phx2.redhat.com [10.5.11.22]) by mx1.redhat.com (8.14.4/8.14.4) with ESMTP id s2BIJDie009798 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Tue, 11 Mar 2014 14:19:14 -0400 Message-ID: <531F539C.2080800@redhat.com> Date: Tue, 11 Mar 2014 19:19:08 +0100 From: Max Reitz MIME-Version: 1.0 References: <1394491449-10897-1-git-send-email-mreitz@redhat.com> <1394491449-10897-2-git-send-email-mreitz@redhat.com> <531E47BF.8080709@redhat.com> <20140311101604.GB3215@dhcp-200-207.str.redhat.com> <531F17EF.9000500@redhat.com> In-Reply-To: <531F17EF.9000500@redhat.com> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH 1/3] qcow2: Check bs->drv in copy_sectors() List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Paolo Bonzini , Kevin Wolf , Laszlo Ersek Cc: qemu-devel@nongnu.org, Stefan Hajnoczi On 11.03.2014 15:04, Paolo Bonzini wrote: > Il 11/03/2014 11:16, Kevin Wolf ha scritto: >> Am 11.03.2014 um 00:16 hat Laszlo Ersek geschrieben: >>> On 03/10/14 23:44, Max Reitz wrote: >>>> Before dereferencing bs->drv for a call to its member bdrv_co_readv(), >>>> copy_sectors() should check whether that pointer is indeed valid, >>>> since >>>> it may have been set to NULL by e.g. a concurrent write triggering the >>>> corruption prevention mechanism. >>>> >>>> Signed-off-by: Max Reitz >>>> --- >>>> To be precise, this still is a race condition. If bs->drv is set to >>>> NULL >>>> after the check and before the call to bdrv_co_readv(), QEMU will >>>> obviously still crash. However, in order to circumvent this >>>> behavior, we >>>> would probably have to re-lock s->lock, check bs->drv, take the >>>> function >>>> pointer to bdrv_co_readv() and then unlock s->lock before the function >>>> is called. I found this rather ugly and therefore this still has a >>>> very >>>> small chance of running into a race condition. >>>> Therefore, I'm asking for your opinion on this, whether we can really >>>> take this chance or should rather "do it right". In fact, if I were a >>>> reviewer, I'd probably reject this patch and request the solution with >>>> the function pointer (if there is no better solution), but I was >>>> afraid >>>> to send such an ugly patch. >> >> No, the code is fine. Remember that qcow2 is not threaded, we're talking >> about coroutines here. There is no way for the code to yield between >> your check and the protected place. >> >>>> block/qcow2-cluster.c | 4 ++++ >>>> 1 file changed, 4 insertions(+) >>>> >>>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c >>>> index 36c1bed..9499df9 100644 >>>> --- a/block/qcow2-cluster.c >>>> +++ b/block/qcow2-cluster.c >>>> @@ -380,6 +380,10 @@ static int coroutine_fn >>>> copy_sectors(BlockDriverState *bs, >>>> >>>> BLKDBG_EVENT(bs->file, BLKDBG_COW_READ); >>>> >>>> + if (!bs->drv) { >>>> + return -ENOMEDIUM; >>>> + } >>>> + >>>> /* Call .bdrv_co_readv() directly instead of using the public >>>> block-layer >>>> * interface. This avoids double I/O throttling and request >>>> tracking, >>>> * which can lead to deadlock when block layer copy-on-read is >>>> enabled. >>>> >>> >>> I can't answer your question nor review this patch -- instead, I have a >>> question of my own: when you say "set to NULL by [...] the corruption >>> prevention mechanism", do you mean qcow2_pre_write_overlap_check(): >>> >>> bs->drv = NULL; /* make BDS unusable */ >> >> Yes, this is the place. >> >>> If so: I thought that it was quite a bold move, but also that we'd find >>> the SIGSEGVs sooner or later... :) >> >> In fact, if you use the block layer API, most functions check for >> bs->drv and return -ENOMEDIUM if it is NULL. The problem here is that we >> directly dereference the pointer without going through block.c (there's >> a good reason for this, see the comment, but it still makes it somewhat >> special). > > But why not call qcow2_co_readv directly? qcow2_co_readv() would probably actually perform the operation, as far as I can see, since it does not check whether the medium is corrupt (this is only checked when opening the block device). This is the reason why bs->drv is set to NULL, so noone is able to read from the block device anymore, even without checking for corruption (additionally, on version 2 images, there is not even a way to indicate corruption other than setting bs->drv to NULL). Normally, every read operation would pass through bdrv_co_readv(), which will (have to) check whether bs->drv is valid or NULL - since this code decides to skip that and call the driver function directly, we therefore have to check bs->drv anyway, since it is the only way of telling us whether the image may actually be accessed right here. We could change the bs->drv->bdrv_co_readv() call to qcow2_co_readv(), but then people will probably ask why there is a bs->drv check right before. Max