From: Max Reitz <mreitz@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>,
Kevin Wolf <kwolf@redhat.com>, Laszlo Ersek <lersek@redhat.com>
Cc: qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH 1/3] qcow2: Check bs->drv in copy_sectors()
Date: Tue, 11 Mar 2014 19:19:08 +0100
Message-ID: <531F539C.2080800@redhat.com>
In-Reply-To: <531F17EF.9000500@redhat.com>
On 11.03.2014 15:04, Paolo Bonzini wrote:
> On 11/03/2014 11:16, Kevin Wolf wrote:
>> On 11.03.2014 at 00:16, Laszlo Ersek wrote:
>>> On 03/10/14 23:44, Max Reitz wrote:
>>>> Before dereferencing bs->drv for a call to its member bdrv_co_readv(),
>>>> copy_sectors() should check whether that pointer is indeed valid, since
>>>> it may have been set to NULL by e.g. a concurrent write triggering the
>>>> corruption prevention mechanism.
>>>>
>>>> Signed-off-by: Max Reitz <mreitz@redhat.com>
>>>> ---
>>>> To be precise, this still is a race condition. If bs->drv is set to NULL
>>>> after the check and before the call to bdrv_co_readv(), QEMU will
>>>> obviously still crash. However, in order to circumvent this behavior, we
>>>> would probably have to re-lock s->lock, check bs->drv, take the function
>>>> pointer to bdrv_co_readv() and then unlock s->lock before the function
>>>> is called. I found this rather ugly and therefore this still has a very
>>>> small chance of running into a race condition.
>>>> Therefore, I'm asking for your opinion on this, whether we can really
>>>> take this chance or should rather "do it right". In fact, if I were a
>>>> reviewer, I'd probably reject this patch and request the solution with
>>>> the function pointer (if there is no better solution), but I was afraid
>>>> to send such an ugly patch.
>>
>> No, the code is fine. Remember that qcow2 is not threaded, we're talking
>> about coroutines here. There is no way for the code to yield between
>> your check and the protected place.
>>
>>>> block/qcow2-cluster.c | 4 ++++
>>>> 1 file changed, 4 insertions(+)
>>>>
>>>> diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
>>>> index 36c1bed..9499df9 100644
>>>> --- a/block/qcow2-cluster.c
>>>> +++ b/block/qcow2-cluster.c
>>>> @@ -380,6 +380,10 @@ static int coroutine_fn copy_sectors(BlockDriverState *bs,
>>>>
>>>>      BLKDBG_EVENT(bs->file, BLKDBG_COW_READ);
>>>>
>>>> +    if (!bs->drv) {
>>>> +        return -ENOMEDIUM;
>>>> +    }
>>>> +
>>>>      /* Call .bdrv_co_readv() directly instead of using the public block-layer
>>>>       * interface.  This avoids double I/O throttling and request tracking,
>>>>       * which can lead to deadlock when block layer copy-on-read is enabled.
>>>>
>>>
>>> I can't answer your question nor review this patch -- instead, I have a
>>> question of my own: when you say "set to NULL by [...] the corruption
>>> prevention mechanism", do you mean qcow2_pre_write_overlap_check():
>>>
>>> bs->drv = NULL; /* make BDS unusable */
>>
>> Yes, this is the place.
>>
>>> If so: I thought that it was quite a bold move, but also that we'd find
>>> the SIGSEGVs sooner or later... :)
>>
>> In fact, if you use the block layer API, most functions check for
>> bs->drv and return -ENOMEDIUM if it is NULL. The problem here is that we
>> directly dereference the pointer without going through block.c (there's
>> a good reason for this, see the comment, but it still makes it somewhat
>> special).
>
> But why not call qcow2_co_readv directly?
qcow2_co_readv() would probably go ahead and perform the read, as far
as I can see, since it does not check whether the image is marked corrupt
(that is only checked when opening the block device). This is exactly why
bs->drv is set to NULL: so that no one can read from the block device
anymore, even code that does not check for corruption itself.
(Additionally, on version 2 images, there is no way to indicate
corruption other than setting bs->drv to NULL.)

Normally, every read operation passes through bdrv_co_readv(), which
will (have to) check whether bs->drv is valid or NULL. Since this code
deliberately skips that and calls the driver function directly, we have
to check bs->drv ourselves, because it is the only way of telling whether
the image may actually be accessed at this point. We could change the
bs->drv->bdrv_co_readv() call to qcow2_co_readv(), but then people
will probably ask why there is a bs->drv check right before it.
Max
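
To illustrate the point discussed in this thread, here is a minimal,
self-contained sketch. It is not QEMU code: ToyState, ToyDriver and all
toy_*() functions are hypothetical stand-ins invented for this example,
and the real BlockDriverState and BlockDriver structures are far larger.
It only models the idea that a NULL drv pointer marks the device as
unusable, that a generic entry point can enforce this for its callers,
and that a path bypassing the generic entry point has to repeat the
check itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#ifndef ENOMEDIUM
#define ENOMEDIUM ENODEV   /* fallback for platforms lacking ENOMEDIUM */
#endif

/* Hypothetical, heavily simplified stand-ins for the block layer types. */
typedef struct ToyState ToyState;

typedef struct ToyDriver {
    int (*co_readv)(ToyState *bs, long sector, int nb_sectors);
} ToyDriver;

struct ToyState {
    ToyDriver *drv;    /* NULL means "device must not be used" */
    int reads_done;    /* counts reads that were actually performed */
};

/* Format-level read: performs the read without looking at bs->drv. */
static int toy_format_readv(ToyState *bs, long sector, int nb_sectors)
{
    (void)sector;
    (void)nb_sectors;
    bs->reads_done++;
    return 0;
}

static ToyDriver toy_driver = { .co_readv = toy_format_readv };

/* Analogue of the corruption handler: make the state unusable. */
static void toy_mark_corrupt(ToyState *bs)
{
    bs->drv = NULL;
}

/* Generic entry point: checks bs->drv on behalf of every caller. */
static int toy_generic_readv(ToyState *bs, long sector, int nb_sectors)
{
    if (!bs->drv) {
        return -ENOMEDIUM;
    }
    return bs->drv->co_readv(bs, sector, nb_sectors);
}

/* Internal path that bypasses the generic entry point and therefore
 * has to check bs->drv itself before dereferencing it. */
static int toy_copy_sectors(ToyState *bs, long sector, int nb_sectors)
{
    if (!bs->drv) {
        return -ENOMEDIUM;
    }
    return bs->drv->co_readv(bs, sector, nb_sectors);
}

/* Walks through the scenario; returns the number of reads performed. */
int toy_demo(void)
{
    ToyState bs = { .drv = &toy_driver, .reads_done = 0 };

    assert(toy_generic_readv(&bs, 0, 8) == 0);
    assert(toy_copy_sectors(&bs, 0, 8) == 0);

    toy_mark_corrupt(&bs);

    /* Both guarded paths now refuse to touch the device. */
    assert(toy_generic_readv(&bs, 0, 8) == -ENOMEDIUM);
    assert(toy_copy_sectors(&bs, 0, 8) == -ENOMEDIUM);

    /* Calling the format function by name skips the guard entirely. */
    assert(toy_format_readv(&bs, 0, 8) == 0);

    return bs.reads_done;
}
```

In this toy model, calling toy_format_readv() by name after
toy_mark_corrupt() still performs the read, which mirrors the concern
raised above about replacing the bs->drv->bdrv_co_readv() call with a
direct qcow2_co_readv() call and dropping the check.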
Thread overview:
2014-03-10 22:44 [Qemu-devel] [PATCH 0/3] qcow2: Fix crash during copy_sectors() after corruption Max Reitz
2014-03-10 22:44 ` [Qemu-devel] [PATCH 1/3] qcow2: Check bs->drv in copy_sectors() Max Reitz
2014-03-10 23:16 ` Laszlo Ersek
2014-03-11 10:16 ` Kevin Wolf
2014-03-11 14:04 ` Paolo Bonzini
2014-03-11 18:19 ` Max Reitz [this message]
2014-03-11 18:09 ` Max Reitz
2014-03-10 22:44 ` [Qemu-devel] [PATCH 2/3] block: bs->drv may be NULL in bdrv_debug_resume() Max Reitz
2014-03-10 22:44 ` [Qemu-devel] [PATCH 3/3] iotests: Test corruption during COW request Max Reitz
2014-03-11 10:21 ` [Qemu-devel] [PATCH 0/3] qcow2: Fix crash during copy_sectors() after corruption Kevin Wolf
2014-03-11 13:13 ` Stefan Hajnoczi