* [Qemu-devel] [PATCH 1/1] virtio-blk: fix race on guest notifiers
@ 2017-03-02 13:04 Halil Pasic
2017-03-02 14:49 ` Cornelia Huck
From: Halil Pasic @ 2017-03-02 13:04 UTC (permalink / raw)
To: qemu-devel, Michael S. Tsirkin
Cc: Paolo Bonzini, Stefan Hajnoczi, Cornelia Huck, Halil Pasic,
qemu-stable
Commits 03de2f527 "virtio-blk: do not use vring in dataplane" and
9ffe337c08 "virtio-blk: always use dataplane path if ioeventfd is active"
substantially changed how notifications are done for virtio-blk. Due to a
race condition, interrupts are lost when the irqfd behind the guest
notifier is torn down after notify_guest_bh has been scheduled but before
it actually runs.

Let's fix this by forcing guest notifications before cleaning up the
irqfds. Let's also add some explanatory comments.
Cc: qemu-stable@nongnu.org
Signed-off-by: Halil Pasic <pasic@linux.vnet.ibm.com>
Reported-by: Michael A. Tebolt <miket@us.ibm.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
---
This patch supersedes my previous attempt to fix the same issue, which was
posted under the title "virtio: fallback from irqfd to non-irqfd notify".

Tested briefly without an iothread and overnight with an iothread, but
more testing would not hurt.
@Paolo: Could you please check the comments? They reflect my
understanding which may be very wrong.
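To illustrate the ordering problem outside of QEMU, here is a minimal
standalone model (all names made up, pthreads standing in for BHs and
irqfds -- a sketch of the idea, not QEMU code). The completion side only
marks a notification as pending; the stop side must flush that pending
work before it tears the notifier down, which is what the hunk below does
with qemu_bh_cancel() + notify_guest_bh():

/* Toy model of the race: "pending_notify" stands in for the scheduled
 * notify_guest_bh, "irqfd_usable" for the irqfd behind the guest notifier.
 * All names are made up for illustration; this is not QEMU code. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static bool pending_notify;         /* a notification was batched, not sent */
static bool irqfd_usable = true;    /* guest notifier still wired up        */
static bool interrupt_delivered;

static void flush_notify(void)      /* like running notify_guest_bh now     */
{
    if (pending_notify && irqfd_usable) {
        interrupt_delivered = true;
    }
    pending_notify = false;
}

static void *completion_thread(void *arg)  /* iothread completing a request */
{
    (void)arg;
    pthread_mutex_lock(&lock);
    pending_notify = true;          /* schedule the batched notification    */
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t t;
    pthread_create(&t, NULL, completion_thread, NULL);
    pthread_join(t, NULL);          /* "drain": no completions after this   */

    pthread_mutex_lock(&lock);
    flush_notify();                 /* the fix: notify before tearing down  */
    irqfd_usable = false;           /* the guest notifier; swapping these   */
    pthread_mutex_unlock(&lock);    /* two lines is the bug being fixed     */

    printf("interrupt %s\n", interrupt_delivered ? "delivered" : "lost");
    return 0;
}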
---
hw/block/dataplane/virtio-blk.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
index 5556f0e..13dd14d 100644
--- a/hw/block/dataplane/virtio-blk.c
+++ b/hw/block/dataplane/virtio-blk.c
@@ -258,9 +258,16 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
         virtio_queue_aio_set_host_notifier_handler(vq, s->ctx, NULL);
     }
 
-    /* Drain and switch bs back to the QEMU main loop */
+    /* Drain and switch bs back to the QEMU main loop. After drain, the
+     * device will not submit (nor comple) any requests until dataplane
+     * starts again.
+     */
     blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context());
 
+    /* Notify guest before the guest notifiers get cleaned up */
+    qemu_bh_cancel(s->bh);
+    notify_guest_bh(s);
+
     aio_context_release(s->ctx);
 
     for (i = 0; i < nvqs; i++) {
--
2.8.4
* Re: [Qemu-devel] [PATCH 1/1] virtio-blk: fix race on guest notifiers
2017-03-02 13:04 [Qemu-devel] [PATCH 1/1] virtio-blk: fix race on guest notifiers Halil Pasic
@ 2017-03-02 14:49 ` Cornelia Huck
2017-03-02 15:32 ` Paolo Bonzini
From: Cornelia Huck @ 2017-03-02 14:49 UTC (permalink / raw)
To: Halil Pasic
Cc: qemu-devel, Michael S. Tsirkin, Paolo Bonzini, Stefan Hajnoczi,
qemu-stable
On Thu, 2 Mar 2017 14:04:22 +0100
Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
> diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
> index 5556f0e..13dd14d 100644
> --- a/hw/block/dataplane/virtio-blk.c
> +++ b/hw/block/dataplane/virtio-blk.c
> @@ -258,9 +258,16 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
> virtio_queue_aio_set_host_notifier_handler(vq, s->ctx, NULL);
> }
>
> - /* Drain and switch bs back to the QEMU main loop */
> + /* Drain and switch bs back to the QEMU main loop. After drain, the
> + * device will not submit (nor comple) any requests until dataplane
s/comple/complete/
> + * starts again.
> + */
> blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context());
>
> + /* Notify guest before the guest notifiers get cleaned up */
> + qemu_bh_cancel(s->bh);
> + notify_guest_bh(s);
> +
Hm... does virtio-scsi dataplane need a similar treatment? Or am I
missing something?
> aio_context_release(s->ctx);
>
> for (i = 0; i < nvqs; i++) {
* Re: [Qemu-devel] [PATCH 1/1] virtio-blk: fix race on guest notifiers
2017-03-02 14:49 ` Cornelia Huck
@ 2017-03-02 15:32 ` Paolo Bonzini
2017-03-02 15:55 ` Halil Pasic
From: Paolo Bonzini @ 2017-03-02 15:32 UTC (permalink / raw)
To: Cornelia Huck, Halil Pasic
Cc: qemu-devel, Michael S. Tsirkin, Stefan Hajnoczi, qemu-stable
On 02/03/2017 15:49, Cornelia Huck wrote:
> On Thu, 2 Mar 2017 14:04:22 +0100
> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
>
>> diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
>> index 5556f0e..13dd14d 100644
>> --- a/hw/block/dataplane/virtio-blk.c
>> +++ b/hw/block/dataplane/virtio-blk.c
>> @@ -258,9 +258,16 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
>> virtio_queue_aio_set_host_notifier_handler(vq, s->ctx, NULL);
>> }
>>
>> - /* Drain and switch bs back to the QEMU main loop */
>> + /* Drain and switch bs back to the QEMU main loop. After drain, the
>> + * device will not submit (nor comple) any requests until dataplane
>
> s/comple/complete/
>
>> + * starts again.
>> + */
>> blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context());
>>
>> + /* Notify guest before the guest notifiers get cleaned up */
>> + qemu_bh_cancel(s->bh);
>> + notify_guest_bh(s);
>> +
>
> Hm... does virtio-scsi dataplane need a similar treatment? Or am I
> missing something?
No, the BH optimization is specific to virtio-blk. Thanks for the patch
Halil!
Paolo
>> aio_context_release(s->ctx);
>>
>> for (i = 0; i < nvqs; i++) {
>
* Re: [Qemu-devel] [PATCH 1/1] virtio-blk: fix race on guest notifiers
2017-03-02 15:32 ` Paolo Bonzini
@ 2017-03-02 15:55 ` Halil Pasic
2017-03-02 16:21 ` Paolo Bonzini
From: Halil Pasic @ 2017-03-02 15:55 UTC (permalink / raw)
To: Paolo Bonzini, Cornelia Huck
Cc: qemu-devel, Michael S. Tsirkin, Stefan Hajnoczi, qemu-stable
On 03/02/2017 04:32 PM, Paolo Bonzini wrote:
>
>
> On 02/03/2017 15:49, Cornelia Huck wrote:
>> On Thu, 2 Mar 2017 14:04:22 +0100
>> Halil Pasic <pasic@linux.vnet.ibm.com> wrote:
>>
>>> diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c
>>> index 5556f0e..13dd14d 100644
>>> --- a/hw/block/dataplane/virtio-blk.c
>>> +++ b/hw/block/dataplane/virtio-blk.c
>>> @@ -258,9 +258,16 @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev)
>>> virtio_queue_aio_set_host_notifier_handler(vq, s->ctx, NULL);
>>> }
>>>
>>> - /* Drain and switch bs back to the QEMU main loop */
>>> + /* Drain and switch bs back to the QEMU main loop. After drain, the
>>> + * device will not submit (nor comple) any requests until dataplane
>>
>> s/comple/complete/
>>
Will fix.
>>> + * starts again.
>>> + */
>>> blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context());
I'm wondering if synchronization is needed for batch_notify_vqs. I think
the set_bit can happen in the iothread, while the notify_guest_bh below
runs in the main event loop. Is it OK like this (could we miss bits set
in the other thread)? Does blk_set_aio_context include a barrier?
>>>
>>> + /* Notify guest before the guest notifiers get cleaned up */
>>> + qemu_bh_cancel(s->bh);
>>> + notify_guest_bh(s);
>>> +
>>
>> Hm... does virtio-scsi dataplane need a similar treatment? Or am I
>> missing something?
>
> No, the BH optimization is specific to virtio-blk. Thanks for the patch
> Halil!
>
You are welcome ;)!
Regards,
Halil
* Re: [Qemu-devel] [PATCH 1/1] virtio-blk: fix race on guest notifiers
2017-03-02 15:55 ` Halil Pasic
@ 2017-03-02 16:21 ` Paolo Bonzini
2017-03-03 19:43 ` Halil Pasic
From: Paolo Bonzini @ 2017-03-02 16:21 UTC (permalink / raw)
To: Halil Pasic, Cornelia Huck
Cc: qemu-devel, Michael S. Tsirkin, Stefan Hajnoczi, qemu-stable
On 02/03/2017 16:55, Halil Pasic wrote:
>>>> blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context());
> I'm wondering if synchronization is needed for batch_notify_vqs. I think
> the set_bit can happen in the iothread, while the notify_guest_bh below
> runs in the main event loop. Is it OK like this (could we miss bits set
> in the other thread)? Does blk_set_aio_context include a barrier?
Another good question---yes, it does (via bdrv_set_aio_context's call to
aio_context_acquire).
But it's something to remember for when I get round to removing
aio_context_acquire!
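As a standalone sketch of why the lock is sufficient (a plain pthread
mutex standing in for the AioContext lock; all names are made up, this is
not QEMU code): the bitmap is only ever written and read under the same
mutex, so the unlock/lock pair is the barrier that publishes the bits.

/* Illustration only, all names made up: a non-atomic bitmap written under
 * a mutex in the "iothread" is visible to the "main loop" once the main
 * loop takes the same mutex -- the role aio_context_acquire()/release()
 * plays for batch_notify_vqs. */
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long batch_bits;        /* stands in for batch_notify_vqs */
static int stopping;

static void *iothread_fn(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&ctx_lock);  /* aio_context_acquire(s->ctx)     */
        if (stopping) {
            pthread_mutex_unlock(&ctx_lock);
            return NULL;
        }
        batch_bits |= 1UL << 3;         /* set_bit() on request completion */
        pthread_mutex_unlock(&ctx_lock);/* the release publishes the bit   */
        sched_yield();
    }
}

int main(void)
{
    pthread_t iothread;
    pthread_create(&iothread, NULL, iothread_fn, NULL);
    sched_yield();

    /* Main loop, stop path: taking the lock here (as blk_set_aio_context
     * does with the context's lock internally) pairs with the release
     * above, so any bit set before that release is seen. */
    pthread_mutex_lock(&ctx_lock);
    stopping = 1;
    unsigned long bits = batch_bits;
    batch_bits = 0;
    pthread_mutex_unlock(&ctx_lock);

    pthread_join(iothread, NULL);
    printf("virtqueues to notify: 0x%lx\n", bits);
    return 0;
}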
Paolo
* Re: [Qemu-devel] [PATCH 1/1] virtio-blk: fix race on guest notifiers
2017-03-02 16:21 ` Paolo Bonzini
@ 2017-03-03 19:43 ` Halil Pasic
2017-03-06 14:55 ` Paolo Bonzini
From: Halil Pasic @ 2017-03-03 19:43 UTC (permalink / raw)
To: Paolo Bonzini, Cornelia Huck
Cc: qemu-stable, qemu-devel, Stefan Hajnoczi, Michael S. Tsirkin
On 03/02/2017 05:21 PM, Paolo Bonzini wrote:
>
>
> On 02/03/2017 16:55, Halil Pasic wrote:
>>>>> blk_set_aio_context(s->conf->conf.blk, qemu_get_aio_context());
>> I'm wondering if synchronization is needed for batch_notify_vqs. I think
>> the set_bit can happen in the iothread, while the notify_guest_bh below
>> runs in the main event loop. Is it OK like this (could we miss bits set
>> in the other thread)? Does blk_set_aio_context include a barrier?
>
> Another good question---yes, it does (via bdrv_set_aio_context's call to
> aio_context_acquire).
>
> But it's something to remember for when I get round to removing
> aio_context_acquire!
Uh, this is complicated. I'm not out of questions, but I fear taking too
much of your precious time. I will ask again nevertheless, but please
just cut the conversation with -EBUSY if it gets too expensive.
What I understand is that batch_notify_vqs is effectively protected by
blk_get_aio_context(s->conf.conf.blk)->lock, which might or might not
be the same as qemu_get_aio_context()->lock; if it is not the same,
we are running with an iothread. Now let us have a look
at the drain. AFAIU, in
#define BDRV_POLL_WHILE(bs, cond) ({                       \
    bool waited_ = false;                                  \
    BlockDriverState *bs_ = (bs);                          \
    AioContext *ctx_ = bdrv_get_aio_context(bs_);          \
    if (aio_context_in_iothread(ctx_)) {                   \
        while ((cond)) {                                   \
            aio_poll(ctx_, true);                          \
            waited_ = true;                                \
        }                                                  \
    } else {                                               \
        assert(qemu_get_current_aio_context() ==           \
               qemu_get_aio_context());                    \
        /* Ask bdrv_dec_in_flight to wake up the main      \
         * QEMU AioContext. Extra I/O threads never take   \
         * other I/O threads' AioContexts (see for example \
         * block_job_defer_to_main_loop for how to do it). \
         */                                                \
        assert(!bs_->wakeup);                              \
        bs_->wakeup = true;                                \
        while ((cond)) {                                   \
            aio_context_release(ctx_);                     \
            aio_poll(qemu_get_aio_context(), true);        \
            aio_context_acquire(ctx_);                     \
            waited_ = true;                                \
        }                                                  \
        bs_->wakeup = false;                               \
    }                                                      \
    waited_; })
where it gets interesting (me running with 2 iothreads, one assigned to my
block device), we are going to take the "else" branch, and will end up
releasing the ctx belonging to the iothread and then acquiring it again,
basically waiting until the requests are done.

Since virtio_blk_rw_complete is acquiring and releasing the same ctx, this
makes a lot of sense. If we did not release the ctx, we would end up
with a deadlock.
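To check my understanding of why the release around aio_poll() matters,
here is a minimal standalone model (a pthread condition variable standing
in for aio_poll(), hypothetical names, not QEMU code): the waiter has to
give up the lock while it waits, because the completion needs that same
lock.

/* Toy model only, hypothetical names: "in_flight" plays the role of
 * bs->in_flight, the condition variable plays the role of aio_poll(). */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;
static int in_flight = 1;

static void *iothread_fn(void *arg)     /* completes the outstanding request */
{
    (void)arg;
    pthread_mutex_lock(&ctx_lock);      /* like aio_context_acquire() in the
                                         * completion path                   */
    in_flight--;                        /* like bdrv_dec_in_flight()         */
    pthread_cond_signal(&done_cond);
    pthread_mutex_unlock(&ctx_lock);
    return NULL;
}

int main(void)
{
    pthread_t iothread;
    pthread_create(&iothread, NULL, iothread_fn, NULL);

    pthread_mutex_lock(&ctx_lock);
    while (in_flight > 0) {
        /* pthread_cond_wait() atomically drops ctx_lock while waiting, just
         * as the else branch does aio_context_release(ctx_) around
         * aio_poll(); waiting while holding the lock would deadlock because
         * the completion above needs it. */
        pthread_cond_wait(&done_cond, &ctx_lock);
    }
    pthread_mutex_unlock(&ctx_lock);

    pthread_join(iothread, NULL);
    printf("drained, in_flight == %d\n", in_flight);
    return 0;
}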
What I do not entirely understand is the atomic_read/write on
bs->in_flight, which tells us when we are done waiting. In
static void blk_aio_complete(BlkAioEmAIOCB *acb)
{
    if (acb->has_returned) {
        bdrv_dec_in_flight(acb->common.bs);
        acb->common.cb(acb->common.opaque, acb->rwco.ret);
        qemu_aio_unref(acb);
    }
}
the atomic decrement happens before the callback, that is, before
virtio_blk_rw_complete is called. My guess here is that this does not
matter, because we are holding the ctx anyway, so it is ensured that we
will not observe 0 until the last virtio_blk_rw_complete has completed
(the mutex is recursive). But then I do not understand why the mutex is
acquired in virtio_blk_rw_complete at all.
TL;DR
Is patch b9e413dd required for the correctness of this patch?
What is the role of the aio_context_acquire/release introduced by
b9e413dd in virtio_blk_rw_complete?
Regards,
Halil
* Re: [Qemu-devel] [PATCH 1/1] virtio-blk: fix race on guest notifiers
2017-03-03 19:43 ` Halil Pasic
@ 2017-03-06 14:55 ` Paolo Bonzini
2017-03-06 22:27 ` Halil Pasic
From: Paolo Bonzini @ 2017-03-06 14:55 UTC (permalink / raw)
To: Halil Pasic, Cornelia Huck
Cc: qemu-stable, qemu-devel, Stefan Hajnoczi, Michael S. Tsirkin
On 03/03/2017 20:43, Halil Pasic wrote:
> Uh, this is complicated. I'm not out of questions, but I fear taking too
> much of your precious time. I will ask again nevertheless, but please
> just cut the conversation with -EBUSY if it gets too expensive.
It's the opposite! I need other people to look at it, understand it and
poke holes at my own reasoning.
> bs_->wakeup = true; \
> while ((cond)) { \
> aio_context_release(ctx_); \
> aio_poll(qemu_get_aio_context(), true); \
> aio_context_acquire(ctx_); \
> waited_ = true; \
> } \
> bs_->wakeup = false; \
> } \
> waited_; })
>
> where it gets interesting (me running with 2 iothreads, one assigned to my
> block device), we are going to take the "else" branch, and will end up
> releasing the ctx belonging to the iothread and then acquiring it again,
> basically waiting until the requests are done.
>
> Since virtio_blk_rw_complete is acquiring and releasing the same ctx, this
> makes a lot of sense. If we did not release the ctx, we would end up
> with a deadlock.
Related to this, note that the above release/wait/acquire logic is
itself new to 2.8. Before, QEMU would run aio_poll(other_aio_context)
directly in the main thread. This relied on recursive mutexes and
special callbacks to pass the lock between the I/O and main threads.
This worked but it hid some thread-unsafe idioms, so I removed this in
the commits ending at 65c1b5b ("iothread: release AioContext around
aio_poll", 2016-10-28). BDRV_POLL_WHILE provides an abstraction that
will not change once aio_context_acquire/release disappears in favor of
fine-grained locks.
> But then I do not understand why the mutex is
> acquired in virtio_blk_rw_complete at all.
When QEMU does I/O in the main thread, it can call
BlockBackend/BlockDriverState functions. Even though their completion
is processed in the I/O thread (via BDRV_POLL_WHILE), you still need a
lock to handle mutual exclusion between the two.
In the case of virtio_blk_rw_complete the mutex is needed because, for
example, block_acct_done is not thread safe yet.
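A standalone sketch of that pattern (hypothetical names, not the actual
QEMU functions): the completion callback can run in either thread, so it
wraps the non-thread-safe accounting in the context lock.

/* Sketch only, hypothetical names (block_acct_done_unsafe is a stand-in
 * for block_acct_done, ctx_lock for the device's AioContext lock). */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER;
static long bytes_accounted;            /* shared accounting state          */

static void block_acct_done_unsafe(long bytes)
{
    bytes_accounted += bytes;           /* not thread safe on its own       */
}

static void rw_complete(long bytes)     /* may run in main loop or iothread */
{
    pthread_mutex_lock(&ctx_lock);      /* aio_context_acquire(...)         */
    block_acct_done_unsafe(bytes);
    pthread_mutex_unlock(&ctx_lock);    /* aio_context_release(...)         */
}

static void *iothread_fn(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        rw_complete(4096);
    }
    return NULL;
}

int main(void)
{
    pthread_t iothread;
    pthread_create(&iothread, NULL, iothread_fn, NULL);
    for (int i = 0; i < 100000; i++) {
        rw_complete(4096);              /* completions on the main thread   */
    }
    pthread_join(iothread, NULL);
    printf("%ld bytes accounted\n", bytes_accounted);
    return 0;
}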
> Is patch b9e413dd required for the correctness of this patch?
> What is the role of the aio_context_acquire/release introduced by
> b9e413dd in virtio_blk_rw_complete?
Patch b9e413dd should have no semantic change. The acquire/release that
are added are all balanced by removed acquire/releases elsewhere, for
example in block/linux-aio.c.
Paolo
* Re: [Qemu-devel] [PATCH 1/1] virtio-blk: fix race on guest notifiers
2017-03-06 14:55 ` Paolo Bonzini
@ 2017-03-06 22:27 ` Halil Pasic
From: Halil Pasic @ 2017-03-06 22:27 UTC (permalink / raw)
To: Paolo Bonzini, Cornelia Huck
Cc: Michael S. Tsirkin, qemu-stable, Stefan Hajnoczi, qemu-devel
On 03/06/2017 03:55 PM, Paolo Bonzini wrote:
>
> On 03/03/2017 20:43, Halil Pasic wrote:
>> Uh, this is complicated. I'm not out of questions, but I fear taking too
>> much of your precious time. I will ask again nevertheless, but please
>> just cut the conversation with -EBUSY if it gets too expensive.
> It's the opposite! I need other people to look at it, understand it and
> poke holes at my own reasoning.
>
Thanks for the explanation. I intend to send out a v2 (with
the typo fixed) tomorrow.
Regards,
Halil