* [Qemu-devel] [PATCH] vl: pause vcpus before stopping iothreads
@ 2018-01-30 15:38 Stefan Hajnoczi
2018-01-30 16:54 ` Kevin Wolf
0 siblings, 1 reply; 5+ messages in thread
From: Stefan Hajnoczi @ 2018-01-30 15:38 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-block, Kevin Wolf, Fam Zheng, Stefan Hajnoczi
Commit dce8921b2baaf95974af8176406881872067adfa ("iothread: Stop threads
before main() quits") introduced iothread_stop_all() to avoid the
following virtio-scsi assertion failure:
assert(blk_get_aio_context(d->conf.blk) == s->ctx);
Back then the assertion failed because when bdrv_close_all() made
d->conf.blk NULL, blk_get_aio_context() returned the global AioContext
instead of s->ctx.
The same assertion can still fail today when vcpus submit new I/O
requests after iothread_stop_all() has moved the BDS to the global
AioContext.
This patch hardens the iothread_stop_all() approach by pausing vcpus
before calling iothread_stop_all().
Note that the assertion failure is a race condition. It is not possible
to reproduce it reliably.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
vl.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/vl.c b/vl.c
index e517a8d995..011f22ae20 100644
--- a/vl.c
+++ b/vl.c
@@ -4765,10 +4765,10 @@ int main(int argc, char **argv, char **envp)
os_setup_post();
main_loop();
+
replay_disable_events();
+ pause_all_vcpus();
iothread_stop_all();
-
- pause_all_vcpus();
bdrv_close_all();
res_free();
--
2.14.3
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [Qemu-devel] [PATCH] vl: pause vcpus before stopping iothreads
2018-01-30 15:38 [Qemu-devel] [PATCH] vl: pause vcpus before stopping iothreads Stefan Hajnoczi
@ 2018-01-30 16:54 ` Kevin Wolf
2018-01-31 13:56 ` Stefan Hajnoczi
0 siblings, 1 reply; 5+ messages in thread
From: Kevin Wolf @ 2018-01-30 16:54 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: qemu-devel, qemu-block, Fam Zheng
Am 30.01.2018 um 16:38 hat Stefan Hajnoczi geschrieben:
> Commit dce8921b2baaf95974af8176406881872067adfa ("iothread: Stop threads
> before main() quits") introduced iothread_stop_all() to avoid the
> following virtio-scsi assertion failure:
>
> assert(blk_get_aio_context(d->conf.blk) == s->ctx);
>
> Back then the assertion failed because when bdrv_close_all() made
> d->conf.blk NULL, blk_get_aio_context() returned the global AioContext
> instead of s->ctx.
>
> The same assertion can still fail today when vcpus submit new I/O
> requests after iothread_stop_all() has moved the BDS to the global
> AioContext.
>
> This patch hardens the iothread_stop_all() approach by pausing vcpus
> before calling iothread_stop_all().
>
> Note that the assertion failure is a race condition. It is not possible
> to reproduce it reliably.
>
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Does pausing the vcpus actually make sure that the iothread isn't active
any more, or do we still have a small window where the vcpu is already
stopped, but the iothread is still processing requests?
Essentially, I think the bdrv_set_aio_context() in iothread_stop_all()
does either not have any effect, or if it does have an effect, it's
wrong. You can't just force an in-use BDS into a different AioContext
when the user that set the AioContext is still there.
At the very least, do we need a blk_drain_all() before stopping the
iothreads? It would still just be a hack, the proper way seens to be
getting the virtio device out of dataplane mode so that the iothread is
actually unused and doesn't just happen to not process something at the
moment.
Kevin
> vl.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/vl.c b/vl.c
> index e517a8d995..011f22ae20 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -4765,10 +4765,10 @@ int main(int argc, char **argv, char **envp)
> os_setup_post();
>
> main_loop();
> +
> replay_disable_events();
> + pause_all_vcpus();
> iothread_stop_all();
> -
> - pause_all_vcpus();
> bdrv_close_all();
> res_free();
>
> --
> 2.14.3
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Qemu-devel] [PATCH] vl: pause vcpus before stopping iothreads
2018-01-30 16:54 ` Kevin Wolf
@ 2018-01-31 13:56 ` Stefan Hajnoczi
2018-01-31 14:31 ` Kevin Wolf
0 siblings, 1 reply; 5+ messages in thread
From: Stefan Hajnoczi @ 2018-01-31 13:56 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel, qemu-block, Fam Zheng
[-- Attachment #1: Type: text/plain, Size: 2314 bytes --]
On Tue, Jan 30, 2018 at 05:54:56PM +0100, Kevin Wolf wrote:
> Am 30.01.2018 um 16:38 hat Stefan Hajnoczi geschrieben:
> > Commit dce8921b2baaf95974af8176406881872067adfa ("iothread: Stop threads
> > before main() quits") introduced iothread_stop_all() to avoid the
> > following virtio-scsi assertion failure:
> >
> > assert(blk_get_aio_context(d->conf.blk) == s->ctx);
> >
> > Back then the assertion failed because when bdrv_close_all() made
> > d->conf.blk NULL, blk_get_aio_context() returned the global AioContext
> > instead of s->ctx.
> >
> > The same assertion can still fail today when vcpus submit new I/O
> > requests after iothread_stop_all() has moved the BDS to the global
> > AioContext.
> >
> > This patch hardens the iothread_stop_all() approach by pausing vcpus
> > before calling iothread_stop_all().
> >
> > Note that the assertion failure is a race condition. It is not possible
> > to reproduce it reliably.
> >
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
>
> Does pausing the vcpus actually make sure that the iothread isn't active
> any more, or do we still have a small window where the vcpu is already
> stopped, but the iothread is still processing requests?
>
> Essentially, I think the bdrv_set_aio_context() in iothread_stop_all()
> does either not have any effect, or if it does have an effect, it's
> wrong. You can't just force an in-use BDS into a different AioContext
> when the user that set the AioContext is still there.
>
> At the very least, do we need a blk_drain_all() before stopping the
> iothreads?
bdrv_set_aio_context() contains aio_disable_external() +
bdrv_parent_drained_begin() + bdrv_drain(bs). This should complete all
requests, even those sitting in a descriptor ring that hasn't been
processed yet.
> It would still just be a hack, the proper way seens to be
> getting the virtio device out of dataplane mode so that the iothread is
> actually unused and doesn't just happen to not process something at the
> moment.
Agreed, the existing approach is a hack. I'm not keen on implementing
a proper device<->IOThread detach operation because vl.c:main() seems to
be the only place that needs it - and it can get away with just
quiescing requests and the IOThread instead.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Qemu-devel] [PATCH] vl: pause vcpus before stopping iothreads
2018-01-31 13:56 ` Stefan Hajnoczi
@ 2018-01-31 14:31 ` Kevin Wolf
2018-02-01 11:07 ` Stefan Hajnoczi
0 siblings, 1 reply; 5+ messages in thread
From: Kevin Wolf @ 2018-01-31 14:31 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: qemu-devel, qemu-block, Fam Zheng
[-- Attachment #1: Type: text/plain, Size: 2772 bytes --]
Am 31.01.2018 um 14:56 hat Stefan Hajnoczi geschrieben:
> On Tue, Jan 30, 2018 at 05:54:56PM +0100, Kevin Wolf wrote:
> > Am 30.01.2018 um 16:38 hat Stefan Hajnoczi geschrieben:
> > > Commit dce8921b2baaf95974af8176406881872067adfa ("iothread: Stop threads
> > > before main() quits") introduced iothread_stop_all() to avoid the
> > > following virtio-scsi assertion failure:
> > >
> > > assert(blk_get_aio_context(d->conf.blk) == s->ctx);
> > >
> > > Back then the assertion failed because when bdrv_close_all() made
> > > d->conf.blk NULL, blk_get_aio_context() returned the global AioContext
> > > instead of s->ctx.
> > >
> > > The same assertion can still fail today when vcpus submit new I/O
> > > requests after iothread_stop_all() has moved the BDS to the global
> > > AioContext.
> > >
> > > This patch hardens the iothread_stop_all() approach by pausing vcpus
> > > before calling iothread_stop_all().
> > >
> > > Note that the assertion failure is a race condition. It is not possible
> > > to reproduce it reliably.
> > >
> > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> >
> > Does pausing the vcpus actually make sure that the iothread isn't active
> > any more, or do we still have a small window where the vcpu is already
> > stopped, but the iothread is still processing requests?
> >
> > Essentially, I think the bdrv_set_aio_context() in iothread_stop_all()
> > does either not have any effect, or if it does have an effect, it's
> > wrong. You can't just force an in-use BDS into a different AioContext
> > when the user that set the AioContext is still there.
> >
> > At the very least, do we need a blk_drain_all() before stopping the
> > iothreads?
>
> bdrv_set_aio_context() contains aio_disable_external() +
> bdrv_parent_drained_begin() + bdrv_drain(bs). This should complete all
> requests, even those sitting in a descriptor ring that hasn't been
> processed yet.
Ah, yes. Not very obvious, so I wouldn't mind a comment, but you can
have my R-b either way then:
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
> > It would still just be a hack, the proper way seens to be
> > getting the virtio device out of dataplane mode so that the iothread is
> > actually unused and doesn't just happen to not process something at the
> > moment.
>
> Agreed, the existing approach is a hack. I'm not keen on implementing
> a proper device<->IOThread detach operation because vl.c:main() seems to
> be the only place that needs it - and it can get away with just
> quiescing requests and the IOThread instead.
As long as we don't want to switch devices between iothreads at runtime
(and create/delete iothreads over QMP), we probably won't really need
it, yes.
Kevin
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Qemu-devel] [PATCH] vl: pause vcpus before stopping iothreads
2018-01-31 14:31 ` Kevin Wolf
@ 2018-02-01 11:07 ` Stefan Hajnoczi
0 siblings, 0 replies; 5+ messages in thread
From: Stefan Hajnoczi @ 2018-02-01 11:07 UTC (permalink / raw)
To: Kevin Wolf; +Cc: qemu-devel, qemu-block, Fam Zheng
[-- Attachment #1: Type: text/plain, Size: 2345 bytes --]
On Wed, Jan 31, 2018 at 03:31:27PM +0100, Kevin Wolf wrote:
> Am 31.01.2018 um 14:56 hat Stefan Hajnoczi geschrieben:
> > On Tue, Jan 30, 2018 at 05:54:56PM +0100, Kevin Wolf wrote:
> > > Am 30.01.2018 um 16:38 hat Stefan Hajnoczi geschrieben:
> > > > Commit dce8921b2baaf95974af8176406881872067adfa ("iothread: Stop threads
> > > > before main() quits") introduced iothread_stop_all() to avoid the
> > > > following virtio-scsi assertion failure:
> > > >
> > > > assert(blk_get_aio_context(d->conf.blk) == s->ctx);
> > > >
> > > > Back then the assertion failed because when bdrv_close_all() made
> > > > d->conf.blk NULL, blk_get_aio_context() returned the global AioContext
> > > > instead of s->ctx.
> > > >
> > > > The same assertion can still fail today when vcpus submit new I/O
> > > > requests after iothread_stop_all() has moved the BDS to the global
> > > > AioContext.
> > > >
> > > > This patch hardens the iothread_stop_all() approach by pausing vcpus
> > > > before calling iothread_stop_all().
> > > >
> > > > Note that the assertion failure is a race condition. It is not possible
> > > > to reproduce it reliably.
> > > >
> > > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > >
> > > Does pausing the vcpus actually make sure that the iothread isn't active
> > > any more, or do we still have a small window where the vcpu is already
> > > stopped, but the iothread is still processing requests?
> > >
> > > Essentially, I think the bdrv_set_aio_context() in iothread_stop_all()
> > > does either not have any effect, or if it does have an effect, it's
> > > wrong. You can't just force an in-use BDS into a different AioContext
> > > when the user that set the AioContext is still there.
> > >
> > > At the very least, do we need a blk_drain_all() before stopping the
> > > iothreads?
> >
> > bdrv_set_aio_context() contains aio_disable_external() +
> > bdrv_parent_drained_begin() + bdrv_drain(bs). This should complete all
> > requests, even those sitting in a descriptor ring that hasn't been
> > processed yet.
>
> Ah, yes. Not very obvious, so I wouldn't mind a comment, but you can
> have my R-b either way then:
>
> Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Thanks for the review! I have sent v2 with a comment for you to review.
Stefan
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 455 bytes --]
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2018-02-01 11:08 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-01-30 15:38 [Qemu-devel] [PATCH] vl: pause vcpus before stopping iothreads Stefan Hajnoczi
2018-01-30 16:54 ` Kevin Wolf
2018-01-31 13:56 ` Stefan Hajnoczi
2018-01-31 14:31 ` Kevin Wolf
2018-02-01 11:07 ` Stefan Hajnoczi
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).