qemu-devel.nongnu.org archive mirror
* Question about QMP and BQL
@ 2023-05-12 18:01 Fabiano Rosas
  2023-05-15 11:08 ` Markus Armbruster
  2023-05-15 13:18 ` Kevin Wolf
  0 siblings, 2 replies; 4+ messages in thread
From: Fabiano Rosas @ 2023-05-12 18:01 UTC (permalink / raw)
  To: qemu-devel, kwolf, armbru, eesposit, vsementsov

Is there a way to execute a long-running QMP command outside of the
BQL?

The situation we're seeing is a slow query-block due to a slow system
call (fstat over NFS) causing the main thread to spend too long
holding the global mutex and locking up the vcpu thread when it goes
out of the guest for MMIO.

The call chain for QMP is:

qmp_query_block
bdrv_query_info
bdrv_block_device_info
bdrv_query_image_info
bdrv_do_query_node_info
bdrv_get_allocated_file_size
bdrv_poll_co <- Waiting with qemu_global_mutex locked

[coroutine] bdrv_co_get_allocated_file_size_entry
bdrv_co_get_allocated_file_size
raw_co_get_allocated_file_size
fstat <- SLOW!

The closest I got was moving the coroutine into a separate iothread,
unlocking the global mutex and releasing the bdrv aio_context around
aio_poll. It feels wrong though because we're technically still
operating on the block state but not holding the context.

Is there a more standard way of doing this? Is it possible at all?

Thanks



* Re: Question about QMP and BQL
  2023-05-12 18:01 Question about QMP and BQL Fabiano Rosas
@ 2023-05-15 11:08 ` Markus Armbruster
  2023-05-15 13:18 ` Kevin Wolf
  1 sibling, 0 replies; 4+ messages in thread
From: Markus Armbruster @ 2023-05-15 11:08 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, kwolf, eesposit, vsementsov

Kevin, any advice?

Fabiano Rosas <farosas@suse.de> writes:

> Is there a way to execute a long-running QMP command outside of the
> BQL?
>
> The situation we're seeing is a slow query-block due to a slow system
> call (fstat over NFS) causing the main thread to spend too long
> holding the global mutex and locking up the vcpu thread when it goes
> out of the guest for MMIO.
>
> The call chain for QMP is:
>
> qmp_query_block
> bdrv_query_info
> bdrv_block_device_info
> bdrv_query_image_info
> bdrv_do_query_node_info
> bdrv_get_allocated_file_size
> bdrv_poll_co <- Waiting with qemu_global_mutex locked
>
> [coroutine] bdrv_co_get_allocated_file_size_entry
> bdrv_co_get_allocated_file_size
> raw_co_get_allocated_file_size
> fstat <- SLOW!
>
> The closest I got was moving the coroutine into a separate iothread,
> unlocking the global mutex and releasing the bdrv aio_context around
> aio_poll. It feels wrong though because we're technically still
> operating on the block state but not holding the context.
>
> Is there a more standard way of doing this? Is it possible at all?




* Re: Question about QMP and BQL
  2023-05-12 18:01 Question about QMP and BQL Fabiano Rosas
  2023-05-15 11:08 ` Markus Armbruster
@ 2023-05-15 13:18 ` Kevin Wolf
  2023-05-16 15:13   ` Fabiano Rosas
  1 sibling, 1 reply; 4+ messages in thread
From: Kevin Wolf @ 2023-05-15 13:18 UTC (permalink / raw)
  To: Fabiano Rosas; +Cc: qemu-devel, armbru, eesposit, vsementsov

On 12.05.2023 at 20:01, Fabiano Rosas wrote:
> Is there a way to execute a long-running QMP command outside of the
> BQL?
> 
> The situation we're seeing is a slow query-block due to a slow system
> call (fstat over NFS) causing the main thread to spend too long
> holding the global mutex and locking up the vcpu thread when it goes
> out of the guest for MMIO.
> 
> The call chain for QMP is:
> 
> qmp_query_block
> bdrv_query_info
> bdrv_block_device_info
> bdrv_query_image_info
> bdrv_do_query_node_info
> bdrv_get_allocated_file_size
> bdrv_poll_co <- Waiting with qemu_global_mutex locked
> 
> [coroutine] bdrv_co_get_allocated_file_size_entry
> bdrv_co_get_allocated_file_size
> raw_co_get_allocated_file_size
> fstat <- SLOW!

The first part of the right solution there should be moving fstat() to a
worker thread like we do for other requests where we care about not
blocking. See existing raw_thread_pool_submit() callers for examples.
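
A minimal sketch of this first step, assuming plain POSIX threads rather
than QEMU's thread pool. It does not use raw_thread_pool_submit() or any
other QEMU API; StatRequest, stat_worker(), query_allocated_size() and
allocated_size_of_path() are hypothetical names chosen for illustration.
The point is only that fstat() runs in a worker and the result is handed
back through a completion flag:

```c
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical request record shared by caller and worker (not a QEMU type). */
typedef struct {
    int fd;
    int ret;             /* fstat() result: 0 or -1 */
    int64_t allocated;   /* allocated bytes, valid when ret == 0 */
    bool done;           /* protected by lock */
    pthread_mutex_t lock;
    pthread_cond_t cond;
} StatRequest;

/* Worker: performs the potentially slow fstat() off the caller's thread. */
static void *stat_worker(void *opaque)
{
    StatRequest *req = opaque;
    struct stat st;
    int ret = fstat(req->fd, &st);

    pthread_mutex_lock(&req->lock);
    req->ret = ret;
    /* st_blocks is counted in 512-byte units on Linux. */
    req->allocated = (ret == 0) ? (int64_t)st.st_blocks * 512 : 0;
    req->done = true;
    pthread_cond_signal(&req->cond);
    pthread_mutex_unlock(&req->lock);
    return NULL;
}

/*
 * Submit fstat() to a worker and wait for completion. In this sketch the
 * caller simply sleeps on a condition variable; an event loop would keep
 * processing other events here instead.
 */
static int query_allocated_size(int fd, int64_t *allocated)
{
    StatRequest req = { .fd = fd, .ret = -1, .done = false };
    pthread_t worker;

    assert(allocated != NULL);
    pthread_mutex_init(&req.lock, NULL);
    pthread_cond_init(&req.cond, NULL);

    if (pthread_create(&worker, NULL, stat_worker, &req) == 0) {
        pthread_mutex_lock(&req.lock);
        while (!req.done) {
            pthread_cond_wait(&req.cond, &req.lock);
        }
        pthread_mutex_unlock(&req.lock);
        pthread_join(worker, NULL);
    }

    pthread_cond_destroy(&req.cond);
    pthread_mutex_destroy(&req.lock);

    if (req.ret == 0) {
        *allocated = req.allocated;
    }
    return req.ret;
}

/* Convenience wrapper: open a path, query it through the worker, close it. */
static int allocated_size_of_path(const char *path, int64_t *allocated)
{
    int fd = open(path, O_RDONLY);
    int ret;

    if (fd < 0) {
        return -1;
    }
    ret = query_allocated_size(fd, allocated);
    close(fd);
    return ret;
}
```

Calling allocated_size_of_path(".", &size) returns 0 and fills in size;
an invalid descriptor makes query_allocated_size() return -1.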

Note that this isn't the full solution yet. QMP still has to wait for
bdrv_get_allocated_file_size() to return. bdrv_poll_co() runs a nested
event loop, but it doesn't unlock the BQL.

So the second part would be converting the query-block QMP handler to
a coroutine so that it can actually yield to the main loop, which will
then drop the BQL while waiting. We would have to be careful there to
make sure that we don't break anything because the sets of things
allowed inside and outside coroutines are different.
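
To illustrate why the waiter has to give up the lock, here is a rough
analogy in plain POSIX threads. It is not QEMU code and does not use
coroutines: big_lock stands in for the BQL, vcpu_thread() for a vCPU
doing MMIO, and pthread_cond_wait() for "yield and let the main loop
drop the lock". All names are hypothetical. To keep the demo
deterministic, the fake slow call lasts until the vCPU thread has
finished, so a waiter that kept big_lock held would deadlock:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t big_lock = PTHREAD_MUTEX_INITIALIZER; /* BQL stand-in */
static pthread_cond_t work_done = PTHREAD_COND_INITIALIZER;
static bool slow_call_finished;   /* protected by big_lock */
static int mmio_handled;          /* protected by big_lock */

/* Stand-in for a vCPU thread that needs big_lock to handle MMIO. */
static void *vcpu_thread(void *opaque)
{
    (void)opaque;
    pthread_mutex_lock(&big_lock);
    mmio_handled++;
    pthread_mutex_unlock(&big_lock);
    return NULL;
}

/*
 * Stand-in for the worker running the slow call without big_lock held.
 * The "slow call" is modelled as lasting until the vCPU thread is done.
 */
static void *slow_worker(void *opaque)
{
    pthread_t *vcpu = opaque;

    pthread_join(*vcpu, NULL);

    pthread_mutex_lock(&big_lock);
    slow_call_finished = true;
    pthread_cond_signal(&work_done);
    pthread_mutex_unlock(&big_lock);
    return NULL;
}

/*
 * Stand-in for the command handler. It starts with big_lock held, like a
 * QMP handler in the main thread, and returns how many MMIO accesses the
 * vCPU completed while the handler was waiting.
 */
static int run_query(void)
{
    pthread_t vcpu, worker;
    int seen;
    int err;

    pthread_mutex_lock(&big_lock);
    slow_call_finished = false;
    mmio_handled = 0;

    err = pthread_create(&vcpu, NULL, vcpu_thread, NULL);
    assert(err == 0);
    err = pthread_create(&worker, NULL, slow_worker, &vcpu);
    assert(err == 0);
    (void)err;

    /* pthread_cond_wait() releases big_lock while this thread sleeps. */
    while (!slow_call_finished) {
        pthread_cond_wait(&work_done, &big_lock);
    }
    seen = mmio_handled;
    pthread_mutex_unlock(&big_lock);

    pthread_join(worker, NULL);   /* the worker already joined the vCPU */
    return seen;
}
```

run_query() returns 1: the vCPU got big_lock while the handler was
waiting. Replacing the wait with a loop that polls slow_call_finished
without releasing big_lock would hang, which is roughly the situation
with bdrv_poll_co() described above.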

Kevin




* Re: Question about QMP and BQL
  2023-05-15 13:18 ` Kevin Wolf
@ 2023-05-16 15:13   ` Fabiano Rosas
  0 siblings, 0 replies; 4+ messages in thread
From: Fabiano Rosas @ 2023-05-16 15:13 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-devel, armbru, eesposit, vsementsov

Kevin Wolf <kwolf@redhat.com> writes:

> On 12.05.2023 at 20:01, Fabiano Rosas wrote:
>> Is there a way to execute a long-running QMP command outside of the
>> BQL?
>> 
>> The situation we're seeing is a slow query-block due to a slow system
>> call (fstat over NFS) causing the main thread to spend too long
>> holding the global mutex and locking up the vcpu thread when it goes
>> out of the guest for MMIO.
>> 
>> The call chain for QMP is:
>> 
>> qmp_query_block
>> bdrv_query_info
>> bdrv_block_device_info
>> bdrv_query_image_info
>> bdrv_do_query_node_info
>> bdrv_get_allocated_file_size
>> bdrv_poll_co <- Waiting with qemu_global_mutex locked
>> 
>> [coroutine] bdrv_co_get_allocated_file_size_entry
>> bdrv_co_get_allocated_file_size
>> raw_co_get_allocated_file_size
>> fstat <- SLOW!
>
> The first part of the right solution there should be moving fstat() to a
> worker thread like we do for other requests where we care about not
> blocking. See existing raw_thread_pool_submit() callers for examples.
>
> Note that this isn't the full solution yet. QMP still has to wait for
> bdrv_get_allocated_file_size() to return. bdrv_poll_co() runs a nested
> event loop, but it doesn't unlock the BQL.
>
> So the second part would be converting the query-block QMP handler to
> a coroutine so that it can actually yield to the main loop, which will
> then drop the BQL while waiting. We would have to be careful there to
> make sure that we don't break anything because the sets of things
> allowed inside and outside coroutines are different.
>
> Kevin

Hi Kevin,

Thank you, this is what I was looking for. I was missing the
raw_thread_pool_submit right there under my nose!

I'll put together an RFC so we can discuss the details.


