qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Revert "monitor: use aio_co_reschedule_self()"
@ 2024-05-06 19:06 Stefan Hajnoczi
  2024-05-06 19:06 ` [PATCH 1/2] " Stefan Hajnoczi
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Stefan Hajnoczi @ 2024-05-06 19:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Markus Armbruster, Michael Roth, Kevin Wolf, Hanna Czenczek,
	Stefan Hajnoczi, qemu-block, Fam Zheng

This series fixes RHEL-34618 "qemu crash on Assertion `luringcb->co->ctx ==
s->aio_context' failed when do block_resize on hotplug disk with aio=io_uring":
https://issues.redhat.com/browse/RHEL-34618

Kevin identified commit 1f25c172f837 ("monitor: use aio_co_reschedule_self()")
as the root cause. There is a subtlety regarding how
qemu_get_current_aio_context() returns qemu_aio_context even though we may be
running in iohandler_ctx.

Revert commit 1f25c172f837, it was just intended as a code cleanup.

Stefan Hajnoczi (2):
  Revert "monitor: use aio_co_reschedule_self()"
  aio: warn about iohandler_ctx special casing

 include/block/aio.h | 6 ++++++
 qapi/qmp-dispatch.c | 7 +++++--
 2 files changed, 11 insertions(+), 2 deletions(-)

-- 
2.45.0



^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] Revert "monitor: use aio_co_reschedule_self()"
  2024-05-06 19:06 [PATCH 0/2] Revert "monitor: use aio_co_reschedule_self()" Stefan Hajnoczi
@ 2024-05-06 19:06 ` Stefan Hajnoczi
  2024-05-29 10:33   ` Fiona Ebner
  2024-05-06 19:06 ` [PATCH 2/2] aio: warn about iohandler_ctx special casing Stefan Hajnoczi
  2024-05-28 15:54 ` [PATCH 0/2] Revert "monitor: use aio_co_reschedule_self()" Kevin Wolf
  2 siblings, 1 reply; 6+ messages in thread
From: Stefan Hajnoczi @ 2024-05-06 19:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Markus Armbruster, Michael Roth, Kevin Wolf, Hanna Czenczek,
	Stefan Hajnoczi, qemu-block, Fam Zheng

Commit 1f25c172f837 ("monitor: use aio_co_reschedule_self()") was a code
cleanup that uses aio_co_reschedule_self() instead of open coding
coroutine rescheduling.

Bug RHEL-34618 was reported and Kevin Wolf <kwolf@redhat.com> identified
the root cause. I missed that aio_co_reschedule_self() ->
qemu_get_current_aio_context() only knows about
qemu_aio_context/IOThread AioContexts and not about iohandler_ctx. It
does not function correctly when going back from the iohandler_ctx to
qemu_aio_context.

Go back to open coding the AioContext transitions to avoid this bug.

This reverts commit 1f25c172f83704e350c0829438d832384084a74d.

Buglink: https://issues.redhat.com/browse/RHEL-34618
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 qapi/qmp-dispatch.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index f3488afeef..176b549473 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -212,7 +212,8 @@ QDict *coroutine_mixed_fn qmp_dispatch(const QmpCommandList *cmds, QObject *requ
              * executing the command handler so that it can make progress if it
              * involves an AIO_WAIT_WHILE().
              */
-            aio_co_reschedule_self(qemu_get_aio_context());
+            aio_co_schedule(qemu_get_aio_context(), qemu_coroutine_self());
+            qemu_coroutine_yield();
         }
 
         monitor_set_cur(qemu_coroutine_self(), cur_mon);
@@ -226,7 +227,9 @@ QDict *coroutine_mixed_fn qmp_dispatch(const QmpCommandList *cmds, QObject *requ
              * Move back to iohandler_ctx so that nested event loops for
              * qemu_aio_context don't start new monitor commands.
              */
-            aio_co_reschedule_self(iohandler_get_aio_context());
+            aio_co_schedule(iohandler_get_aio_context(),
+                            qemu_coroutine_self());
+            qemu_coroutine_yield();
         }
     } else {
        /*
-- 
2.45.0



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] aio: warn about iohandler_ctx special casing
  2024-05-06 19:06 [PATCH 0/2] Revert "monitor: use aio_co_reschedule_self()" Stefan Hajnoczi
  2024-05-06 19:06 ` [PATCH 1/2] " Stefan Hajnoczi
@ 2024-05-06 19:06 ` Stefan Hajnoczi
  2024-05-28 15:54 ` [PATCH 0/2] Revert "monitor: use aio_co_reschedule_self()" Kevin Wolf
  2 siblings, 0 replies; 6+ messages in thread
From: Stefan Hajnoczi @ 2024-05-06 19:06 UTC (permalink / raw)
  To: qemu-devel
  Cc: Markus Armbruster, Michael Roth, Kevin Wolf, Hanna Czenczek,
	Stefan Hajnoczi, qemu-block, Fam Zheng

The main loop has two AioContexts: qemu_aio_context and iohandler_ctx.
The main loop runs them both, but nested aio_poll() calls on
qemu_aio_context exclude iohandler_ctx.

Which one should qemu_get_current_aio_context() return when called from
the main loop? Document that it's always qemu_aio_context.

This has subtle effects on functions that use
qemu_get_current_aio_context(). For example, aio_co_reschedule_self()
does not work when moving from iohandler_ctx to qemu_aio_context because
qemu_get_current_aio_context() does not differentiate these two
AioContexts.

Document this in order to reduce the chance of future bugs.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 include/block/aio.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/block/aio.h b/include/block/aio.h
index 8378553eb9..4ee81936ed 100644
--- a/include/block/aio.h
+++ b/include/block/aio.h
@@ -629,6 +629,9 @@ void aio_co_schedule(AioContext *ctx, Coroutine *co);
  *
  * Move the currently running coroutine to new_ctx. If the coroutine is already
  * running in new_ctx, do nothing.
+ *
+ * Note that this function cannot reschedule from iohandler_ctx to
+ * qemu_aio_context.
  */
 void coroutine_fn aio_co_reschedule_self(AioContext *new_ctx);
 
@@ -661,6 +664,9 @@ void aio_co_enter(AioContext *ctx, Coroutine *co);
  * If called from an IOThread this will be the IOThread's AioContext.  If
  * called from the main thread or with the "big QEMU lock" taken it
  * will be the main loop AioContext.
+ *
+ * Note that the return value is never the main loop's iohandler_ctx and the
+ * return value is the main loop AioContext instead.
  */
 AioContext *qemu_get_current_aio_context(void);
 
-- 
2.45.0



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/2] Revert "monitor: use aio_co_reschedule_self()"
  2024-05-06 19:06 [PATCH 0/2] Revert "monitor: use aio_co_reschedule_self()" Stefan Hajnoczi
  2024-05-06 19:06 ` [PATCH 1/2] " Stefan Hajnoczi
  2024-05-06 19:06 ` [PATCH 2/2] aio: warn about iohandler_ctx special casing Stefan Hajnoczi
@ 2024-05-28 15:54 ` Kevin Wolf
  2 siblings, 0 replies; 6+ messages in thread
From: Kevin Wolf @ 2024-05-28 15:54 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: qemu-devel, Markus Armbruster, Michael Roth, Hanna Czenczek,
	qemu-block, Fam Zheng

Am 06.05.2024 um 21:06 hat Stefan Hajnoczi geschrieben:
> This series fixes RHEL-34618 "qemu crash on Assertion `luringcb->co->ctx ==
> s->aio_context' failed when do block_resize on hotplug disk with aio=io_uring":
> https://issues.redhat.com/browse/RHEL-34618
> 
> Kevin identified commit 1f25c172f837 ("monitor: use aio_co_reschedule_self()")
> as the root cause. There is a subtlety regarding how
> qemu_get_current_aio_context() returns qemu_aio_context even though we may be
> running in iohandler_ctx.
> 
> Revert commit 1f25c172f837, it was just intended as a code cleanup.
> 
> Stefan Hajnoczi (2):
>   Revert "monitor: use aio_co_reschedule_self()"
>   aio: warn about iohandler_ctx special casing

Thanks, applied to the block branch.

Kevin



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] Revert "monitor: use aio_co_reschedule_self()"
  2024-05-06 19:06 ` [PATCH 1/2] " Stefan Hajnoczi
@ 2024-05-29 10:33   ` Fiona Ebner
  2024-05-29 13:20     ` Kevin Wolf
  0 siblings, 1 reply; 6+ messages in thread
From: Fiona Ebner @ 2024-05-29 10:33 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel
  Cc: Markus Armbruster, Michael Roth, Kevin Wolf, Hanna Czenczek,
	qemu-block, Fam Zheng, qemu-stable

CC-ing stable since 1f25c172f83704e350c0829438d832384084a74d is in 9.0.0

Am 06.05.24 um 21:06 schrieb Stefan Hajnoczi:
> Commit 1f25c172f837 ("monitor: use aio_co_reschedule_self()") was a code
> cleanup that uses aio_co_reschedule_self() instead of open coding
> coroutine rescheduling.
> 
> Bug RHEL-34618 was reported and Kevin Wolf <kwolf@redhat.com> identified
> the root cause. I missed that aio_co_reschedule_self() ->
> qemu_get_current_aio_context() only knows about
> qemu_aio_context/IOThread AioContexts and not about iohandler_ctx. It
> does not function correctly when going back from the iohandler_ctx to
> qemu_aio_context.
> 
> Go back to open coding the AioContext transitions to avoid this bug.
> 
> This reverts commit 1f25c172f83704e350c0829438d832384084a74d.
> 
> Buglink: https://issues.redhat.com/browse/RHEL-34618
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  qapi/qmp-dispatch.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
> index f3488afeef..176b549473 100644
> --- a/qapi/qmp-dispatch.c
> +++ b/qapi/qmp-dispatch.c
> @@ -212,7 +212,8 @@ QDict *coroutine_mixed_fn qmp_dispatch(const QmpCommandList *cmds, QObject *requ
>               * executing the command handler so that it can make progress if it
>               * involves an AIO_WAIT_WHILE().
>               */
> -            aio_co_reschedule_self(qemu_get_aio_context());
> +            aio_co_schedule(qemu_get_aio_context(), qemu_coroutine_self());
> +            qemu_coroutine_yield();
>          }
>  
>          monitor_set_cur(qemu_coroutine_self(), cur_mon);
> @@ -226,7 +227,9 @@ QDict *coroutine_mixed_fn qmp_dispatch(const QmpCommandList *cmds, QObject *requ
>               * Move back to iohandler_ctx so that nested event loops for
>               * qemu_aio_context don't start new monitor commands.
>               */
> -            aio_co_reschedule_self(iohandler_get_aio_context());
> +            aio_co_schedule(iohandler_get_aio_context(),
> +                            qemu_coroutine_self());
> +            qemu_coroutine_yield();
>          }
>      } else {
>         /*



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] Revert "monitor: use aio_co_reschedule_self()"
  2024-05-29 10:33   ` Fiona Ebner
@ 2024-05-29 13:20     ` Kevin Wolf
  0 siblings, 0 replies; 6+ messages in thread
From: Kevin Wolf @ 2024-05-29 13:20 UTC (permalink / raw)
  To: Fiona Ebner
  Cc: Stefan Hajnoczi, qemu-devel, Markus Armbruster, Michael Roth,
	Hanna Czenczek, qemu-block, Fam Zheng, qemu-stable

Am 29.05.2024 um 12:33 hat Fiona Ebner geschrieben:
> CC-ing stable since 1f25c172f83704e350c0829438d832384084a74d is in 9.0.0

Good point, I'm also updating the commit message in my tree to add
a Cc: line. Thanks for catching this, Fiona!

Kevin



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-05-29 13:21 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-06 19:06 [PATCH 0/2] Revert "monitor: use aio_co_reschedule_self()" Stefan Hajnoczi
2024-05-06 19:06 ` [PATCH 1/2] " Stefan Hajnoczi
2024-05-29 10:33   ` Fiona Ebner
2024-05-29 13:20     ` Kevin Wolf
2024-05-06 19:06 ` [PATCH 2/2] aio: warn about iohandler_ctx special casing Stefan Hajnoczi
2024-05-28 15:54 ` [PATCH 0/2] Revert "monitor: use aio_co_reschedule_self()" Kevin Wolf

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).