Subject: [Qemu-devel] [PATCH v3 0/2] Fix aio_notify_accept()
From: Fam Zheng
Date: 2018-08-09 13:22 UTC
To: qemu-devel
Cc: qemu-block, Stefan Weil, Fam Zheng, qemu-stable, pbonzini,
    Stefan Hajnoczi, lersek

v3: Fix the bug description in the commit message. [Paolo]
v2: Implement the fix following Paolo's idea. Testing is still in
    progress.

Calling aio_notify_accept(iothread->ctx) from the main loop when it
does aio_poll(iothread->ctx, false) is a bug, because it may steal the
event needed by aio_poll(iothread->ctx, true) in the IOThread. This
can cause the IOThread to hang.

Fam Zheng (2):
  aio-posix: Don't count ctx->notifier as progress when polling
  aio: Do aio_notify_accept only during blocking aio_poll

 util/aio-posix.c | 7 ++++---
 util/aio-win32.c | 3 ++-
 2 files changed, 6 insertions(+), 4 deletions(-)

-- 
2.17.1
Subject: [Qemu-devel] [PATCH v3 1/2] aio-posix: Don't count ctx->notifier as progress when polling
From: Fam Zheng
Date: 2018-08-09 13:22 UTC
To: qemu-devel
Cc: qemu-block, Stefan Weil, Fam Zheng, qemu-stable, pbonzini,
    Stefan Hajnoczi, lersek

The same logic already exists in fd polling. This change is especially
important to avoid a busy loop once we limit aio_notify_accept() to
blocking aio_poll().

Cc: qemu-stable@nongnu.org
Signed-off-by: Fam Zheng <famz@redhat.com>
---
 util/aio-posix.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/util/aio-posix.c b/util/aio-posix.c
index 118bf5784b..b5c7f463aa 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -494,7 +494,8 @@ static bool run_poll_handlers_once(AioContext *ctx)
     QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) {
         if (!node->deleted && node->io_poll &&
             aio_node_check(ctx, node->is_external) &&
-            node->io_poll(node->opaque)) {
+            node->io_poll(node->opaque) &&
+            node->opaque != &ctx->notifier) {
             progress = true;
         }
-- 
2.17.1
Subject: [Qemu-devel] [PATCH v3 2/2] aio: Do aio_notify_accept only during blocking aio_poll
From: Fam Zheng
Date: 2018-08-09 13:22 UTC
To: qemu-devel
Cc: qemu-block, Stefan Weil, Fam Zheng, qemu-stable, pbonzini,
    Stefan Hajnoczi, lersek

An aio_notify() pairs with an aio_notify_accept(). The former should
happen in the main thread or a vCPU thread, and the latter should be
done in the IOThread.

There is one rare case where the main thread or a vCPU thread may
"steal" the aio_notify() event just raised by itself, in
bdrv_set_aio_context() [1]. The sequence is like this:

    main thread                     IO Thread
    ===============================================================
    bdrv_drained_begin()
      aio_disable_external(ctx)
                                    aio_poll(ctx, true)
                                      ctx->notify_me += 2
    ...
    bdrv_drained_end()
      ...
        aio_notify()
    ...
    bdrv_set_aio_context()
      aio_poll(ctx, false) [1]
        aio_notify_accept(ctx)
                                    ppoll() /* Hang! */

[1] is problematic: it clears the ctx->notifier event so that the
blocked ppoll() will never return.

(For the curious, this bug was noticed when booting a number of VMs
simultaneously in RHV. One or two of the VMs would hit this race
condition, making the VIRTIO device unresponsive to I/O commands. When
it hangs, SeaBIOS is busy-waiting for a read request to complete
(reading the MBR), right after initializing the virtio-blk-pci device,
using 100% guest CPU. See also
https://bugzilla.redhat.com/show_bug.cgi?id=1562750 for the original
bug analysis.)

aio_notify() only injects an event when ctx->notify_me is set;
correspondingly, aio_notify_accept() is only useful when
ctx->notify_me _was_ set. Move the call to it into the "blocking"
branch. This will effectively skip [1] and fix the hang.

Furthermore, blocking aio_poll is only allowed on the home thread
(in_aio_context_home_thread), because otherwise two blocking
aio_poll()'s can steal each other's ctx->notifier event and cause a
hang just like the one described above.

Cc: qemu-stable@nongnu.org
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
---
 util/aio-posix.c | 4 ++--
 util/aio-win32.c | 3 ++-
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/util/aio-posix.c b/util/aio-posix.c
index b5c7f463aa..b5c609b68b 100644
--- a/util/aio-posix.c
+++ b/util/aio-posix.c
@@ -591,6 +591,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
      * so disable the optimization now.
      */
     if (blocking) {
+        assert(in_aio_context_home_thread(ctx));
         atomic_add(&ctx->notify_me, 2);
     }

@@ -633,6 +634,7 @@ bool aio_poll(AioContext *ctx, bool blocking)

     if (blocking) {
         atomic_sub(&ctx->notify_me, 2);
+        aio_notify_accept(ctx);
     }

     /* Adjust polling time */
@@ -676,8 +678,6 @@ bool aio_poll(AioContext *ctx, bool blocking)
         }
     }

-    aio_notify_accept(ctx);
-
     /* if we have any readable fds, dispatch event */
     if (ret > 0) {
         for (i = 0; i < npfd; i++) {
diff --git a/util/aio-win32.c b/util/aio-win32.c
index e676a8d9b2..c58957cc4b 100644
--- a/util/aio-win32.c
+++ b/util/aio-win32.c
@@ -373,11 +373,12 @@ bool aio_poll(AioContext *ctx, bool blocking)
         ret = WaitForMultipleObjects(count, events, FALSE, timeout);
         if (blocking) {
             assert(first);
+            assert(in_aio_context_home_thread(ctx));
             atomic_sub(&ctx->notify_me, 2);
+            aio_notify_accept(ctx);
         }

         if (first) {
-            aio_notify_accept(ctx);
             progress |= aio_bh_poll(ctx);
             first = false;
         }
-- 
2.17.1
Subject: Re: [Qemu-devel] [Qemu-block] [PATCH v3 2/2] aio: Do aio_notify_accept only during blocking aio_poll
From: Kevin Wolf
Date: 2018-09-07 15:51 UTC
To: Fam Zheng
Cc: qemu-devel, qemu-block, Stefan Weil, qemu-stable, Stefan Hajnoczi,
    pbonzini, lersek, slp

Am 09.08.2018 um 15:22 hat Fam Zheng geschrieben:
> Furthermore, blocking aio_poll is only allowed on home thread
> (in_aio_context_home_thread), because otherwise two blocking
> aio_poll()'s can steal each other's ctx->notifier event and cause
> hanging just like described above.

It's good to have this assertion now at least, but after digging into
some bugs, I think in fact that any aio_poll() (even non-blocking) is
only allowed in the home thread: at least one reason is that if you run
it from a different thread, qemu_get_current_aio_context() returns the
wrong AioContext in any callbacks called by aio_poll(). Anything else
using TLS can have similar problems.

One instance where this matters is fixed/worked around by Sergio's
"util/async: use qemu_aio_coroutine_enter in co_schedule_bh_cb". We
wouldn't even need that patch if we could make sure that aio_poll() is
never called from the wrong thread. This would feel more robust.

I'll fix the aio_poll() calls in drain (the AIO_WAIT_WHILE() ones are
already fine, the rest by removing them). After that,
bdrv_set_aio_context() is still problematic, but the rest should be
okay. Hopefully we can use the tighter assertion then.

Kevin
Subject: Re: [Qemu-devel] [Qemu-block] [PATCH v3 2/2] aio: Do aio_notify_accept only during blocking aio_poll
From: Fam Zheng
Date: 2018-09-10 3:59 UTC
To: Kevin Wolf
Cc: qemu-devel, qemu-block, Stefan Weil, qemu-stable, Stefan Hajnoczi,
    pbonzini, lersek, slp

On Fri, 09/07 17:51, Kevin Wolf wrote:
> It's good to have this assertion now at least, but after digging into
> some bugs, I think in fact that any aio_poll() (even non-blocking) is
> only allowed in the home thread: at least one reason is that if you run
> it from a different thread, qemu_get_current_aio_context() returns the
> wrong AioContext in any callbacks called by aio_poll(). Anything else
> using TLS can have similar problems.
>
> I'll fix the aio_poll() calls in drain (the AIO_WAIT_WHILE() ones are
> already fine, the rest by removing them). After that,
> bdrv_set_aio_context() is still problematic, but the rest should be
> okay. Hopefully we can use the tighter assertion then.

Fully agree with you.

Fam
Subject: Re: [Qemu-devel] [PATCH v3 0/2] Fix aio_notify_accept()
From: Fam Zheng
Date: 2018-08-14 2:50 UTC
To: qemu-devel
Cc: qemu-block, Stefan Weil, qemu-stable, pbonzini, Stefan Hajnoczi,
    lersek

On Thu, 08/09 21:22, Fam Zheng wrote:
> v3: Fix commit message's bug description. [Paolo]

If there's no objection, I'm queuing this for 3.1.

Fam
Subject: Re: [Qemu-devel] [PATCH v3 0/2] Fix aio_notify_accept()
From: Paolo Bonzini
Date: 2018-08-14 6:27 UTC
To: Fam Zheng, qemu-devel
Cc: qemu-block, Stefan Weil, qemu-stable, Stefan Hajnoczi, lersek

On 14/08/2018 04:50, Fam Zheng wrote:
> On Thu, 08/09 21:22, Fam Zheng wrote:
>> v3: Fix commit message's bug description. [Paolo]
>
> If there's no objection, I'm queuing this for 3.1.

Sure, thanks.

Paolo