qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Hanna Czenczek <hreitz@redhat.com>
To: qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, "Hanna Czenczek" <hreitz@redhat.com>,
	"Kevin Wolf" <kwolf@redhat.com>,
	"Stefan Hajnoczi" <stefanha@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Richard W . M . Jones" <rjones@redhat.com>,
	"Ilya Dryomov" <idryomov@gmail.com>,
	"Peter Lieven" <pl@dlhnet.de>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Alex Bennée" <alex.bennee@linaro.org>,
	"Fam Zheng" <fam@euphon.net>,
	"Ronnie Sahlberg" <ronniesahlberg@gmail.com>
Subject: [PATCH v2 02/19] rbd: Run co BH CB in the coroutine’s AioContext
Date: Mon, 10 Nov 2025 16:48:37 +0100	[thread overview]
Message-ID: <20251110154854.151484-3-hreitz@redhat.com> (raw)
In-Reply-To: <20251110154854.151484-1-hreitz@redhat.com>

qemu_rbd_completion_cb() schedules the request completion code
(qemu_rbd_finish_bh()) to run in the BDS’s AioContext, assuming that
this is the same thread in which qemu_rbd_start_co() runs.

To explain, this is how both latter functions interact:

In qemu_rbd_start_co():

    while (!task.complete)
        qemu_coroutine_yield();

In qemu_rbd_finish_bh():

    task->complete = true;
    aio_co_wake(task->co); // task->co is qemu_rbd_start_co()

For this interaction to work reliably, both must run in the same thread
so that qemu_rbd_finish_bh() can only run once the coroutine yields.
Otherwise, finish_bh() may run before start_co() checks task.complete,
which will result in the latter seeing .complete as true immediately and
skipping the yield altogether, even though finish_bh() still wakes it.

With multiqueue, the BDS’s AioContext is not necessarily the thread
start_co() runs in, and so finish_bh() may be scheduled to run in a
different thread than start_co().  With the right timing, this will
cause the problems described above; waking a non-yielding coroutine is
not good, as can be reproduced by putting e.g. a usleep(100000) above
the while loop in start_co() (and using multiqueue), giving finish_bh()
a much better chance at exiting before start_co() can yield.

So instead of scheduling finish_bh() in the BDS’s AioContext, schedule
finish_bh() in task->co’s AioContext.

In addition, we can get rid of task.complete altogether because we will
get woken exactly once, when the task is indeed complete, no need to
check.

(We could go further and drop the BH, running aio_co_wake() directly in
qemu_rbd_completion_cb() because we are allowed to do that even if the
coroutine isn’t yet yielding and we’re in a different thread – but the
doc comment on qemu_rbd_completion_cb() says to be careful, so I decided
not to go so far here.)

Buglink: https://issues.redhat.com/browse/RHEL-67115
Reported-by: Junyao Zhao <junzhao@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
 block/rbd.c | 12 ++++--------
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index 3611dc81cf..2a70b5a983 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -110,9 +110,7 @@ typedef struct BDRVRBDState {
 } BDRVRBDState;
 
 typedef struct RBDTask {
-    BlockDriverState *bs;
     Coroutine *co;
-    bool complete;
     int64_t ret;
 } RBDTask;
 
@@ -1309,7 +1307,6 @@ static int qemu_rbd_resize(BlockDriverState *bs, uint64_t size)
 static void qemu_rbd_finish_bh(void *opaque)
 {
     RBDTask *task = opaque;
-    task->complete = true;
     aio_co_wake(task->co);
 }
 
@@ -1326,7 +1323,7 @@ static void qemu_rbd_completion_cb(rbd_completion_t c, RBDTask *task)
 {
     task->ret = rbd_aio_get_return_value(c);
     rbd_aio_release(c);
-    aio_bh_schedule_oneshot(bdrv_get_aio_context(task->bs),
+    aio_bh_schedule_oneshot(qemu_coroutine_get_aio_context(task->co),
                             qemu_rbd_finish_bh, task);
 }
 
@@ -1338,7 +1335,7 @@ static int coroutine_fn qemu_rbd_start_co(BlockDriverState *bs,
                                           RBDAIOCmd cmd)
 {
     BDRVRBDState *s = bs->opaque;
-    RBDTask task = { .bs = bs, .co = qemu_coroutine_self() };
+    RBDTask task = { .co = qemu_coroutine_self() };
     rbd_completion_t c;
     int r;
 
@@ -1401,9 +1398,8 @@ static int coroutine_fn qemu_rbd_start_co(BlockDriverState *bs,
         return r;
     }
 
-    while (!task.complete) {
-        qemu_coroutine_yield();
-    }
+    /* Expect exactly a single wake from qemu_rbd_finish_bh() */
+    qemu_coroutine_yield();
 
     if (task.ret < 0) {
         error_report("rbd request failed: cmd %d offset %" PRIu64 " bytes %"
-- 
2.51.1



  parent reply	other threads:[~2025-11-10 15:53 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-11-10 15:48 [PATCH v2 00/19] block: Some multi-threading fixes Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 01/19] block: Note on aio_co_wake use if not yet yielding Hanna Czenczek
2025-11-10 15:48 ` Hanna Czenczek [this message]
2025-11-10 15:48 ` [PATCH v2 03/19] iscsi: Run co BH CB in the coroutine’s AioContext Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 04/19] nfs: " Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 05/19] curl: Fix coroutine waking Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 06/19] gluster: Do not move coroutine into BDS context Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 07/19] nvme: Kick and check completions in " Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 08/19] nvme: Fix coroutine waking Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 09/19] nvme: Note in which AioContext some functions run Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 10/19] block/io: Take reqs_lock for tracked_requests Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 11/19] qcow2: Re-initialize lock in invalidate_cache Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 12/19] qcow2: Fix cache_clean_timer Hanna Czenczek
2025-11-17 14:50   ` Kevin Wolf
2025-11-18 11:01     ` Hanna Czenczek
2025-11-18 17:06       ` Kevin Wolf
2025-11-18 17:19         ` Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 13/19] qcow2: Schedule cache-clean-timer in realtime Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 14/19] ssh: Run restart_coroutine in current AioContext Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 15/19] blkreplay: Run BH in coroutine’s AioContext Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 16/19] block: Note in which AioContext AIO CBs are called Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 17/19] iscsi: Create AIO BH in original AioContext Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 18/19] null-aio: Run CB " Hanna Czenczek
2025-11-10 15:48 ` [PATCH v2 19/19] win32-aio: Run CB in original context Hanna Czenczek
2025-11-17 15:11 ` [PATCH v2 00/19] block: Some multi-threading fixes Kevin Wolf
2025-11-17 19:11 ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20251110154854.151484-3-hreitz@redhat.com \
    --to=hreitz@redhat.com \
    --cc=alex.bennee@linaro.org \
    --cc=fam@euphon.net \
    --cc=idryomov@gmail.com \
    --cc=kwolf@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=philmd@linaro.org \
    --cc=pl@dlhnet.de \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=rjones@redhat.com \
    --cc=ronniesahlberg@gmail.com \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).