qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] block/rbd: Do not use BDS's AioContext
@ 2025-02-12  9:32 Hanna Czenczek
  2025-02-12 13:26 ` Kevin Wolf
  0 siblings, 1 reply; 4+ messages in thread
From: Hanna Czenczek @ 2025-02-12  9:32 UTC (permalink / raw)
  To: qemu-block
  Cc: qemu-devel, Hanna Czenczek, Ilya Dryomov, Peter Lieven,
	Kevin Wolf, Stefan Hajnoczi

RBD schedules the request completion code (qemu_rbd_finish_bh()) to run
in the BDS's AioContext.  The intent seems to be to run it in the same
context that the original request coroutine ran in, i.e. the thread on
whose stack the RBDTask object exists (see qemu_rbd_start_co()).

However, with multiqueue, that thread is not necessarily the same as the
BDS's AioContext.  Instead, we need to remember the actual AioContext
and schedule the completion BH there.

Buglink: https://issues.redhat.com/browse/RHEL-67115
Reported-by: Junyao Zhao <junzhao@redhat.com>
Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
---
I think I could also drop RBDTask.ctx and just use
`qemu_coroutine_get_aio_context(RBDTask.co)` instead, but this is the
version of the patch that was tested and confirmed to fix the issue (I
don't have a local reproducer), so I thought I'll post this first.
---
 block/rbd.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index af984fb7db..9d4e0817e0 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -102,7 +102,7 @@ typedef struct BDRVRBDState {
 } BDRVRBDState;
 
 typedef struct RBDTask {
-    BlockDriverState *bs;
+    AioContext *ctx;
     Coroutine *co;
     bool complete;
     int64_t ret;
@@ -1269,8 +1269,7 @@ static void qemu_rbd_completion_cb(rbd_completion_t c, RBDTask *task)
 {
     task->ret = rbd_aio_get_return_value(c);
     rbd_aio_release(c);
-    aio_bh_schedule_oneshot(bdrv_get_aio_context(task->bs),
-                            qemu_rbd_finish_bh, task);
+    aio_bh_schedule_oneshot(task->ctx, qemu_rbd_finish_bh, task);
 }
 
 static int coroutine_fn qemu_rbd_start_co(BlockDriverState *bs,
@@ -1281,7 +1280,10 @@ static int coroutine_fn qemu_rbd_start_co(BlockDriverState *bs,
                                           RBDAIOCmd cmd)
 {
     BDRVRBDState *s = bs->opaque;
-    RBDTask task = { .bs = bs, .co = qemu_coroutine_self() };
+    RBDTask task = {
+        .ctx = qemu_get_current_aio_context(),
+        .co = qemu_coroutine_self(),
+    };
     rbd_completion_t c;
     int r;
 
-- 
2.48.1



^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] block/rbd: Do not use BDS's AioContext
  2025-02-12  9:32 [PATCH] block/rbd: Do not use BDS's AioContext Hanna Czenczek
@ 2025-02-12 13:26 ` Kevin Wolf
  2025-02-12 14:27   ` Hanna Czenczek
  0 siblings, 1 reply; 4+ messages in thread
From: Kevin Wolf @ 2025-02-12 13:26 UTC (permalink / raw)
  To: Hanna Czenczek
  Cc: qemu-block, qemu-devel, Ilya Dryomov, Peter Lieven,
	Stefan Hajnoczi

Am 12.02.2025 um 10:32 hat Hanna Czenczek geschrieben:
> RBD schedules the request completion code (qemu_rbd_finish_bh()) to run
> in the BDS's AioContext.  The intent seems to be to run it in the same
> context that the original request coroutine ran in, i.e. the thread on
> whose stack the RBDTask object exists (see qemu_rbd_start_co()).
> 
> However, with multiqueue, that thread is not necessarily the same as the
> BDS's AioContext.  Instead, we need to remember the actual AioContext
> and schedule the completion BH there.
> 
> Buglink: https://issues.redhat.com/browse/RHEL-67115

Please add a short summary of what actually happens to the commit
message. I had to check the link to remember what the symptoms are.

> Reported-by: Junyao Zhao <junzhao@redhat.com>
> Signed-off-by: Hanna Czenczek <hreitz@redhat.com>
> ---
> I think I could also drop RBDTask.ctx and just use
> `qemu_coroutine_get_aio_context(RBDTask.co)` instead, but this is the
> version of the patch that was tested and confirmed to fix the issue (I
> don't have a local reproducer), so I thought I'll post this first.

Did  you figure out why it even makes a difference in which thread
qemu_rbd_finish_bh() runs? For context:

    static void qemu_rbd_finish_bh(void *opaque)
    {
        RBDTask *task = opaque;
        task->complete = true;
        aio_co_wake(task->co);
    }

This looks as if it should be working in any thread, except maybe for a
missing barrier after updating task->complete - but I think the failure
mode for that would be a hang in qemu_rbd_start_co().

>  block/rbd.c | 10 ++++++----
>  1 file changed, 6 insertions(+), 4 deletions(-)
> 
> diff --git a/block/rbd.c b/block/rbd.c
> index af984fb7db..9d4e0817e0 100644
> --- a/block/rbd.c
> +++ b/block/rbd.c
> @@ -102,7 +102,7 @@ typedef struct BDRVRBDState {
>  } BDRVRBDState;
>  
>  typedef struct RBDTask {
> -    BlockDriverState *bs;
> +    AioContext *ctx;
>      Coroutine *co;
>      bool complete;
>      int64_t ret;
> @@ -1269,8 +1269,7 @@ static void qemu_rbd_completion_cb(rbd_completion_t c, RBDTask *task)
>  {
>      task->ret = rbd_aio_get_return_value(c);
>      rbd_aio_release(c);
> -    aio_bh_schedule_oneshot(bdrv_get_aio_context(task->bs),
> -                            qemu_rbd_finish_bh, task);
> +    aio_bh_schedule_oneshot(task->ctx, qemu_rbd_finish_bh, task);
>  }
>  
>  static int coroutine_fn qemu_rbd_start_co(BlockDriverState *bs,
> @@ -1281,7 +1280,10 @@ static int coroutine_fn qemu_rbd_start_co(BlockDriverState *bs,
>                                            RBDAIOCmd cmd)
>  {
>      BDRVRBDState *s = bs->opaque;
> -    RBDTask task = { .bs = bs, .co = qemu_coroutine_self() };
> +    RBDTask task = {
> +        .ctx = qemu_get_current_aio_context(),
> +        .co = qemu_coroutine_self(),
> +    };
>      rbd_completion_t c;
>      int r;

Nothing wrong I can see about the change, but I don't understand why it
fixes the problem.

Kevin



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] block/rbd: Do not use BDS's AioContext
  2025-02-12 13:26 ` Kevin Wolf
@ 2025-02-12 14:27   ` Hanna Czenczek
  2025-02-18  5:26     ` Stefan Hajnoczi
  0 siblings, 1 reply; 4+ messages in thread
From: Hanna Czenczek @ 2025-02-12 14:27 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: qemu-block, qemu-devel, Ilya Dryomov, Stefan Hajnoczi

[-- Attachment #1: Type: text/plain, Size: 4287 bytes --]

On 12.02.25 14:26, Kevin Wolf wrote:
> Am 12.02.2025 um 10:32 hat Hanna Czenczek geschrieben:
>> RBD schedules the request completion code (qemu_rbd_finish_bh()) to run
>> in the BDS's AioContext.  The intent seems to be to run it in the same
>> context that the original request coroutine ran in, i.e. the thread on
>> whose stack the RBDTask object exists (see qemu_rbd_start_co()).
>>
>> However, with multiqueue, that thread is not necessarily the same as the
>> BDS's AioContext.  Instead, we need to remember the actual AioContext
>> and schedule the completion BH there.
>>
>> Buglink:https://issues.redhat.com/browse/RHEL-67115
> Please add a short summary of what actually happens to the commit
> message. I had to check the link to remember what the symptoms are.

Sure.  The problem is, I don’t know exactly what’s going wrong (looked 
like a coroutine being rescheduled after it was already done), and I 
don’t know how this fixes it.

>> Reported-by: Junyao Zhao<junzhao@redhat.com>
>> Signed-off-by: Hanna Czenczek<hreitz@redhat.com>
>> ---
>> I think I could also drop RBDTask.ctx and just use
>> `qemu_coroutine_get_aio_context(RBDTask.co)` instead, but this is the
>> version of the patch that was tested and confirmed to fix the issue (I
>> don't have a local reproducer), so I thought I'll post this first.
> Did  you figure out why it even makes a difference in which thread
> qemu_rbd_finish_bh() runs? For context:
>
>      static void qemu_rbd_finish_bh(void *opaque)
>      {
>          RBDTask *task = opaque;
>          task->complete = true;
>          aio_co_wake(task->co);
>      }
>
> This looks as if it should be working in any thread, except maybe for a
> missing barrier after updating task->complete - but I think the failure
> mode for that would be a hang in qemu_rbd_start_co().

Yes, I thought the same thing.  All I could imagine was that maybe 
reading task->co returns the wrong result, but given how long ago that 
must have been set, it seems quite unlikely (to say the least).  In 
addition, qemu_rbd_completion_cb() already reads the object from a 
different thread, and that seems to work fine.

Really, all I know is that the notion of a BDS’s AioContext no longer 
makes sense in a multiqueue I/O path, so this should be scheduled in the 
I/O’s AioContext (just conceptually speaking), and that this seems to 
fix the bug.

>>   block/rbd.c | 10 ++++++----
>>   1 file changed, 6 insertions(+), 4 deletions(-)
>>
>> diff --git a/block/rbd.c b/block/rbd.c
>> index af984fb7db..9d4e0817e0 100644
>> --- a/block/rbd.c
>> +++ b/block/rbd.c
>> @@ -102,7 +102,7 @@ typedef struct BDRVRBDState {
>>   } BDRVRBDState;
>>   
>>   typedef struct RBDTask {
>> -    BlockDriverState *bs;
>> +    AioContext *ctx;
>>       Coroutine *co;
>>       bool complete;
>>       int64_t ret;
>> @@ -1269,8 +1269,7 @@ static void qemu_rbd_completion_cb(rbd_completion_t c, RBDTask *task)
>>   {
>>       task->ret = rbd_aio_get_return_value(c);
>>       rbd_aio_release(c);
>> -    aio_bh_schedule_oneshot(bdrv_get_aio_context(task->bs),
>> -                            qemu_rbd_finish_bh, task);
>> +    aio_bh_schedule_oneshot(task->ctx, qemu_rbd_finish_bh, task);
>>   }
>>   
>>   static int coroutine_fn qemu_rbd_start_co(BlockDriverState *bs,
>> @@ -1281,7 +1280,10 @@ static int coroutine_fn qemu_rbd_start_co(BlockDriverState *bs,
>>                                             RBDAIOCmd cmd)
>>   {
>>       BDRVRBDState *s = bs->opaque;
>> -    RBDTask task = { .bs = bs, .co = qemu_coroutine_self() };
>> +    RBDTask task = {
>> +        .ctx = qemu_get_current_aio_context(),
>> +        .co = qemu_coroutine_self(),
>> +    };
>>       rbd_completion_t c;
>>       int r;
> Nothing wrong I can see about the change, but I don't understand why it
> fixes the problem.

Me neither.  But if this patch had been part of one of the original 
multiqueue series (without pointing out the linked bug), would there 
have been any argument against it?

Indeed it is a problem that I don’t understand what’s happening. But 
even more honestly, I’ll have to admit I can’t ever claim to understand 
what’s happening in a multi-threaded asynchronous C environment; even 
more so when the reproducer is installing Windows on RBD.

Hanna

[-- Attachment #2: Type: text/html, Size: 5488 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] block/rbd: Do not use BDS's AioContext
  2025-02-12 14:27   ` Hanna Czenczek
@ 2025-02-18  5:26     ` Stefan Hajnoczi
  0 siblings, 0 replies; 4+ messages in thread
From: Stefan Hajnoczi @ 2025-02-18  5:26 UTC (permalink / raw)
  To: Hanna Czenczek; +Cc: Kevin Wolf, qemu-block, qemu-devel, Ilya Dryomov

[-- Attachment #1: Type: text/plain, Size: 5117 bytes --]

On Wed, Feb 12, 2025 at 03:27:26PM +0100, Hanna Czenczek wrote:
> On 12.02.25 14:26, Kevin Wolf wrote:
> > Am 12.02.2025 um 10:32 hat Hanna Czenczek geschrieben:
> > > RBD schedules the request completion code (qemu_rbd_finish_bh()) to run
> > > in the BDS's AioContext.  The intent seems to be to run it in the same
> > > context that the original request coroutine ran in, i.e. the thread on
> > > whose stack the RBDTask object exists (see qemu_rbd_start_co()).
> > > 
> > > However, with multiqueue, that thread is not necessarily the same as the
> > > BDS's AioContext.  Instead, we need to remember the actual AioContext
> > > and schedule the completion BH there.
> > > 
> > > Buglink:https://issues.redhat.com/browse/RHEL-67115
> > Please add a short summary of what actually happens to the commit
> > message. I had to check the link to remember what the symptoms are.
> 
> Sure.  The problem is, I don’t know exactly what’s going wrong (looked like
> a coroutine being rescheduled after it was already done), and I don’t know
> how this fixes it.
> 
> > > Reported-by: Junyao Zhao<junzhao@redhat.com>
> > > Signed-off-by: Hanna Czenczek<hreitz@redhat.com>
> > > ---
> > > I think I could also drop RBDTask.ctx and just use
> > > `qemu_coroutine_get_aio_context(RBDTask.co)` instead, but this is the
> > > version of the patch that was tested and confirmed to fix the issue (I
> > > don't have a local reproducer), so I thought I'll post this first.
> > Did  you figure out why it even makes a difference in which thread
> > qemu_rbd_finish_bh() runs? For context:
> > 
> >      static void qemu_rbd_finish_bh(void *opaque)
> >      {
> >          RBDTask *task = opaque;
> >          task->complete = true;
> >          aio_co_wake(task->co);
> >      }
> > 
> > This looks as if it should be working in any thread, except maybe for a
> > missing barrier after updating task->complete - but I think the failure
> > mode for that would be a hang in qemu_rbd_start_co().
> 
> Yes, I thought the same thing.  All I could imagine was that maybe reading
> task->co returns the wrong result, but given how long ago that must have
> been set, it seems quite unlikely (to say the least).  In addition,
> qemu_rbd_completion_cb() already reads the object from a different thread,
> and that seems to work fine.
> 
> Really, all I know is that the notion of a BDS’s AioContext no longer makes
> sense in a multiqueue I/O path, so this should be scheduled in the I/O’s
> AioContext (just conceptually speaking), and that this seems to fix the bug.

Scheduling the BH directly in the task's AioContext should help
performance, so I think this patch makes sense irrespective of the bug.
Performance results would be needed to justify that though.

With regards to the bug, I'm also not sure where the issue lies. The
root cause needs to be understood so that we can be sure that this patch
is really a fix and not just papering over a bug that is still reachable
in other ways.

> 
> > >   block/rbd.c | 10 ++++++----
> > >   1 file changed, 6 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/block/rbd.c b/block/rbd.c
> > > index af984fb7db..9d4e0817e0 100644
> > > --- a/block/rbd.c
> > > +++ b/block/rbd.c
> > > @@ -102,7 +102,7 @@ typedef struct BDRVRBDState {
> > >   } BDRVRBDState;
> > >   typedef struct RBDTask {
> > > -    BlockDriverState *bs;
> > > +    AioContext *ctx;
> > >       Coroutine *co;
> > >       bool complete;
> > >       int64_t ret;
> > > @@ -1269,8 +1269,7 @@ static void qemu_rbd_completion_cb(rbd_completion_t c, RBDTask *task)
> > >   {
> > >       task->ret = rbd_aio_get_return_value(c);
> > >       rbd_aio_release(c);
> > > -    aio_bh_schedule_oneshot(bdrv_get_aio_context(task->bs),
> > > -                            qemu_rbd_finish_bh, task);
> > > +    aio_bh_schedule_oneshot(task->ctx, qemu_rbd_finish_bh, task);
> > >   }
> > >   static int coroutine_fn qemu_rbd_start_co(BlockDriverState *bs,
> > > @@ -1281,7 +1280,10 @@ static int coroutine_fn qemu_rbd_start_co(BlockDriverState *bs,
> > >                                             RBDAIOCmd cmd)
> > >   {
> > >       BDRVRBDState *s = bs->opaque;
> > > -    RBDTask task = { .bs = bs, .co = qemu_coroutine_self() };
> > > +    RBDTask task = {
> > > +        .ctx = qemu_get_current_aio_context(),
> > > +        .co = qemu_coroutine_self(),
> > > +    };
> > >       rbd_completion_t c;
> > >       int r;
> > Nothing wrong I can see about the change, but I don't understand why it
> > fixes the problem.
> 
> Me neither.  But if this patch had been part of one of the original
> multiqueue series (without pointing out the linked bug), would there have
> been any argument against it?
> 
> Indeed it is a problem that I don’t understand what’s happening. But even
> more honestly, I’ll have to admit I can’t ever claim to understand what’s
> happening in a multi-threaded asynchronous C environment; even more so when
> the reproducer is installing Windows on RBD.
> 
> Hanna

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2025-02-18  5:27 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-12  9:32 [PATCH] block/rbd: Do not use BDS's AioContext Hanna Czenczek
2025-02-12 13:26 ` Kevin Wolf
2025-02-12 14:27   ` Hanna Czenczek
2025-02-18  5:26     ` Stefan Hajnoczi

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).