All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sergio Lopez <slp@redhat.com>
To: Eric Blake <eblake@redhat.com>
Cc: kwolf@redhat.com, qemu-devel@nongnu.org, qemu-block@nongnu.org
Subject: Re: [Qemu-devel] [PATCH] nbd/server: attach client channel to the export's AioContext
Date: Thu, 12 Sep 2019 08:37:26 +0200	[thread overview]
Message-ID: <871rwmxbo9.fsf@redhat.com> (raw)
In-Reply-To: <6755b34b-b412-9e63-8d25-b7662d0d3860@redhat.com>

[-- Attachment #1: Type: text/plain, Size: 3687 bytes --]


Eric Blake <eblake@redhat.com> writes:

> On 9/11/19 12:21 PM, Eric Blake wrote:
>> On 9/11/19 11:15 AM, Sergio Lopez wrote:
>>> On creation, the export's AioContext is set to the same one as the
>>> BlockBackend, while the AioContext in the client QIOChannel is left
>>> untouched.
>>>
>>> As a result, when using data-plane, nbd_client_receive_next_request()
>>> schedules coroutines in the IOThread AioContext, while the client's
>>> QIOChannel is serviced from the main_loop, potentially triggering the
>>> assertion at qio_channel_restart_[read|write].
>>>
>>> To fix this, as soon we have the export corresponding to the client,
>>> we call qio_channel_attach_aio_context() to attach the QIOChannel
>>> context to the export's AioContext. This matches with the logic in
>>> blk_aio_attached().
>>>
>>> RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1748253
>>> Signed-off-by: Sergio Lopez <slp@redhat.com>
>>> ---
>>>  nbd/server.c | 2 ++
>>>  1 file changed, 2 insertions(+)
>> 
>> I'd like a second opinion from Kevin, but the description makes sense to
>> me.  I'm happy to queue this through my NBD tree.
>> 
>> Reviewed-by: Eric Blake <eblake@redhat.com>
>
> I tried to test this patch, but even with it applied, I still got an
> aio-context crasher by attempting an nbd-server-start, nbd-server-add,
> nbd-server-stop (intentionally skipping the nbd-server-remove step) on a
> domain using iothreads, with a backtrace of:
>
> #0  0x00007ff09d070e35 in raise () from target:/lib64/libc.so.6
> #1  0x00007ff09d05b895 in abort () from target:/lib64/libc.so.6
> #2  0x000055dd03b9ab86 in error_exit (err=1, msg=0x55dd03d59fb0
> <__func__.15769> "qemu_mutex_unlock_impl")
>     at util/qemu-thread-posix.c:36
> #3  0x000055dd03b9adcf in qemu_mutex_unlock_impl (mutex=0x55dd062d5090,
> file=0x55dd03d59041 "util/async.c",
>     line=523) at util/qemu-thread-posix.c:96
> #4  0x000055dd03b93433 in aio_context_release (ctx=0x55dd062d5030) at
> util/async.c:523
> #5  0x000055dd03ac421b in bdrv_do_drained_begin (bs=0x55dd0673a2d0,
> recursive=false, parent=0x0,
>     ignore_bds_parents=false, poll=true) at block/io.c:428
> #6  0x000055dd03ac4299 in bdrv_drained_begin (bs=0x55dd0673a2d0) at
> block/io.c:434
> #7  0x000055dd03aafb54 in blk_drain (blk=0x55dd06a3ec40) at
> block/block-backend.c:1605
> #8  0x000055dd03aae054 in blk_remove_bs (blk=0x55dd06a3ec40) at
> block/block-backend.c:800
> #9  0x000055dd03aad54a in blk_delete (blk=0x55dd06a3ec40) at
> block/block-backend.c:420
> #10 0x000055dd03aad7d6 in blk_unref (blk=0x55dd06a3ec40) at
> block/block-backend.c:475
> #11 0x000055dd03aecb68 in nbd_export_put (exp=0x55dd0726f920) at
> nbd/server.c:1666
> #12 0x000055dd03aec8fe in nbd_export_close (exp=0x55dd0726f920) at
> nbd/server.c:1616
> #13 0x000055dd03aecbf1 in nbd_export_close_all () at nbd/server.c:1689
> #14 0x000055dd03748845 in qmp_nbd_server_stop (errp=0x7ffcdf3cb4e8) at
> blockdev-nbd.c:233
> ...
>
> Does that sound familiar to what you were seeing?  Does it mean we
> missed another spot where the context is set incorrectly?

It looks like it was trying to release the AioContext while it was still
held by some other thread. Is this stacktrace from the main thread or an
iothread? What was the other one doing?

> I'm happy to work with you on IRC for more real-time debugging of this
> (I'm woefully behind on understanding how aio contexts are supposed to
> work).

I must be missing some step, because I can't reproduce this one
here. I've tried both with an idle NDB server and one with a client
generating I/O. Is it reproducible 100% of them time?

Sergio.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

  parent reply	other threads:[~2019-09-12  6:38 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-09-11 16:15 [Qemu-devel] [PATCH] nbd/server: attach client channel to the export's AioContext Sergio Lopez
2019-09-11 17:21 ` Eric Blake
2019-09-11 21:33   ` Eric Blake
2019-09-11 22:03     ` Eric Blake
2019-09-12  6:37     ` Sergio Lopez [this message]
2019-09-16 21:11       ` Eric Blake
2019-09-12  8:14     ` Kevin Wolf
2019-09-12 10:30       ` Sergio Lopez
2019-09-12 11:31         ` Kevin Wolf
2019-09-16 22:30           ` Eric Blake
2019-09-12  8:23 ` Kevin Wolf
2019-09-12 10:13   ` Sergio Lopez
2019-09-12 10:25     ` Kevin Wolf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=871rwmxbo9.fsf@redhat.com \
    --to=slp@redhat.com \
    --cc=eblake@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.