From: "Michael S. Tsirkin" <mst@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: den-plotnikov@yandex-team.ru, qemu-devel@nongnu.org,
qemu-block@nongnu.org, raphael.norwitz@nutanix.com
Subject: Re: [PATCH v2 2/6] vhost-user-blk: Don't reconnect during initialisation
Date: Tue, 4 May 2021 05:44:34 -0400
Message-ID: <20210504053719-mutt-send-email-mst@kernel.org>
In-Reply-To: <YJETcFAyQUHB13N6@merkur.fritz.box>
On Tue, May 04, 2021 at 11:27:12AM +0200, Kevin Wolf wrote:
> On 04.05.2021 at 10:59, Michael S. Tsirkin wrote:
> > On Thu, Apr 29, 2021 at 07:13:12PM +0200, Kevin Wolf wrote:
> > > This is a partial revert of commits 77542d43149 and bc79c87bcde.
> > >
> > > Usually, an error during initialisation means that the configuration was
> > > wrong. Reconnecting won't make the error go away, but just turn the
> > > error condition into an endless loop. Avoid this and return errors
> > > again.
> >
> > So there are several possible reasons for an error:
> >
> > 1. remote restarted - we would like to reconnect,
> > this was the original use-case for reconnect.
> >
> > I am not very happy that we are killing this usecase.
>
> This patch is killing it only during initialisation, where it's quite
> unlikely compared to other cases and where the current implementation is
> rather broken. So reverting the broken feature and going back to a
> simpler correct state feels like a good idea to me.
>
> The idea is to add the "retry during initialisation" feature back on top
> of this, but it requires some more changes in the error paths so that we
> can actually distinguish different kinds of errors and don't retry when
> we already know that it can't succeed.
Okay ... let's make all this explicit in the commit log though, ok?
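
To illustrate the "distinguish different kinds of errors" idea, here is a
minimal sketch. The enum and the function name are hypothetical and do not
exist in QEMU; the only point is that the retry decision depends on the
error class, not on the mere fact that initialisation failed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical error classes; QEMU's vhost-user-blk code has no such enum. */
typedef enum {
    INIT_OK,
    INIT_ERR_LOCAL,       /* QEMU itself rejected the configuration */
    INIT_ERR_BACKEND,     /* backend reported it cannot support the config */
    INIT_ERR_DISCONNECT,  /* connection dropped, e.g. the backend restarted */
} InitResult;

/* Retrying only makes sense when a later attempt could plausibly succeed. */
static bool init_should_retry(InitResult r)
{
    switch (r) {
    case INIT_ERR_DISCONNECT:
        return true;
    case INIT_OK:
    case INIT_ERR_LOCAL:
    case INIT_ERR_BACKEND:
        return false;
    }
    assert(!"unexpected InitResult value");
    return false;
}
```

With something like this, a wrong num-queues setting would map to a
non-retryable class and fail realize once, while a backend restart would
still be retried.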
> > 2. qemu detected an error and closed the connection
> > looks like we try to handle that by reconnect,
> > this is something we should address.
>
> Yes, if qemu produces the error locally, retrying is useless.
>
> > 3. remote failed due to a bad command from qemu.
> > this usecase isn't well supported at the moment.
> >
> > How about supporting it on the remote side? I think that if the
> > data is well-formed but just has a configuration the remote cannot
> > support, then instead of closing the connection, the remote can wait
> > for commands with need_reply set, and respond with an error. Or at
> > least do it if VHOST_USER_PROTOCOL_F_REPLY_ACK has been negotiated.
> > If VHOST_USER_SET_VRING_ERR is used then signalling that fd might
> > also be reasonable.
> >
> > OTOH if qemu is buggy and sends malformed data and remote detects
> > that then having qemu retry forever is ok, might actually be
> > beneficial for debugging.
>
> I haven't really checked this case yet, it seems to be less common.
> Explicitly communicating an error is certainly better than just cutting
> the connection. But as you say, it means QEMU is buggy, so blindly
> retrying in this case is kind of acceptable.
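
A rough sketch of what "respond with an error instead of closing the
connection" could look like on the backend side. The flag bit positions are
my reading of the vhost-user spec and should be checked against it; the Msg
struct and backend_prepare_error_reply() are hypothetical and much simpler
than the real message layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Header flag bits as I understand the vhost-user spec; verify before use. */
#define VHOST_USER_VERSION_MASK     0x3u
#define VHOST_USER_REPLY_MASK       (0x1u << 2)
#define VHOST_USER_NEED_REPLY_MASK  (0x1u << 3)

/* Simplified, hypothetical message: only a u64 payload is modelled. */
typedef struct {
    uint32_t request;
    uint32_t flags;
    uint32_t size;
    uint64_t u64;
} Msg;

/*
 * Called when a well-formed request asks for something the backend cannot
 * support. Returns true if an error reply was prepared in *reply, false if
 * the backend has no way to report the error (REPLY_ACK not negotiated, or
 * need_reply not set on this request).
 */
static bool backend_prepare_error_reply(const Msg *req,
                                        bool reply_ack_negotiated,
                                        Msg *reply)
{
    assert(req != NULL && reply != NULL);

    if (!reply_ack_negotiated ||
        !(req->flags & VHOST_USER_NEED_REPLY_MASK)) {
        return false;
    }

    reply->request = req->request;
    reply->flags = (req->flags & VHOST_USER_VERSION_MASK) |
                   VHOST_USER_REPLY_MASK;
    reply->size = sizeof(reply->u64);
    reply->u64 = 1;  /* non-zero payload signals failure */
    return true;
}
```

When the helper returns false, the backend is back to the existing choice
between ignoring the problem and dropping the connection.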
>
> Raphael suggested that we could limit the number of retries during
> initialisation so that it wouldn't result in a hang at least.
Not sure how I feel about random limits ... how would we
set the limit?
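
For reference, a bounded retry along the lines Raphael suggested might look
like the sketch below. All names are hypothetical, and max_attempts is
simply a caller-supplied parameter: the sketch does not answer the question
of what the right value would be, only shows that the loop terminates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*ConnectFn)(void *opaque);

/*
 * Try to connect at most max_attempts times. Returns the number of
 * attempts used on success, or -1 if every attempt failed.
 */
static int connect_with_limit(ConnectFn try_connect, void *opaque,
                              int max_attempts)
{
    assert(try_connect != NULL);

    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (try_connect(opaque)) {
            return attempt;
        }
    }
    return -1;
}

/* Stand-in backend for demonstration: fails a fixed number of times. */
typedef struct {
    int remaining_failures;
} FakeBackend;

static bool fake_connect(void *opaque)
{
    FakeBackend *b = opaque;

    if (b->remaining_failures > 0) {
        b->remaining_failures--;
        return false;
    }
    return true;
}
```

A backend that fails twice connects on the third attempt; one that keeps
failing makes the call return -1 instead of hanging.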
> > > Additionally, calling vhost_user_blk_disconnect() from the chardev event
> > > handler could result in use-after-free because none of the
> > > initialisation code expects that the device could just go away in the
> > > middle. So removing the call fixes crashes in several places.
> > > For example, using a num-queues setting that is incompatible with the
> > > backend would result in a crash like this (dereferencing dev->opaque,
> > > which is already NULL):
> > >
> > > #0 0x0000555555d0a4bd in vhost_user_read_cb (source=0x5555568f4690, condition=(G_IO_IN | G_IO_HUP), opaque=0x7fffffffcbf0) at ../hw/virtio/vhost-user.c:313
> > > #1 0x0000555555d950d3 in qio_channel_fd_source_dispatch (source=0x555557c3f750, callback=0x555555d0a478 <vhost_user_read_cb>, user_data=0x7fffffffcbf0) at ../io/channel-watch.c:84
> > > #2 0x00007ffff7b32a9f in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
> > > #3 0x00007ffff7b84a98 in g_main_context_iterate.constprop () at /lib64/libglib-2.0.so.0
> > > #4 0x00007ffff7b32163 in g_main_loop_run () at /lib64/libglib-2.0.so.0
> > > #5 0x0000555555d0a724 in vhost_user_read (dev=0x555557bc62f8, msg=0x7fffffffcc50) at ../hw/virtio/vhost-user.c:402
> > > #6 0x0000555555d0ee6b in vhost_user_get_config (dev=0x555557bc62f8, config=0x555557bc62ac "", config_len=60) at ../hw/virtio/vhost-user.c:2133
> > > #7 0x0000555555d56d46 in vhost_dev_get_config (hdev=0x555557bc62f8, config=0x555557bc62ac "", config_len=60) at ../hw/virtio/vhost.c:1566
> > > #8 0x0000555555cdd150 in vhost_user_blk_device_realize (dev=0x555557bc60b0, errp=0x7fffffffcf90) at ../hw/block/vhost-user-blk.c:510
> > > #9 0x0000555555d08f6d in virtio_device_realize (dev=0x555557bc60b0, errp=0x7fffffffcff0) at ../hw/virtio/virtio.c:3660
> >
> > Right. So that's definitely something to fix.
> >
> > >
> > > Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>
> Kevin
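
On the use-after-free from the commit message: the patch fixes it by not
calling vhost_user_blk_disconnect() from the event handler during
initialisation. The sketch below shows one generic way to get that property
and is not the code from the patch. Every name in it is hypothetical. The
handler only records the disconnect while setup is running, so state that
setup still dereferences (dev->opaque in the backtrace) stays valid until
setup itself returns an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool initialising;        /* set while realize-style setup is running */
    bool disconnect_pending;  /* recorded by the event handler during setup */
    void *backend;            /* stands in for state such as dev->opaque */
} Dev;

static void dev_teardown(Dev *d)
{
    d->backend = NULL;
}

/* Event handler: never tears down state that setup is still using. */
static void on_chardev_closed(Dev *d)
{
    assert(d != NULL);

    if (d->initialising) {
        d->disconnect_pending = true;
        return;
    }
    dev_teardown(d);
}

/* Returns 0 on success, -1 if the connection dropped during setup. */
static int dev_init(Dev *d, void *backend, bool drop_during_init)
{
    d->backend = backend;
    d->initialising = true;
    d->disconnect_pending = false;

    if (drop_during_init) {
        on_chardev_closed(d);  /* simulates the event firing mid-setup */
    }

    /* Setup may still dereference the backend here without crashing. */
    assert(d->backend != NULL);

    d->initialising = false;
    if (d->disconnect_pending) {
        dev_teardown(d);
        return -1;
    }
    return 0;
}
```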