All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Laszlo Ersek <lersek@redhat.com>
Cc: qemu-devel@nongnu.org, Peter Maydell <peter.maydell@linaro.org>,
	Eugenio Perez Martin <eperezma@redhat.com>,
	German Maglione <gmaglione@redhat.com>,
	Liu Jiang <gerry@linux.alibaba.com>,
	Sergio Lopez Pascual <slp@redhat.com>,
	Stefano Garzarella <sgarzare@redhat.com>
Subject: Re: [PULL 30/63] vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously
Date: Wed, 4 Oct 2023 08:53:14 -0400	[thread overview]
Message-ID: <20231004085242-mutt-send-email-mst@kernel.org> (raw)
In-Reply-To: <d7c0aa1f-8158-1688-9ce3-d6b5b04d8ae5@redhat.com>

On Wed, Oct 04, 2023 at 12:11:44PM +0200, Laszlo Ersek wrote:
> On 10/4/23 10:44, Michael S. Tsirkin wrote:
> > From: Laszlo Ersek <lersek@redhat.com>
> > 
> > (1) The virtio-1.2 specification
> > <http://docs.oasis-open.org/virtio/virtio/v1.2/virtio-v1.2.html> writes:
> > 
> >> 3     General Initialization And Device Operation
> >> 3.1   Device Initialization
> >> 3.1.1 Driver Requirements: Device Initialization
> >>
> >> [...]
> >>
> >> 7. Perform device-specific setup, including discovery of virtqueues for
> >>    the device, optional per-bus setup, reading and possibly writing the
> >>    device’s virtio configuration space, and population of virtqueues.
> >>
> >> 8. Set the DRIVER_OK status bit. At this point the device is “live”.
> > 
> > and
> > 
> >> 4         Virtio Transport Options
> >> 4.1       Virtio Over PCI Bus
> >> 4.1.4     Virtio Structure PCI Capabilities
> >> 4.1.4.3   Common configuration structure layout
> >> 4.1.4.3.2 Driver Requirements: Common configuration structure layout
> >>
> >> [...]
> >>
> >> The driver MUST configure the other virtqueue fields before enabling the
> >> virtqueue with queue_enable.
> >>
> >> [...]
> > 
> > (The same statements are present in virtio-1.0 identically, at
> > <http://docs.oasis-open.org/virtio/virtio/v1.0/virtio-v1.0.html>.)
> > 
> > These together mean that the following sub-sequence of steps is valid for
> > a virtio-1.0 guest driver:
> > 
> > (1.1) set "queue_enable" for the needed queues as the final part of device
> > initialization step (7),
> > 
> > (1.2) set DRIVER_OK in step (8),
> > 
> > (1.3) immediately start sending virtio requests to the device.
> > 
> > (2) When vhost-user is enabled, and the VHOST_USER_F_PROTOCOL_FEATURES
> > special virtio feature is negotiated, then virtio rings start in disabled
> > state, according to
> > <https://qemu-project.gitlab.io/qemu/interop/vhost-user.html#ring-states>.
> > In this case, explicit VHOST_USER_SET_VRING_ENABLE messages are needed for
> > enabling vrings.
> > 
> > Therefore setting "queue_enable" from the guest (1.1) is a *control plane*
> > operation, which travels from the guest through QEMU to the vhost-user
> > backend, using a unix domain socket.
> > 
> > Whereas sending a virtio request (1.3) is a *data plane* operation, which
> > evades QEMU -- it travels from guest to the vhost-user backend via
> > eventfd.
> > 
> > This means that steps (1.1) and (1.3) travel through different channels,
> > and their relative order can be reversed, as perceived by the vhost-user
> > backend.
> > 
> > That's exactly what happens when OVMF's virtiofs driver (VirtioFsDxe) runs
> > against the Rust-language virtiofsd version 1.7.2. (Which uses version
> > 0.10.1 of the vhost-user-backend crate, and version 0.8.1 of the vhost
> > crate.)
> > 
> > Namely, when VirtioFsDxe binds a virtiofs device, it goes through the
> > device initialization steps (i.e., control plane operations), and
> > immediately sends a FUSE_INIT request too (i.e., performs a data plane
> > operation). In the Rust-language virtiofsd, this creates a race between
> > two components that run *concurrently*, i.e., in different threads or
> > processes:
> > 
> > - Control plane, handling vhost-user protocol messages:
> > 
> >   The "VhostUserSlaveReqHandlerMut::set_vring_enable" method
> >   [crates/vhost-user-backend/src/handler.rs] handles
> >   VHOST_USER_SET_VRING_ENABLE messages, and updates each vring's "enabled"
> >   flag according to the message processed.
> > 
> > - Data plane, handling virtio / FUSE requests:
> > 
> >   The "VringEpollHandler::handle_event" method
> >   [crates/vhost-user-backend/src/event_loop.rs] handles the incoming
> >   virtio / FUSE request, consuming the virtio kick at the same time. If
> >   the vring's "enabled" flag is set, the virtio / FUSE request is
> >   processed genuinely. If the vring's "enabled" flag is clear, then the
> >   virtio / FUSE request is discarded.
> > 
> > Note that OVMF enables the queue *first*, and sends FUSE_INIT *second*.
> > However, if the data plane processor in virtiofsd wins the race, then it
> > sees the FUSE_INIT *before* the control plane processor took notice of
> > VHOST_USER_SET_VRING_ENABLE and green-lit the queue for the data plane
> > processor. Therefore the latter drops FUSE_INIT on the floor, and goes
> > back to waiting for further virtio / FUSE requests with epoll_wait.
> > Meanwhile OVMF is stuck waiting for the FUSET_INIT response -- a deadlock.
> > 
> > The deadlock is not deterministic. OVMF hangs infrequently during first
> > boot. However, OVMF hangs almost certainly during reboots from the UEFI
> > shell.
> > 
> > The race can be "reliably masked" by inserting a very small delay -- a
> > single debug message -- at the top of "VringEpollHandler::handle_event",
> > i.e., just before the data plane processor checks the "enabled" field of
> > the vring. That delay suffices for the control plane processor to act upon
> > VHOST_USER_SET_VRING_ENABLE.
> > 
> > We can deterministically prevent the race in QEMU, by blocking OVMF inside
> > step (1.1) -- i.e., in the write to the "queue_enable" register -- until
> > VHOST_USER_SET_VRING_ENABLE actually *completes*. That way OVMF's VCPU
> > cannot advance to the FUSE_INIT submission before virtiofsd's control
> > plane processor takes notice of the queue being enabled.
> > 
> > Wait for VHOST_USER_SET_VRING_ENABLE completion by:
> > 
> > - setting the NEED_REPLY flag on VHOST_USER_SET_VRING_ENABLE, and waiting
> >   for the reply, if the VHOST_USER_PROTOCOL_F_REPLY_ACK vhost-user feature
> >   has been negotiated, or
> > 
> > - performing a separate VHOST_USER_GET_FEATURES *exchange*, which requires
> >   a backend response regardless of VHOST_USER_PROTOCOL_F_REPLY_ACK.
> > 
> > Cc: "Michael S. Tsirkin" <mst@redhat.com> (supporter:vhost)
> > Cc: Eugenio Perez Martin <eperezma@redhat.com>
> > Cc: German Maglione <gmaglione@redhat.com>
> > Cc: Liu Jiang <gerry@linux.alibaba.com>
> > Cc: Sergio Lopez Pascual <slp@redhat.com>
> > Cc: Stefano Garzarella <sgarzare@redhat.com>
> > Signed-off-by: Laszlo Ersek <lersek@redhat.com>
> > Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
> > Message-Id: <20230830134055.106812-8-lersek@redhat.com>
> > Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> > Reviewed-by: Eugenio Pérez <eperezma@redhat.com>
> > ---
> >  hw/virtio/vhost-user.c | 16 +++++++++++++++-
> >  1 file changed, 15 insertions(+), 1 deletion(-)
> > 
> > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> > index ae0734d461..eb983ae295 100644
> > --- a/hw/virtio/vhost-user.c
> > +++ b/hw/virtio/vhost-user.c
> > @@ -1214,7 +1214,21 @@ static int vhost_user_set_vring_enable(struct vhost_dev *dev, int enable)
> >              .num   = enable,
> >          };
> >  
> > -        ret = vhost_set_vring(dev, VHOST_USER_SET_VRING_ENABLE, &state, false);
> > +        /*
> > +         * SET_VRING_ENABLE travels from guest to QEMU to vhost-user backend /
> > +         * control plane thread via unix domain socket. Virtio requests travel
> > +         * from guest to vhost-user backend / data plane thread via eventfd.
> > +         * Even if the guest enables the ring first, and pushes its first virtio
> > +         * request second (conforming to the virtio spec), the data plane thread
> > +         * in the backend may see the virtio request before the control plane
> > +         * thread sees the queue enablement. This causes (in fact, requires) the
> > +         * data plane thread to discard the virtio request (it arrived on a
> > +         * seemingly disabled queue). To prevent this out-of-order delivery,
> > +         * don't let the guest proceed to pushing the virtio request until the
> > +         * backend control plane acknowledges enabling the queue -- IOW, pass
> > +         * wait_for_reply=true below.
> > +         */
> > +        ret = vhost_set_vring(dev, VHOST_USER_SET_VRING_ENABLE, &state, true);
> >          if (ret < 0) {
> >              /*
> >               * Restoring the previous state is likely infeasible, as well as
> 
> This is not the latest version (v3) of this set -- please see
> <https://patchwork.ozlabs.org/project/qemu-devel/cover/20231002203221.17241-1-lersek@redhat.com/>.
> 
> Thanks,
> Laszlo

Ouch. OK I will drop. Feel free to send v4 tweaking commit message - I
think you wanted to do it anyway right?



  reply	other threads:[~2023-10-04 12:53 UTC|newest]

Thread overview: 99+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-04  8:43 [PULL 00/63] virtio,pci: features, cleanups Michael S. Tsirkin
2023-10-04  8:43 ` [PULL 01/63] pci: SLT must be RO Michael S. Tsirkin
2023-10-04  8:43 ` [PULL 02/63] hw/virtio: Propagate page_mask to vhost_vdpa_listener_skipped_section() Michael S. Tsirkin
2023-10-04  8:43 ` [PULL 03/63] hw/virtio: Propagate page_mask to vhost_vdpa_section_end() Michael S. Tsirkin
2023-10-04  8:43 ` [PULL 04/63] hw/virtio/vhost-vdpa: Inline TARGET_PAGE_ALIGN() macro Michael S. Tsirkin
2023-10-04  8:43 ` [PULL 05/63] hw/virtio/vhost-vdpa: Use target-agnostic qemu_target_page_mask() Michael S. Tsirkin
2023-10-04  8:43 ` [PULL 06/63] hw/virtio: Build vhost-vdpa.o once Michael S. Tsirkin
2023-10-04  8:43 ` [PULL 07/63] hw/virtio/meson: Rename softmmu_virtio_ss[] -> system_virtio_ss[] Michael S. Tsirkin
2023-10-04  8:43 ` [PULL 08/63] virtio: add vhost-user-base and a generic vhost-user-device Michael S. Tsirkin
2023-10-04  8:43 ` [PULL 09/63] hw/virtio: add config support to vhost-user-device Michael S. Tsirkin
2023-10-04  8:43 ` [PULL 10/63] virtio-net: do not reset vlan filtering at set_features Michael S. Tsirkin
2023-10-04  8:43 ` [PULL 11/63] virtio-net: Expose MAX_VLAN Michael S. Tsirkin
2023-10-04  8:43 ` [PULL 12/63] vdpa: Restore vlan filtering state Michael S. Tsirkin
2023-10-04  8:43 ` [PULL 13/63] vdpa: Allow VIRTIO_NET_F_CTRL_VLAN in SVQ Michael S. Tsirkin
2023-10-04  8:43 ` [PULL 14/63] virtio: don't zero out memory region cache for indirect descriptors Michael S. Tsirkin
2023-10-04  8:43 ` [PULL 15/63] vdpa: use first queue SVQ state for CVQ default Michael S. Tsirkin
2023-10-04  8:43 ` [PULL 16/63] vdpa: export vhost_vdpa_set_vring_ready Michael S. Tsirkin
2023-10-04  8:44 ` [PULL 17/63] vdpa: rename vhost_vdpa_net_load to vhost_vdpa_net_cvq_load Michael S. Tsirkin
2023-10-04  8:44 ` [PULL 18/63] vdpa: move vhost_vdpa_set_vring_ready to the caller Michael S. Tsirkin
2023-10-04  8:44 ` [PULL 19/63] vdpa: remove net cvq migration blocker Michael S. Tsirkin
2023-10-04  8:44 ` [PULL 20/63] vhost: Add count argument to vhost_svq_poll() Michael S. Tsirkin
2023-10-04  8:44 ` [PULL 21/63] qmp: remove virtio_list, search QOM tree instead Michael S. Tsirkin
2023-10-04  8:44 ` [PULL 22/63] qmp: update virtio feature maps, vhost-user-gpio introspection Michael S. Tsirkin
2023-10-04  8:44 ` [PULL 23/63] vhost-user: move VhostUserProtocolFeature definition to header file Michael S. Tsirkin
2023-10-04  8:44 ` [PULL 24/63] vhost-user: strip superfluous whitespace Michael S. Tsirkin
2023-10-04  8:44 ` [PULL 25/63] vhost-user: tighten "reply_supported" scope in "set_vring_addr" Michael S. Tsirkin
2023-10-04  8:44 ` [PULL 26/63] vhost-user: factor out "vhost_user_write_sync" Michael S. Tsirkin
2023-10-04  8:44 ` [PULL 27/63] vhost-user: flatten "enforce_reply" into "vhost_user_write_sync" Michael S. Tsirkin
2023-10-04  8:44 ` [PULL 28/63] vhost-user: hoist "write_sync", "get_features", "get_u64" Michael S. Tsirkin
2023-10-04  8:44 ` [PULL 29/63] vhost-user: allow "vhost_set_vring" to wait for a reply Michael S. Tsirkin
2023-10-04  8:44 ` [PULL 30/63] vhost-user: call VHOST_USER_SET_VRING_ENABLE synchronously Michael S. Tsirkin
2023-10-04 10:11   ` Laszlo Ersek
2023-10-04 12:53     ` Michael S. Tsirkin [this message]
2023-10-04 13:28       ` Laszlo Ersek
2023-10-04  8:44 ` [PULL 31/63] hw/isa/ich9: Add comment on imperfect emulation of PIC vs. I/O APIC routing Michael S. Tsirkin
2023-10-04  8:44 ` [PULL 32/63] tests/acpi: Allow update of DSDT.cxl Michael S. Tsirkin
2023-10-04  8:44 ` [PULL 33/63] hw/cxl: Add QTG _DSM support for ACPI0017 device Michael S. Tsirkin
2023-10-04 17:46   ` Thomas Huth
2023-10-04 22:14     ` Michael S. Tsirkin
2023-10-04 23:09       ` [PATCH v3] " Dave Jiang
2023-10-05  3:36         ` Michael S. Tsirkin
2023-10-05 16:11           ` Dave Jiang
2023-10-05 16:32             ` Michael S. Tsirkin
2023-10-06 12:09               ` Jonathan Cameron
2023-10-06 12:09                 ` Jonathan Cameron via
2023-10-06 17:50                 ` Dan Williams
2023-10-06 22:15                   ` [PATCH v4] " Dave Jiang
2023-10-09 15:47                     ` Jonathan Cameron
2023-10-09 15:47                       ` Jonathan Cameron via
2023-10-09 16:06                       ` Dave Jiang
2023-10-09 15:44                   ` [PATCH v3] " Jonathan Cameron
2023-10-09 15:44                     ` Jonathan Cameron via
2023-10-07 21:17                 ` Michael S. Tsirkin
2023-10-09 15:40                   ` Jonathan Cameron
2023-10-09 15:40                     ` Jonathan Cameron via
2023-10-05 17:00             ` Michael S. Tsirkin
2023-10-04  8:44 ` [PULL 34/63] tests/acpi: Update DSDT.cxl with QTG DSM Michael S. Tsirkin
2023-10-04  8:44 ` [PULL 35/63] hw/i386/acpi-build: Use pc_madt_cpu_entry() directly Michael S. Tsirkin
2023-10-04  8:45 ` [PULL 36/63] hw/acpi/cpu: Have build_cpus_aml() take a build_madt_cpu_fn callback Michael S. Tsirkin
2023-10-04  8:45 ` [PULL 37/63] hw/acpi/acpi_dev_interface: Remove now unused madt_cpu virtual method Michael S. Tsirkin
2023-10-04  8:45 ` [PULL 38/63] hw/acpi/acpi_dev_interface: Remove now unused #include "hw/boards.h" Michael S. Tsirkin
2023-10-04  8:45 ` [PULL 39/63] hw/i386: Remove now redundant TYPE_ACPI_GED_X86 Michael S. Tsirkin
2023-10-04  8:45 ` [PULL 40/63] hw/i386/acpi-build: Determine SMI command port just once Michael S. Tsirkin
2023-10-04  8:45 ` [PULL 41/63] hw/acpi: Trace GPE access in all device models, not just PIIX4 Michael S. Tsirkin
2023-10-04  8:45 ` [PULL 42/63] hw/acpi/core: Trace enable and status registers of GPE separately Michael S. Tsirkin
2023-10-04  8:45 ` [PULL 43/63] vdpa: fix gcc cvq_isolated uninitialized variable warning Michael S. Tsirkin
2023-10-04  8:45 ` [PULL 44/63] vdpa net: zero vhost_vdpa iova_tree pointer at cleanup Michael S. Tsirkin
2023-10-04  8:45 ` [PULL 45/63] hw/cxl: Push cxl_decoder_count_enc() and cxl_decode_ig() into .c Michael S. Tsirkin
2023-10-04  8:45 ` [PULL 46/63] hw/cxl: Add utility functions decoder interleave ways and target count Michael S. Tsirkin
2023-10-04  8:45 ` [PULL 47/63] hw/cxl: Fix and use same calculation for HDM decoder block size everywhere Michael S. Tsirkin
2023-10-04  8:45 ` [PULL 48/63] hw/cxl: Support 4 HDM decoders at all levels of topology Michael S. Tsirkin
2023-10-04  8:45 ` [PULL 49/63] hw/pci-bridge/cxl-upstream: Add serial number extended capability support Michael S. Tsirkin
2023-10-04  8:45 ` [PULL 50/63] vdpa net: fix error message setting virtio status Michael S. Tsirkin
2023-10-04  8:45 ` [PULL 51/63] vdpa net: stop probing if cannot set features Michael S. Tsirkin
2023-10-04  8:45 ` [PULL 52/63] vdpa net: follow VirtIO initialization properly at cvq isolation probing Michael S. Tsirkin
2023-10-04  8:46 ` [PULL 53/63] amd_iommu: Fix APIC address check Michael S. Tsirkin
2023-10-04  8:46 ` [PULL 54/63] hw/i386/pc: improve physical address space bound check for 32-bit x86 systems Michael S. Tsirkin
2023-10-04  8:46 ` [PULL 55/63] pcie_sriov: unregister_vfs(): fix error path Michael S. Tsirkin
2023-10-04  8:46 ` [PULL 56/63] libvhost-user.c: add assertion to vu_message_read_default Michael S. Tsirkin
2023-10-04  8:46 ` [PULL 57/63] virtio: use shadow_avail_idx while checking number of heads Michael S. Tsirkin
2023-10-04  8:46 ` [PULL 58/63] virtio: remove unnecessary thread fence while reading next descriptor Michael S. Tsirkin
2023-10-04  8:46 ` [PULL 59/63] virtio: remove unused next argument from virtqueue_split_read_next_desc() Michael S. Tsirkin
2023-10-04  8:46 ` [PULL 60/63] util/uuid: add a hash function Michael S. Tsirkin
2023-10-04  8:46 ` [PULL 61/63] hw/display: introduce virtio-dmabuf Michael S. Tsirkin
2023-10-04  8:46 ` [PULL 62/63] vhost-user: add shared_object msg Michael S. Tsirkin
2023-10-04  8:46 ` [PULL 63/63] libvhost-user: handle " Michael S. Tsirkin
2023-10-04  8:54 ` [PULL 00/63] virtio,pci: features, cleanups Philippe Mathieu-Daudé
2023-10-04  9:08   ` Michael S. Tsirkin
2023-10-04  8:58 ` Michael S. Tsirkin
2023-10-04 16:26 ` Michael S. Tsirkin
2023-10-04 16:50 ` Stefan Hajnoczi
2023-10-04 17:04   ` Michael S. Tsirkin
2023-10-04 17:40     ` Thomas Huth
2023-10-04 22:23       ` Michael S. Tsirkin
2023-10-05  6:10         ` Thomas Huth
2023-10-04 18:23     ` Stefan Hajnoczi
2023-10-04 17:20   ` Michael S. Tsirkin
2023-10-04 18:29     ` Stefan Hajnoczi
2023-10-04 22:16       ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20231004085242-mutt-send-email-mst@kernel.org \
    --to=mst@redhat.com \
    --cc=eperezma@redhat.com \
    --cc=gerry@linux.alibaba.com \
    --cc=gmaglione@redhat.com \
    --cc=lersek@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=sgarzare@redhat.com \
    --cc=slp@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.