From: Stefan Hajnoczi <stefanha@gmail.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: "Wei Wang" <wei.w.wang@intel.com>,
virtio-dev@lists.oasis-open.org,
"Michael S. Tsirkin" <mst@redhat.com>,
zhiyong.yang@intel.com, "Jan Kiszka" <jan.kiszka@siemens.com>,
"Jason Wang" <jasowang@redhat.com>,
avi.cohen@huawei.com, qemu-devel <qemu-devel@nongnu.org>,
"Marc-André Lureau" <marcandre.lureau@redhat.com>,
"Paolo Bonzini" <pbonzini@redhat.com>
Subject: Re: [Qemu-devel] [virtio-dev] [PATCH v3 0/7] Vhost-pci for inter-VM communication
Date: Wed, 6 Dec 2017 16:13:48 +0000
Message-ID: <CAJSP0QVGLRnm8xQPLomJ5xdO1mTO3mgTx9VMQutw+gxyV8UTFw@mail.gmail.com>
In-Reply-To: <20171206134957.GD12584@stefanha-x1.localdomain>

On Wed, Dec 6, 2017 at 1:49 PM, Stefan Hajnoczi <stefanha@redhat.com> wrote:
> On Tue, Dec 05, 2017 at 11:33:09AM +0800, Wei Wang wrote:
>> Vhost-pci is a point-to-point based inter-VM communication solution. This
>> patch series implements the vhost-pci-net device setup and emulation. The
>> device is implemented as a virtio device, and it is set up via the
>> vhost-user protocol to get the necessary info (e.g. the memory info of the
>> remote VM, vring info).
>>
>> Currently, only the fundamental functions are implemented. More features,
>> such as MQ and live migration, will be updated in the future.
>>
>> The DPDK PMD of vhost-pci has been posted to the dpdk mailing list here:
>> http://dpdk.org/ml/archives/dev/2017-November/082615.html
>
> I have asked questions about the scope of this feature. In particular,
> I think it's best to support all device types rather than just
> virtio-net. Here is a design document that shows how this can be
> achieved.
>
> What I'm proposing is different from the current approach:
> 1. It's a PCI adapter (see below for justification)
> 2. The vhost-user protocol is exposed by the device (not handled 100% in
> QEMU). Ultimately I think your approach would also need to do this.

Michael asked me to provide more information on the differences
between this patch series and my proposal:

My understanding of this patch series is: it adds a new virtio device
type called vhost-pci-net. The QEMU vhost-pci-net code implements the
vhost-user protocol and then exposes virtio-net-specific functionality
to the guest. This means the vhost-pci-net driver inside the guest
doesn't speak vhost-user, it speaks vhost-pci-net. Currently no
virtqueues are defined, so this is a very unusual virtio device. It
also relies on a PCI BAR for shared memory access. Some vhost-user
features like multiple virtqueues, logging (migration), etc. are not
supported.

This proposal takes a different approach. Instead of creating a new
virtio device type (e.g. vhost-pci-net) for each device type (e.g.
virtio-net, virtio-scsi, virtio-blk), it defines a vhost-pci PCI
adapter that maps the vhost-user protocol to PCI, so that software
running inside the guest can speak the vhost-user protocol itself.
It requires less logic inside QEMU: QEMU mainly has to handle
vhost-user file descriptor passing. It allows guests to decide
whether logging (migration) and other features are supported. It
allows optimized irqfd <-> ioeventfd signalling, which cannot be done
with regular virtio devices.

> I'm not implementing this and not asking you to implement it. Let's
> just use this for discussion so we can figure out what the final
> vhost-pci will look like.
>
> Please let me know what you think, Wei, Michael, and others.
>
> ---
> vhost-pci device specification
> -------------------------------
> The vhost-pci device allows guests to act as vhost-user slaves. This
> enables appliance VMs like network switches or storage targets to back
> devices in other VMs. VM-to-VM communication is possible without
> vmexits using polling mode drivers.
>
> The vhost-user protocol has been used to implement virtio devices in
> userspace processes on the host. vhost-pci maps the vhost-user protocol
> to a PCI adapter so guest software can perform virtio device emulation.
> This is useful in environments where high-performance VM-to-VM
> communication is necessary or where it is preferable to deploy device
> emulation as VMs instead of host userspace processes.
>
> The vhost-user protocol involves file descriptor passing and shared
> memory. This precludes vhost-user slave implementations over
> virtio-vsock, virtio-serial, or TCP/IP. Therefore a new device type is
> needed to expose the vhost-user protocol to guests.
>
> The vhost-pci PCI adapter has the following resources:
>
> Queues (used for vhost-user protocol communication):
> 1. Master-to-slave messages
> 2. Slave-to-master messages
>
> Doorbells (used for slave->guest/master events):
> 1. Vring call (one doorbell per virtqueue)
> 2. Vring err (one doorbell per virtqueue)
> 3. Log changed
>
> Interrupts (used for guest->slave events):
> 1. Vring kick (one MSI per virtqueue)
>
> Shared Memory BARs:
> 1. Guest memory
> 2. Log
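
To make the resource list concrete, here is a minimal sketch of how the counts scale with the number of virtqueues. The class and property names are hypothetical, and it assumes "Log changed" is a single doorbell per device, which the list above does not state explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VhostPciResources:
    """Resource counts for one vhost-pci adapter (illustrative only)."""

    num_virtqueues: int

    @property
    def protocol_queues(self) -> int:
        # One Master-to-slave queue plus one Slave-to-master queue.
        return 2

    @property
    def doorbells(self) -> int:
        # Vring call and Vring err per virtqueue, plus one "Log changed"
        # doorbell (assumed to be per device, not per virtqueue).
        return 2 * self.num_virtqueues + 1

    @property
    def kick_interrupts(self) -> int:
        # One MSI per virtqueue for Vring kick.
        return self.num_virtqueues

    @property
    def shared_memory_bars(self) -> int:
        # Guest memory BAR plus Log BAR.
        return 2


res = VhostPciResources(num_virtqueues=2)  # e.g. one RX and one TX queue
print(res.doorbells, res.kick_interrupts)  # prints "5 2"
```
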
>
> Master-to-slave queue:
> The following vhost-user protocol messages are relayed from the
> vhost-user master. Each message follows the vhost-user protocol
> VhostUserMsg layout.
>
> Messages that include file descriptor passing are relayed but do not
> carry file descriptors. The relevant resources (doorbells, interrupts,
> or shared memory BARs) are initialized from the file descriptors prior
> to the message becoming available on the Master-to-Slave queue.
>
> Resources must only be used after the corresponding vhost-user message
> has been received. For example, the Vring call doorbell can only be
> used after VHOST_USER_SET_VRING_CALL becomes available on the
> Master-to-Slave queue.
>
> Messages must be processed in order.
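
The two rules above (process in order, use a resource only after its message has arrived) could be enforced in a driver roughly as follows. This is only a sketch: the resource labels and the `DriverState` class are made up for illustration and are not part of the proposal.

```python
class ResourceNotReady(Exception):
    """Raised when a resource is used before its message was received."""


# Which message makes which resource usable. The resource labels on the
# right are hypothetical names chosen for this sketch.
UNLOCKS = {
    "VHOST_USER_SET_MEM_TABLE": "guest_memory_bar",
    "VHOST_USER_SET_LOG_BASE": "log_bar",
    "VHOST_USER_SET_VRING_KICK": "kick_interrupt",
    "VHOST_USER_SET_VRING_CALL": "call_doorbell",
    "VHOST_USER_SET_VRING_ERR": "err_doorbell",
}


class DriverState:
    def __init__(self):
        self.ready = set()

    def process(self, messages):
        # Consume (message name, virtqueue index) pairs strictly in the
        # order they appeared on the Master-to-Slave queue.
        for name, vq in messages:
            resource = UNLOCKS.get(name)
            if resource is not None:
                self.ready.add((resource, vq))

    def use(self, resource, vq=None):
        if (resource, vq) not in self.ready:
            raise ResourceNotReady(f"{resource} (vq={vq}) is not set up yet")
        return (resource, vq)


state = DriverState()
state.process([
    ("VHOST_USER_SET_MEM_TABLE", None),
    ("VHOST_USER_SET_VRING_CALL", 0),
])
state.use("call_doorbell", 0)  # allowed: SET_VRING_CALL for vq 0 was seen
try:
    state.use("call_doorbell", 1)  # vq 1 has not been set up
except ResourceNotReady as exc:
    print("refused:", exc)
```
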
>
> The following vhost-user protocol messages are relayed:
> * VHOST_USER_GET_FEATURES
> * VHOST_USER_SET_FEATURES
> * VHOST_USER_GET_PROTOCOL_FEATURES
> * VHOST_USER_SET_PROTOCOL_FEATURES
> * VHOST_USER_SET_OWNER
> * VHOST_USER_SET_MEM_TABLE
>     The shared memory is available in the corresponding BAR.
> * VHOST_USER_SET_LOG_BASE
>     The shared memory is available in the corresponding BAR.
> * VHOST_USER_SET_LOG_FD
>     The logging file descriptor can be signalled through the logging
>     virtqueue.
> * VHOST_USER_SET_VRING_NUM
> * VHOST_USER_SET_VRING_ADDR
> * VHOST_USER_SET_VRING_BASE
> * VHOST_USER_GET_VRING_BASE
> * VHOST_USER_SET_VRING_KICK
>     This message is still needed because it may indicate only polling
>     mode is supported.
> * VHOST_USER_SET_VRING_CALL
>     This message is still needed because it may indicate only polling
>     mode is supported.
> * VHOST_USER_SET_VRING_ERR
> * VHOST_USER_GET_QUEUE_NUM
> * VHOST_USER_SET_VRING_ENABLE
> * VHOST_USER_SEND_RARP
> * VHOST_USER_NET_SET_MTU
> * VHOST_USER_SET_SLAVE_REQ_FD
> * VHOST_USER_IOTLB_MSG
> * VHOST_USER_SET_VRING_ENDIAN
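
For reference, a relayed message following the VhostUserMsg layout could be encoded and decoded roughly like this. The sketch assumes the header as I understand it from the vhost-user specification (three 32-bit fields: request, flags, size, followed by the payload, in native byte order) and the request codes 1 and 2 for GET_FEATURES and SET_FEATURES; please check these against the specification. How messages are actually placed in the queue memory is still a TODO item below, and `pack_msg`/`unpack_msg` are hypothetical helper names.

```python
import struct

# Header: request (u32), flags (u32), payload size (u32).
HEADER = struct.Struct("=III")

VHOST_USER_VERSION = 0x1        # lower two bits of the flags field
VHOST_USER_GET_FEATURES = 1
VHOST_USER_SET_FEATURES = 2


def pack_msg(request, payload=b"", flags=VHOST_USER_VERSION):
    """Return header + payload as bytes."""
    return HEADER.pack(request, flags, len(payload)) + payload


def unpack_msg(buf):
    """Split a buffer into (request, flags, payload)."""
    request, flags, size = HEADER.unpack_from(buf, 0)
    payload = bytes(buf[HEADER.size:HEADER.size + size])
    if len(payload) != size:
        raise ValueError("truncated payload")
    return request, flags, payload


# SET_FEATURES carries a 64-bit feature bitmap; bit 32 is VIRTIO_F_VERSION_1.
features = 1 << 32
msg = pack_msg(VHOST_USER_SET_FEATURES, struct.pack("=Q", features))

request, flags, payload = unpack_msg(msg)
print(request, flags & 0x3, hex(struct.unpack("=Q", payload)[0]))
```
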
>
> Slave-to-Master queue:
> Messages added to the Slave-to-Master queue are sent to the vhost-user
> master. Each message follows the vhost-user protocol VhostUserMsg
> layout.
>
> The following vhost-user protocol messages are relayed:
>
> * VHOST_USER_SLAVE_IOTLB_MSG
>
> Theory of Operation:
> When the vhost-pci adapter is detected the queues must be set up by the
> driver. Once the driver is ready the vhost-pci device begins relaying
> vhost-user protocol messages over the Master-to-Slave queue. The driver
> must follow the vhost-user protocol specification to implement
> virtio device initialization and virtqueue processing.
>
> Notes:
> The vhost-user UNIX domain socket connects two host processes. The
> slave process interprets messages and initializes vhost-pci resources
> (doorbells, interrupts, shared memory BARs) based on them before
> relaying via the Master-to-Slave queue. All messages are relayed, even
> if they only pass a file descriptor, because the message itself may act
> as a signal (e.g. virtqueue is now enabled).
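
The host-side relay described in this note could look something like the sketch below. All class and field names are hypothetical, file descriptors are modelled as plain integers, and the fd handling is simplified to one fd per message (SET_MEM_TABLE, for example, really carries one fd per memory region).

```python
from dataclasses import dataclass, field

# Messages whose file descriptor backs a vhost-pci resource. The resource
# labels are made up for this sketch.
FD_RESOURCES = {
    "VHOST_USER_SET_VRING_KICK": "kick_interrupt",
    "VHOST_USER_SET_VRING_CALL": "call_doorbell",
    "VHOST_USER_SET_VRING_ERR": "err_doorbell",
    "VHOST_USER_SET_LOG_BASE": "log_bar",
}


@dataclass
class Message:
    name: str
    payload: bytes = b""
    fds: list = field(default_factory=list)


class VhostPciDevice:
    def __init__(self):
        self.resources = {}        # resource label -> fd backing it
        self.master_to_slave = []  # messages visible to the guest driver

    def relay(self, msg):
        resource = FD_RESOURCES.get(msg.name)
        if resource is not None and msg.fds:
            # Set up the resource first, so it is ready by the time the
            # guest sees the message.
            self.resources[resource] = msg.fds[0]
        # Always relay, but never pass file descriptors to the guest.
        self.master_to_slave.append(Message(msg.name, msg.payload))


dev = VhostPciDevice()
dev.relay(Message("VHOST_USER_SET_VRING_CALL", fds=[7]))
dev.relay(Message("VHOST_USER_SET_VRING_KICK"))  # no fd: polling mode
print(dev.resources)
print([m.name for m in dev.master_to_slave])
```

The second message carries no file descriptor, yet it is still relayed, because its arrival tells the driver that only polling mode is available for that virtqueue.
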
>
> vhost-pci is a PCI adapter instead of a virtio device to allow doorbells
> and interrupts to be connected to the virtio device in the master VM in
> the most efficient way possible. This means the Vring call doorbell can
> be an ioeventfd that signals an irqfd inside the host kernel without
> host userspace involvement. The Vring kick interrupt can be an irqfd
> that is signalled by the master VM's virtqueue ioeventfd.
>
> It may be possible to write a Linux vhost-pci driver that implements the
> drivers/vhost/ API. That way existing vhost drivers could work with
> vhost-pci in the kernel.
>
> Guest userspace vhost-pci drivers will be similar to QEMU's
> contrib/libvhost-user/ except they will probably use vfio to access the
> vhost-pci device directly from userspace.
>
> TODO:
> * Queue memory layout and hardware registers
> * vhost-pci-level negotiation and configuration so the hardware
> interface can be extended in the future.
> * vhost-pci <-> driver initialization procedure
> * Master<->Slave disconnect & reconnect