From: Stefan Hajnoczi <stefanha@redhat.com>
To: Wei Wang <wei.w.wang@intel.com>
Cc: kvm@vger.kernel.org, qemu-devel@nongnu.org,
	virtio-comment@lists.oasis-open.org, mst@redhat.com,
	pbonzini@redhat.com,
	"Marc-André Lureau" <marcandre.lureau@redhat.com>
Subject: Re: [Qemu-devel] [PATCH] Vhost-pci RFC v2: a new virtio device for inter-VM communication
Date: Mon, 29 Aug 2016 11:27:07 -0400
Message-ID: <20160829152707.GC18909@stefanha-x1.localdomain>
In-Reply-To: <1466345649-64841-2-git-send-email-wei.w.wang@intel.com>

On Sun, Jun 19, 2016 at 10:14:09PM +0800, Wei Wang wrote:
> We introduce the vhost-pci design in the virtio specification format.
> To follow the naming conventions in the virtio specification, we call
> the VM who sends packets to the destination VM the device VM, and the
> VM who provides the vring and receives packets the driver VM.
> 
> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> ---
>  vhost-pci.patch | 341 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 341 insertions(+)
>  create mode 100755 vhost-pci.patch

Adding Marc-André on CC because vhost-pci has a natural parallel to
vhost-user.  Instead of terminating the virtio device in a host
userspace process it terminates the device in a VM.  The design lessons
from vhost-user still apply though.

Marc-André: Do you have time to review this proposal?

> diff --git a/vhost-pci.patch b/vhost-pci.patch
> new file mode 100755
> index 0000000..341ba07
> --- /dev/null
> +++ b/vhost-pci.patch
> @@ -0,0 +1,341 @@
> +1. Vhost-pci Device
> +
> +1.1 Device ID
> +TBD
> +
> +1.2 Virtqueues
> +0 control receiveq
> +1 control transmitq
> +
> +1.3 Feature Bits
> +
> +1.3.1 Local Feature Bits
> +Currently no local feature bits are defined, so the standard virtio feature
> +bit negotiation will always succeed and complete.
> +
> +1.3.2 Remote Feature Bits
> +The remote feature bits are obtained from the frontend device and negotiated
> +with the vhost-pci driver via the control transmitq. The negotiation steps
> +are described in 1.5 Device Initialization.
> +
> +1.4 Device Configuration Layout
> +None currently defined
> +
> +1.5 Device Initialization
> +When a device VM boots, it creates a vhost-pci server socket.
> +
> +When a virtio device on the driver VM is created and configured to use a
> +vhost-pci device as its backend, a client socket is created and connected to
> +the server for message exchange.
> +
> +The server and client communicate via socket messages. The server and the
> +vhost-pci driver communicate via controlq messages. The server updates the
> +driver via a control transmitq. The driver acknowledges the server via a
> +control receiveq.
> +
> +Both the socket message and controlq message headers can be constructed using
> +the following message info structure:
> +struct vhost_pci_msg_info {
> +#define VHOST_PCI_MSG_TYPE_MEMORY_INFO 0
> +#define VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK 1
> +#define VHOST_PCI_MSG_TYPE_DEVICE_INFO 2
> +#define VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK 3
> +#define VHOST_PCI_MSG_TYPE_FEATURE_BITS 4
> +#define VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK 5
> +	u16 msg_type;
> +	u16 msg_version;
> +	u32 msg_len;
> +	u64 msg_seq;
> +};
> +The msg_seq field stores the message sequence number. Each client maintains
> +its own message sequence number.
> +
> +The socket messages are preceded by the following header:
> +struct vhost_pci_socket_hdr {
> +	struct vhost_pci_msg_info msg_info;
> +	u64 client_uuid;
> +};
> +The client_uuid field is generated by the client and identifies the client
> +to the server.
> +
> +The controlq messages are preceded by the following header:
> +struct vhost_pci_controlq_hdr {
> +	struct vhost_pci_msg_info msg_info;
> +#define VHOST_PCI_FRONTEND_DEVICE_NET 1
> +#define VHOST_PCI_FRONTEND_DEVICE_BLK 2
> +#define VHOST_PCI_FRONTEND_DEVICE_CONSOLE 3
> +#define VHOST_PCI_FRONTEND_DEVICE_ENTROPY 4
> +#define VHOST_PCI_FRONTEND_DEVICE_BALLOON 5
> +#define VHOST_PCI_FRONTEND_DEVICE_SCSI 8
> +	u32 device_type;
> +	u64 device_id;
> +};
> +The device_type and device_id fields identify the frontend device (client).
> +
> +The payload of a VHOST_PCI_MSG_TYPE_MEMORY_INFO socket message can be
> +constructed using the following structure:
> +/* socket message: VHOST_PCI_MSG_TYPE_MEMORY_INFO */
> +struct vhost_pci_socket_memory_info {
> +#define VHOST_PCI_ADD_MEMORY 0
> +#define VHOST_PCI_DEL_MEMORY 1
> +	u16 ops;
> +	u32 nregions;
> +	struct vhost_pci_memory_region {
> +		int fd;
> +		u64 guest_phys_addr;
> +		u64 memory_size;
> +		u64 mmap_offset;
> +	} regions[VHOST_PCI_MAX_NREGIONS];
> +};
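A note on the `int fd` field above: an fd value is meaningless across process boundaries, so in practice (as vhost-user does) the region file descriptors would have to travel as SCM_RIGHTS ancillary data alongside the message rather than inside the struct. A non-normative sketch of the sending side (the helper name is mine):

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Sketch: send a message buffer with one file descriptor attached as
 * SCM_RIGHTS ancillary data, the way vhost-user passes region fds. */
static ssize_t send_msg_with_fd(int sock, const void *buf, size_t len, int fd)
{
    struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
    union {
        struct cmsghdr align;                 /* force correct alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } u;
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = u.buf, .msg_controllen = sizeof(u.buf),
    };
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));
    return sendmsg(sock, &msg, 0);
}
```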
> +
> +The payload of a VHOST_PCI_MSG_TYPE_MEMORY_INFO controlq message can be
> +constructed using the following structure:
> +/* controlq message: VHOST_PCI_MSG_TYPE_MEMORY_INFO */
> +struct vhost_pci_controlq_memory_info {
> +#define VHOST_PCI_ADD_MEMORY 0
> +#define VHOST_PCI_DEL_MEMORY 1
> +	u16  ops;
> +	u32 nregion;
> +	struct exotic_memory_region {
> +		u64   region_base_xgpa;
> +		u64   size;
> +		u64   offset_in_bar_area;
> +	} region[VHOST_PCI_MAX_NREGIONS];
> +};
> +
> +The payload of VHOST_PCI_MSG_TYPE_DEVICE_INFO and
> +VHOST_PCI_MSG_TYPE_FEATURE_BITS socket/controlq messages can be constructed
> +using the following vhost_pci_device_info structure and
> +the vhost_pci_feature_bits structure respectively.
> +
> +/* socket/controlq message: VHOST_PCI_MSG_TYPE_DEVICE_INFO */
> +struct vhost_pci_device_info {
> +#define VHOST_PCI_ADD_FRONTEND_DEVICE 0
> +#define VHOST_PCI_DEL_FRONTEND_DEVICE 1
> +	u16    ops;
> +	u32    nvirtq;
> +	u32    device_type;
> +	u64    device_id;
> +	struct virtq exotic_virtq[VHOST_PCI_MAX_NVIRTQ];
> +};
> +
> +/* socket/controlq message: VHOST_PCI_MSG_TYPE_FEATURE_BITS */
> +struct vhost_pci_feature_bits {
> +	u64 feature_bits;
> +};
> +
> +The payload of all the ACK socket/controlq messages can be constructed using
> +the following structure:
> +/* socket/controlq message: ACK messages */
> +struct vhost_pci_ack {
> +	union ack_msg {
> +#define VHOST_PCI_ACK_ADD_DONE 0
> +#define VHOST_PCI_ACK_ADD_FAIL 1
> +#define VHOST_PCI_ACK_DEL_DONE 2
> +#define VHOST_PCI_ACK_DEL_FAIL 3
> +	u64 ack_memory_info;
> +	u64 ack_device_info;
> +	u64 ack_feature_bits;
> +	};
> +};
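To make the framing concrete, here is a non-normative C sketch that assembles a VHOST_PCI_MSG_TYPE_FEATURE_BITS socket message from the structures above (the fill helper and the version value 1 are my assumptions, not part of the spec):

```c
#include <stdint.h>
#include <string.h>

typedef uint16_t u16;
typedef uint32_t u32;
typedef uint64_t u64;

#define VHOST_PCI_MSG_TYPE_FEATURE_BITS 4

struct vhost_pci_msg_info {
    u16 msg_type;
    u16 msg_version;
    u32 msg_len;
    u64 msg_seq;
};

struct vhost_pci_socket_hdr {
    struct vhost_pci_msg_info msg_info;
    u64 client_uuid;
};

struct vhost_pci_feature_bits {
    u64 feature_bits;
};

/* Hypothetical on-wire layout: socket header followed by the payload. */
struct feature_bits_msg {
    struct vhost_pci_socket_hdr hdr;
    struct vhost_pci_feature_bits payload;
};

/* Fill a FEATURE_BITS socket message; msg_len covers header plus
 * payload, and msg_seq is the client's own sequence counter. */
static void fill_feature_bits_msg(struct feature_bits_msg *m,
                                  u64 uuid, u64 seq, u64 features)
{
    memset(m, 0, sizeof(*m));
    m->hdr.msg_info.msg_type = VHOST_PCI_MSG_TYPE_FEATURE_BITS;
    m->hdr.msg_info.msg_version = 1;   /* assumed initial version */
    m->hdr.msg_info.msg_len = sizeof(*m);
    m->hdr.msg_info.msg_seq = seq;
    m->hdr.client_uuid = uuid;
    m->payload.feature_bits = features;
}
```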
> +
> +1.5.1 Device Requirements: Device Initialization
> +
> +1.5.1.1	The Frontend Device (Client)
> +The vhost-pci server socket path SHOULD be provided to the virtio client
> +socket so that it can connect to the server.
> +
> +The client SHOULD send three socket messages,
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD),
> +VHOST_PCI_MSG_TYPE_FEATURE_BITS(FeatureBits)
> +and VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD),
> +to the server, and wait until receiving the corresponding three ACK
> +messages from the server.
> +
> +The client may receive the following ACK socket messages from the server:
> +1. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_DONE): It indicates that the device
> +VM has successfully mapped the memory, and a vhost-pci device is created on
> +the device VM for the driver VM.
> +2. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_FAIL): It indicates that the device
> +VM fails to map the memory. Receiving this message results in the failure of
> +setting up the vhost-pci based inter-VM communication support for the driver
> +VM.
> +3. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_DONE): It indicates that the device
> +VM has successfully initialized the related interfaces to communicate with the
> +frontend device.
> +4. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_FAIL): It indicates that the device
> +VM fails to initialize the related interfaces to communicate with the frontend
> +device.
> +5. VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(ACCEPTED_FEATURE_BITS): The payload of
> +this message contains the feature bits accepted by the vhost-pci device and
> +driver. If the accepted feature bits are not equal to the feature bits sent by
> +the client, the client MUST reset the device to go into backwards compatibility
> +mode, re-negotiate the received ACCEPTED_FEATURE_BITS with its driver, and
> +send back a VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(ACCEPTED_FEATURE_BITS) socket
> +message to the vhost-pci server. Otherwise, no
> +VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(ACCEPTED_FEATURE_BITS) socket message is
> +sent back to the server.
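As a non-normative illustration, the client-side handshake above (three messages out, three ACKs back, plus the feature renegotiation rule) could be tracked like this; all names here are mine, not from the spec:

```c
#include <stdint.h>

typedef uint64_t u64;

/* One bit per ACK the client must collect before setup is complete. */
enum { ACK_MEMORY = 1 << 0, ACK_DEVICE = 1 << 1, ACK_FEATURES = 1 << 2 };

struct client_handshake {
    unsigned acks_seen;   /* bitmask of ACKs received so far */
    int failed;           /* any ADD_FAIL terminates setup */
};

static void on_ack(struct client_handshake *h, unsigned which, int ok)
{
    h->acks_seen |= which;
    if (!ok)
        h->failed = 1;
}

static int handshake_complete(const struct client_handshake *h)
{
    return !h->failed &&
           h->acks_seen == (ACK_MEMORY | ACK_DEVICE | ACK_FEATURES);
}

/* Per 1.5.1.1, the client resets and renegotiates only when the
 * accepted feature bits differ from what it offered. */
static int needs_renegotiation(u64 offered, u64 accepted)
{
    return accepted != offered;
}
```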
> +
> +1.5.1.2	The Vhost-pci Device (Server)
> +To be capable of creating vhost-pci devices, a VM MUST create a vhost-pci
> +server when it boots.
> +
> +When the vhost-pci server receives a VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD)
> +socket message, it SHOULD check if a vhost-pci device has been created for the
> +requesting VM. If the client_uuid contained in the socket message is not new
> +to the server, the server SHOULD simply forward the received message to the
> +vhost-pci driver via the control transmitq. Otherwise, the server SHOULD
> +create a new vhost-pci device, and continue the following memory mapping
> +related initialization steps.
> +
> +The vhost-pci server SHOULD add up the sizes of all the memory regions, and
> +use a 64-bit device bar to map all the memory regions obtained from the
> +socket message. To better support memory hot-plug in the driver VM, the bar
> +SHOULD be configured with twice the size of the driver VM's memory. The server
> +SHOULD map the received memory via the QEMU MemoryRegion mechanism, and the
> +newly created vhost-pci device SHOULD then be hot-plugged into the VM.
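As a non-normative sketch of the sizing rule above: the 2x reservation is from the spec text, while the power-of-two round-up is my assumption (PCI BAR sizes must be powers of two):

```c
#include <stdint.h>

typedef uint64_t u64;

/* Reserve twice the driver VM's memory (per 1.5.1.2), rounded up to
 * the next power of two so it is a legal PCI BAR size. */
static u64 vhost_pci_bar_size(u64 driver_vm_mem)
{
    u64 want = driver_vm_mem * 2;
    u64 size = 1;

    while (size < want)
        size <<= 1;
    return size;
}
```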
> +
> +When the device status is updated with DRIVER_OK, a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD) message SHOULD be put on the control
> +transmitq, and a controlq interrupt SHOULD be injected to the VM. The server
> +may receive the following ACK messages from the driver via the control
> +receiveq:
> +1. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_DONE): It indicates that the driver
> +has successfully added the memory info. The server SHOULD send a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_DONE) socket message to the client.
> +2. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_FAIL): It indicates that the driver
> +fails to add the memory info. The server SHOULD send a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_FAIL) socket message to the client.
> +
> +When the vhost-pci server receives a
> +VHOST_PCI_MSG_TYPE_FEATURE_BITS(feature bits) socket message, it SHOULD put a
> +VHOST_PCI_MSG_TYPE_FEATURE_BITS(feature bits) message on the control transmitq,
> +and inject a controlq interrupt to the VM. When the server receives a
> +VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(accepted feature bits) controlq message
> +from the VM, it SHOULD send a VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(accepted
> +feature bits) socket message to the client. If the accepted feature bits sent
> +to the client do not equal the ones it received, the server SHOULD
> +wait until it receives a VHOST_PCI_MSG_TYPE_FEATURE_BITS_ACK(accepted feature
> +bits) socket message from the client, which indicates that the frontend device
> +has finished the re-negotiation of the accepted feature bits.
> +
> +When the vhost-pci server receives a VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) socket
> +message, it SHOULD put a VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) message on the
> +control transmitq, and inject a controlq interrupt to the VM. The server may
> +receive the following ACK messages from the driver:
> +1. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_DONE): It indicates that the
> +vhost-pci driver has successfully added the frontend device to its support
> +list. The server SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_DONE)
> +socket message to the corresponding client.
> +2. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_FAIL): It indicates that the
> +vhost-pci driver fails to add the frontend device to its support list. The
> +server SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_FAIL) socket
> +message to the corresponding client.
> +
> +1.5.2 Driver Requirements: Device Initialization
> +The vhost-pci driver SHOULD acknowledge to the vhost-pci device, via the
> +control receiveq, whether it succeeded in handling each received controlq
> +message.
> +The vhost-pci driver MUST NOT accept any feature bits that are not offered
> +in the remote feature bits.
> +
> +When the vhost-pci driver receives a VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD)
> +controlq message, it MUST initialize the corresponding driver interfaces of
> +the device type if they are not initialized, and add the device id to the
> +support list that records all the frontend devices being supported by
> +vhost-pci for inter-VM communications.
> +
> +1.6 Device Operation
> +1.6.1 Device Requirements: Device Operation
> +1.6.1.1 The Frontend Device (Client)
> +When the frontend device changes any info (e.g. device_id, virtq address)
> +that it has sent to the vhost-pci device, it MUST send a
> +VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) socket message to the server. The
> +vhost-pci device SHOULD put a VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) controlq
> +message on the control transmitq, and inject a controlq interrupt to the VM.
> +
> +When the frontend virtio device is removed (e.g. being hot-plugged out), the
> +client SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL) socket message to the
> +server.
> +
> +Before the driver VM is destroyed or migrated, all the clients connected to
> +the server SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL) socket message to
> +the server. The destroy or migration operation MUST wait until all the
> +VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_DONE) socket messages are received.
> +
> +When the driver VM hot-adds or hot-removes memory, it SHOULD send a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD) socket message or
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO(DEL) socket message to the server.
> +
> +1.6.1.2 The Vhost-pci Device (Server)
> +When the server receives a VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) or
> +VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL) socket message, it SHOULD put a
> +VHOST_PCI_MSG_TYPE_DEVICE_INFO(ADD) or
> +VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL) message on the control transmitq,
> +and inject a controlq interrupt to the VM. It may receive the following ACK
> +controlq messages from the driver:
> +1. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_DONE): It indicates that the driver
> +has successfully updated the device info. The server SHOULD send a
> +VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_DONE) socket message to the
> +corresponding client.
> +2. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_FAIL): It indicates that the driver
> +fails to update the device info. The server SHOULD send a
> +VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(ADD_FAIL) socket message to the
> +corresponding client.
> +3. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_DONE): It indicates that the driver
> +has successfully removed the vhost-pci support for the frontend device. The
> +server SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_DONE) socket
> +message to the corresponding client.
> +4. VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_FAIL): It indicates that the driver
> +fails to remove the vhost-pci support for the frontend device. The server
> +SHOULD send a VHOST_PCI_MSG_TYPE_DEVICE_INFO_ACK(DEL_FAIL) socket message to
> +the corresponding client.
> +
> +When no clients of a driver VM remain connected to the vhost-pci device,
> +the server SHOULD destroy the vhost-pci device for that driver VM.
> +
> +When the server receives a VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD) socket message,
> +it calculates the total size of the received memory. If the new memory size
> +plus the mapped memory size is smaller than the address space size reserved by
> +the bar, the server SHOULD map the new memory and expose it to the VM via the
> +QEMU MemoryRegion mechanism. Then it SHOULD put a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO(ADD) controlq message on the control transmitq,
> +and inject a controlq interrupt to the VM.
> +
> +If the new memory size plus the mapped memory size is larger than the address
> +space size reserved by the bar, the server SHOULD
> +1. clone out a new vhost-pci device;
> +2. configure the bar size to be twice the current memory size; and
> +3. hot-plug out the old vhost-pci device, and hot-plug in the new vhost-pci
> +device to the VM.
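The map-in-place vs. re-plug decision described above could be sketched as follows (non-normative; the names are mine):

```c
#include <stdint.h>

typedef uint64_t u64;

enum hotadd_action { MAP_IN_PLACE, REPLUG_LARGER_DEVICE };

/* Per 1.6.1.2: if the new regions still fit in the address space
 * reserved by the bar, map them in place; otherwise hot-plug out the
 * old device and hot-plug in a larger clone. */
static enum hotadd_action on_memory_add(u64 bar_size, u64 mapped, u64 new_mem)
{
    if (mapped + new_mem < bar_size)
        return MAP_IN_PLACE;
    return REPLUG_LARGER_DEVICE;
}
```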
> +
> +The initialization steps SHOULD follow 1.5 Device Initialization, except that
> +the interaction messages between the server and client are not needed.
> +
> +The server may receive the following two memory info add related ACK controlq
> +messages from the driver:
> +1. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_DONE): It indicates that the driver
> +has successfully added support for the new memory info. The server SHOULD send a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_DONE) socket message to the corresponding
> +client.
> +2. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_FAIL): It indicates that the driver
> +fails to add support for the new memory info. The server SHOULD send a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(ADD_FAIL) socket message to the corresponding
> +client.
> +
> +When the server receives a VHOST_PCI_MSG_TYPE_MEMORY_INFO(DEL) socket message,
> +it SHOULD put a VHOST_PCI_MSG_TYPE_MEMORY_INFO(DEL) controlq message on the
> +control transmitq, and inject a controlq interrupt to the VM. The server may
> +receive the following two memory ACK controlq messages from the driver:
> +1. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(DEL_DONE): It indicates that the driver
> +has successfully deleted the memory info support. The server SHOULD send a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(DEL_DONE) socket message to the
> +corresponding client.
> +2. VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(DEL_FAIL): It indicates that the driver
> +fails to delete the memory info support. The server SHOULD send a
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO_ACK(DEL_FAIL) message to the corresponding
> +client.
> +
> +1.6.2 Driver Requirements: Device Operation
> +The vhost-pci driver SHOULD ensure that all CPUs are notified of the
> +VHOST_PCI_MSG_TYPE_MEMORY_INFO(DEL) and VHOST_PCI_MSG_TYPE_DEVICE_INFO(DEL)
> +controlq messages before it acknowledges the server.
> -- 
> 1.8.3.1
> 

