From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:36268) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1W3MXJ-0003rr-Uu for qemu-devel@nongnu.org; Wed, 15 Jan 2014 04:13:26 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1W3MXE-0004n6-Vq for qemu-devel@nongnu.org; Wed, 15 Jan 2014 04:13:21 -0500 Received: from mx1.redhat.com ([209.132.183.28]:14661) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1W3MXE-0004mz-KI for qemu-devel@nongnu.org; Wed, 15 Jan 2014 04:13:16 -0500 Date: Wed, 15 Jan 2014 11:13:09 +0200 From: "Michael S. Tsirkin" Message-ID: <20140115091309.GC1719@redhat.com> References: <1389623119-15863-1-git-send-email-a.motakis@virtualopensystems.com> <1389623119-15863-6-git-send-email-a.motakis@virtualopensystems.com> <20140114111035.GA27922@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH v6 5/8] Add domain socket communication for vhost-user backend List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Antonios Motakis Cc: Luke Gorrie , snabb-devel@googlegroups.com, Nikolay Nikolaev , qemu-devel qemu-devel , VirtualOpenSystems Technical Team On Tue, Jan 14, 2014 at 07:14:38PM +0100, Antonios Motakis wrote: >=20 >=20 >=20 > On Tue, Jan 14, 2014 at 12:10 PM, Michael S. Tsirkin w= rote: >=20 > On Mon, Jan 13, 2014 at 03:25:16PM +0100, Antonios Motakis wrote: > > Add structures for passing vhost-user messages over a unix domain= socket. > > This is the equivalent of the existing vhost-kernel ioctls. > > > > Connect to the named unix domain socket. The system call sendmsg > > is used for communication. To be able to pass file descriptors > > between processes - we use SCM_RIGHTS type in the message control= header. > > > > Signed-off-by: Antonios Motakis > > Signed-off-by: Nikolay Nikolaev >=20 > Not all comments in v5 review of this file have been > addressed. > I think if a comment was wrong it's a good idea to > add clarification in code on why it is, so future readers > aren't confused. >=20 > In particular I think always explicitly opening > domain sockets and forcing a specific client/server > model is a problem. >=20 >=20 >=20 > I think we need to clarify how the model would work when using QEMU as = the > server. As a client it is clear, because it parallels what vhost does w= hen used > with the kernel, however as a server it is not completely clear what th= e use > case is and how it would work. > =A0 You prefer connecting from qemu to vhost but the reverse mode seems just as valid, it's an arbitrary decision. If you don't specify nowait, server can simply wait for connection. So pretty simple really, not much will change. If you allow nowait, it's harder because it's not clear what to do if there's a command that needs a response. Maybe stop VCPU at that point, but that's not very easy. For now I'm ok with not supporting nowait. > I think we also want the protocol documented in docs/ > so future users don't have to read the code to connect > to qemu. >=20 > Can you add a unit test for this code? >=20 > We ship pxe ROM that enables virtio net, > so you can write a small stub that works as the > remote side of this protocol, poke at > guest memory to see that it has > the expected content (compare to what we get with > readl). >=20 > In theory you could also get and put some buffers > though that's not a must. >=20 >=20 > > --- > > =A0hw/virtio/vhost-backend.c | 306 > +++++++++++++++++++++++++++++++++++++++++++++- > > =A01 file changed, 301 insertions(+), 5 deletions(-) > > > > diff --git a/hw/virtio/vhost-backend.c b/hw/virtio/vhost-backend.= c > > index 1d83e1d..460ee02 100644 > > --- a/hw/virtio/vhost-backend.c > > +++ b/hw/virtio/vhost-backend.c > > @@ -11,34 +11,330 @@ > > =A0#include "hw/virtio/vhost.h" > > =A0#include "hw/virtio/vhost-backend.h" > > =A0#include "qemu/error-report.h" > > +#include "qemu/sockets.h" > > > > =A0#include > > =A0#include > > =A0#include > > +#include > > +#include > > +#include > > + > > +#define VHOST_MEMORY_MAX_NREGIONS =A0 =A08 > > +#define VHOST_USER_SOCKTO =A0 =A0 =A0 =A0 =A0 =A0(300) /* msec *= / > > + > > +typedef enum VhostUserRequest { > > + =A0 =A0VHOST_USER_NONE =3D 0, > > + =A0 =A0VHOST_USER_GET_FEATURES =3D 1, > > + =A0 =A0VHOST_USER_SET_FEATURES =3D 2, > > + =A0 =A0VHOST_USER_SET_OWNER =3D 3, > > + =A0 =A0VHOST_USER_RESET_OWNER =3D 4, > > + =A0 =A0VHOST_USER_SET_MEM_TABLE =3D 5, > > + =A0 =A0VHOST_USER_SET_LOG_BASE =3D 6, > > + =A0 =A0VHOST_USER_SET_LOG_FD =3D 7, > > + =A0 =A0VHOST_USER_SET_VRING_NUM =3D 8, > > + =A0 =A0VHOST_USER_SET_VRING_ADDR =3D 9, > > + =A0 =A0VHOST_USER_SET_VRING_BASE =3D 10, > > + =A0 =A0VHOST_USER_GET_VRING_BASE =3D 11, > > + =A0 =A0VHOST_USER_SET_VRING_KICK =3D 12, > > + =A0 =A0VHOST_USER_SET_VRING_CALL =3D 13, > > + =A0 =A0VHOST_USER_SET_VRING_ERR =3D 14, > > + =A0 =A0VHOST_USER_NET_SET_BACKEND =3D 15, > > + =A0 =A0VHOST_USER_ECHO =3D 16, > > + =A0 =A0VHOST_USER_MAX > > +} VhostUserRequest; > > + > > +typedef struct VhostUserMemoryRegion { > > + =A0 =A0uint64_t guest_phys_addr; > > + =A0 =A0uint64_t memory_size; > > + =A0 =A0uint64_t userspace_addr; > > +} VhostUserMemoryRegion; > > + > > +typedef struct VhostUserMemory { > > + =A0 =A0uint32_t nregions; > > + =A0 =A0uint32_t padding; > > + =A0 =A0VhostUserMemoryRegion regions[VHOST_MEMORY_MAX_NREGIONS]= ; > > +} VhostUserMemory; > > + > > +typedef struct VhostUserMsg { > > + =A0 =A0VhostUserRequest request; > > + > > +#define VHOST_USER_VERSION_MASK =A0 =A0 (0x3) > > +#define VHOST_USER_REPLY_MASK =A0 =A0 =A0 (0x1<<2) > > + =A0 =A0uint32_t flags; > > + =A0 =A0uint32_t size; /* the following payload size */ > > + =A0 =A0union { > > + =A0 =A0 =A0 =A0uint64_t u64; > > + =A0 =A0 =A0 =A0struct vhost_vring_state state; > > + =A0 =A0 =A0 =A0struct vhost_vring_addr addr; > > + =A0 =A0 =A0 =A0VhostUserMemory memory; > > + =A0 =A0}; > > +} QEMU_PACKED VhostUserMsg; > > + > > +static VhostUserMsg m __attribute__ ((unused)); > > +#define VHOST_USER_HDR_SIZE (sizeof(m.request) \ > > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0+ sizeof= (m.flags) \ > > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0+ sizeof= (m.size)) > > + > > +#define VHOST_USER_PAYLOAD_SIZE (sizeof(m) - VHOST_USER_HDR_SIZE= ) > > + > > +/* The version of the protocol we support */ > > +#define VHOST_USER_VERSION =A0 =A0(0x1) > > + > > +static int vhost_user_recv(int fd, VhostUserMsg *msg) > > +{ > > + =A0 =A0ssize_t r; > > + =A0 =A0uint8_t *p =3D (uint8_t *) msg; > > + > > + =A0 =A0/* read the header */ > > + =A0 =A0do { > > + =A0 =A0 =A0 =A0r =3D read(fd, p, VHOST_USER_HDR_SIZE); > > + =A0 =A0} while (r < 0 && errno =3D=3D EINTR); > > + > > + =A0 =A0if (r < 0) { > > + =A0 =A0 =A0 =A0error_report("Failed to read msg header, reason:= %s\n", > > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 strerror(errno)); > > + =A0 =A0 =A0 =A0goto fail; > > + =A0 =A0} > > + > > + =A0 =A0if (r !=3D VHOST_USER_HDR_SIZE) { > > + =A0 =A0 =A0 =A0error_report("Failed to read msg header. Read %z= u instead of > %zu.\n", > > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 r, VHOST_USER_HDR_SIZE)= ; > > + =A0 =A0 =A0 =A0goto fail; > > + =A0 =A0} > > + > > + =A0 =A0/* validate received flags */ > > + =A0 =A0if (msg->flags !=3D (VHOST_USER_REPLY_MASK | VHOST_USER_= VERSION)) { > > + =A0 =A0 =A0 =A0error_report("Failed to read msg header." > > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 " Flags 0x%x instead of= 0x%x.\n", > > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 msg->flags, VHOST_USER_= REPLY_MASK | > VHOST_USER_VERSION); > > + =A0 =A0 =A0 =A0goto fail; > > + =A0 =A0} > > + > > + =A0 =A0/* validate message size is sane */ > > + =A0 =A0if (msg->size > VHOST_USER_PAYLOAD_SIZE) { > > + =A0 =A0 =A0 =A0error_report("Failed to read msg header." > > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 " Size %d exceeds the m= aximum %zu.\n", > > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 msg->size, VHOST_USER_P= AYLOAD_SIZE); > > + =A0 =A0 =A0 =A0goto fail; > > + =A0 =A0} > > + > > + =A0 =A0p +=3D VHOST_USER_HDR_SIZE; > > + > > + =A0 =A0/* read the payload */ > > + =A0 =A0do { > > + =A0 =A0 =A0 =A0r =3D read(fd, p, msg->size); > > + =A0 =A0} while (r < 0 && errno =3D=3D EINTR); > > + > > + =A0 =A0if (r < 0) { > > + =A0 =A0 =A0 =A0error_report("Failed to read msg payload, reason= : %s\n", > > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 strerror(errno)); > > + =A0 =A0 =A0 =A0goto fail; > > + =A0 =A0} > > + > > + =A0 =A0if (r !=3D msg->size) { > > + =A0 =A0 =A0 =A0error_report("Failed to read msg payload. Read %= zu instead of > %d.\n", > > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 r, msg->size); > > + =A0 =A0 =A0 =A0goto fail; > > + =A0 =A0} > > + > > + =A0 =A0return 0; > > + > > +fail: > > + =A0 =A0return -1; > > +} > > + > > +static int vhost_user_send_fds(int fd, VhostUserMsg *msg, int *f= ds, > > + =A0 =A0 =A0 =A0size_t fd_num) > > +{ > > + =A0 =A0int r; > > + > > + =A0 =A0struct msghdr msgh; > > + =A0 =A0struct iovec iov; > > + > > + =A0 =A0size_t fd_size =3D fd_num * sizeof(int); > > + =A0 =A0char control[CMSG_SPACE(fd_size)]; > > + =A0 =A0struct cmsghdr *cmsg; > > + > > + =A0 =A0memset(&msgh, 0, sizeof(msgh)); > > + =A0 =A0memset(control, 0, sizeof(control)); > > + > > + =A0 =A0/* set the payload */ > > + =A0 =A0iov.iov_base =3D msg; > > + =A0 =A0iov.iov_len =3D VHOST_USER_HDR_SIZE + msg->size; > > + > > + =A0 =A0msgh.msg_iov =3D &iov; > > + =A0 =A0msgh.msg_iovlen =3D 1; > > + > > + =A0 =A0if (fd_num) { > > + =A0 =A0 =A0 =A0msgh.msg_control =3D control; > > + =A0 =A0 =A0 =A0msgh.msg_controllen =3D sizeof(control); > > + > > + =A0 =A0 =A0 =A0cmsg =3D CMSG_FIRSTHDR(&msgh); > > + > > + =A0 =A0 =A0 =A0cmsg->cmsg_len =3D CMSG_LEN(fd_size); > > + =A0 =A0 =A0 =A0cmsg->cmsg_level =3D SOL_SOCKET; > > + =A0 =A0 =A0 =A0cmsg->cmsg_type =3D SCM_RIGHTS; > > + =A0 =A0 =A0 =A0memcpy(CMSG_DATA(cmsg), fds, fd_size); > > + =A0 =A0} else { > > + =A0 =A0 =A0 =A0msgh.msg_control =3D 0; > > + =A0 =A0 =A0 =A0msgh.msg_controllen =3D 0; > > + =A0 =A0} > > + > > + =A0 =A0do { > > + =A0 =A0 =A0 =A0r =3D sendmsg(fd, &msgh, 0); > > + =A0 =A0} while (r < 0 && errno =3D=3D EINTR); > > + > > + =A0 =A0if (r < 0) { > > + =A0 =A0 =A0 =A0error_report("Failed to send msg(%d), reason: %s= \n", > > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0msg->request, strerror(errno)); > > + =A0 =A0 =A0 =A0goto fail; > > + =A0 =A0} > > + > > + =A0 =A0if (r !=3D VHOST_USER_HDR_SIZE + msg->size) { > > + =A0 =A0 =A0 =A0error_report("Failed to send msg(%d). Sent %d in= stead of %zu.\ > n", > > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0msg->request, r, VHOST_USER_HDR_= SIZE + msg->size); > > + =A0 =A0 =A0 =A0goto fail; > > + =A0 =A0} > > + > > + =A0 =A0return 0; > > + > > +fail: > > + =A0 =A0return -1; > > +} > > + > > +static int vhost_user_echo(struct vhost_dev *dev) > > +{ > > + =A0 =A0VhostUserMsg msg; > > + =A0 =A0int fd =3D dev->control; > > + > > + =A0 =A0if (fd < 0) { > > + =A0 =A0 =A0 =A0return 0; > > + =A0 =A0} > > + > > + =A0 =A0/* check connection */ > > + =A0 =A0msg.request =3D VHOST_USER_ECHO; > > + =A0 =A0msg.flags =3D VHOST_USER_VERSION; > > + =A0 =A0msg.size =3D 0; > > + > > + =A0 =A0if (vhost_user_send_fds(fd, &msg, 0, 0) < 0) { > > + =A0 =A0 =A0 =A0error_report("ECHO failed\n"); > > + =A0 =A0 =A0 =A0return -1; > > + =A0 =A0} > > + > > + =A0 =A0if (vhost_user_recv(fd, &msg) < 0) { > > + =A0 =A0 =A0 =A0error_report("ECHO failed\n"); > > + =A0 =A0 =A0 =A0return -1; > > + =A0 =A0} > > + > > + =A0 =A0return 0; > > +} > > > > =A0static int vhost_user_call(struct vhost_dev *dev, unsigned lon= g int > request, > > =A0 =A0 =A0 =A0 =A0void *arg) > > =A0{ > > + =A0 =A0int fd =3D dev->control; > > + =A0 =A0VhostUserMsg msg; > > + =A0 =A0int result =3D 0; > > + =A0 =A0int fds[VHOST_MEMORY_MAX_NREGIONS]; > > + =A0 =A0size_t fd_num =3D 0; > > + > > =A0 =A0 =A0assert(dev->vhost_ops->backend_type =3D=3D VHOST_BACKE= ND_TYPE_USER); > > - =A0 =A0error_report("vhost_user_call not implemented\n"); > > > > - =A0 =A0return -1; > > + =A0 =A0if (fd < 0) { > > + =A0 =A0 =A0 =A0return 0; > > + =A0 =A0} > > + > > + =A0 =A0msg.request =3D VHOST_USER_NONE; > > + =A0 =A0msg.flags =3D VHOST_USER_VERSION; > > + =A0 =A0msg.size =3D 0; > > + > > + =A0 =A0switch (request) { > > + =A0 =A0default: > > + =A0 =A0 =A0 =A0error_report("vhost-user trying to send unhandle= d ioctl\n"); > > + =A0 =A0 =A0 =A0return -1; > > + =A0 =A0 =A0 =A0break; > > + =A0 =A0} > > + > > + =A0 =A0result =3D vhost_user_send_fds(fd, &msg, fds, fd_num); > > + > > + =A0 =A0return result; > > =A0} > > > > =A0static int vhost_user_init(struct vhost_dev *dev, const char *= devpath) > > =A0{ > > + =A0 =A0int fd =3D -1; > > + =A0 =A0struct sockaddr_un un; > > + =A0 =A0struct timeval tv; > > + =A0 =A0size_t len; > > + > > =A0 =A0 =A0assert(dev->vhost_ops->backend_type =3D=3D VHOST_BACKE= ND_TYPE_USER); > > - =A0 =A0error_report("vhost_user_init not implemented\n"); > > > > + =A0 =A0/* Create the socket */ > > + =A0 =A0fd =3D socket(AF_UNIX, SOCK_STREAM, 0); > > + =A0 =A0if (fd =3D=3D -1) { > > + =A0 =A0 =A0 =A0error_report("socket: %s", strerror(errno)); > > + =A0 =A0 =A0 =A0return -1; > > + =A0 =A0} > > + > > + =A0 =A0/* Set socket options */ > > + =A0 =A0tv.tv_sec =3D VHOST_USER_SOCKTO / 1000; > > + =A0 =A0tv.tv_usec =3D (VHOST_USER_SOCKTO % 1000) * 1000; > > + > > + =A0 =A0if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(= struct > timeval)) > > + =A0 =A0 =A0 =A0 =A0 =A0=3D=3D -1) { > > + =A0 =A0 =A0 =A0error_report("setsockopt SO_SNDTIMEO: %s", strer= ror(errno)); > > + =A0 =A0 =A0 =A0goto fail; > > + =A0 =A0} > > + > > + =A0 =A0if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(= struct > timeval)) > > + =A0 =A0 =A0 =A0 =A0 =A0=3D=3D -1) { > > + =A0 =A0 =A0 =A0error_report("setsockopt SO_RCVTIMEO: %s", strer= ror(errno)); > > + =A0 =A0 =A0 =A0goto fail; > > + =A0 =A0} > > + > > + =A0 =A0/* Connect */ > > + =A0 =A0un.sun_family =3D AF_UNIX; > > + =A0 =A0strcpy(un.sun_path, devpath); > > + > > + =A0 =A0len =3D sizeof(un.sun_family) + strlen(devpath); > > + > > + =A0 =A0if (connect(fd, (struct sockaddr *) &un, len) =3D=3D -1)= { > > + =A0 =A0 =A0 =A0error_report("connect: %s", strerror(errno)); > > + =A0 =A0 =A0 =A0goto fail; > > + =A0 =A0} > > + > > + =A0 =A0/* Cleanup if there is previous connection left */ > > + =A0 =A0if (dev->control >=3D 0) { > > + =A0 =A0 =A0 =A0dev->vhost_ops->vhost_backend_cleanup(dev); > > + =A0 =A0} > > + =A0 =A0dev->control =3D fd; > > + > > + =A0 =A0if (vhost_user_echo(dev) < 0) { > > + =A0 =A0 =A0 =A0dev->control =3D -1; > > + =A0 =A0 =A0 =A0goto fail; > > + =A0 =A0} > > + > > + =A0 =A0return fd; > > + > > +fail: > > + =A0 =A0close(fd); > > =A0 =A0 =A0return -1; > > + > > =A0} > > > > =A0static int vhost_user_cleanup(struct vhost_dev *dev) > > =A0{ > > + =A0 =A0int r =3D 0; > > + > > =A0 =A0 =A0assert(dev->vhost_ops->backend_type =3D=3D VHOST_BACKE= ND_TYPE_USER); > > - =A0 =A0error_report("vhost_user_cleanup not implemented\n"); > > > > - =A0 =A0return -1; > > + =A0 =A0if (dev->control >=3D 0) { > > + =A0 =A0 =A0 =A0r =3D close(dev->control); > > + =A0 =A0} > > + =A0 =A0dev->control =3D -1; > > + > > + =A0 =A0return r; > > =A0} > > > > =A0static const VhostOps user_ops =3D { > > -- > > 1.8.3.2 > > >=20 >=20