From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:46139) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1W3Rn6-0000lh-VS for qemu-devel@nongnu.org; Wed, 15 Jan 2014 09:50:06 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1W3Rn2-0003J1-4y for qemu-devel@nongnu.org; Wed, 15 Jan 2014 09:50:00 -0500 Received: from mx1.redhat.com ([209.132.183.28]:4833) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1W3Rn1-0003Ix-QT for qemu-devel@nongnu.org; Wed, 15 Jan 2014 09:49:56 -0500 Date: Wed, 15 Jan 2014 16:49:49 +0200 From: "Michael S. Tsirkin" Message-ID: <20140115144949.GD2167@redhat.com> References: <1389623119-15863-1-git-send-email-a.motakis@virtualopensystems.com> <20140114113327.GF27922@redhat.com> <20140115090744.GB1719@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-1 Content-Disposition: inline In-Reply-To: Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH v6 0/8] Vhost and vhost-net support for userspace based backends List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Antonios Motakis Cc: Luke Gorrie , snabb-devel@googlegroups.com, Nikolay Nikolaev , qemu-devel qemu-devel , VirtualOpenSystems Technical Team On Wed, Jan 15, 2014 at 01:50:47PM +0100, Antonios Motakis wrote: >=20 >=20 >=20 > On Wed, Jan 15, 2014 at 10:07 AM, Michael S. Tsirkin w= rote: >=20 > On Tue, Jan 14, 2014 at 07:13:43PM +0100, Antonios Motakis wrote: > > > > > > > > On Tue, Jan 14, 2014 at 12:33 PM, Michael S. Tsirkin > wrote: > > > > =A0 =A0 On Mon, Jan 13, 2014 at 03:25:11PM +0100, Antonios Motaki= s wrote: > > =A0 =A0 > In this patch series we would like to introduce our app= roach for > putting > > =A0 =A0 a > > =A0 =A0 > virtio-net backend in an external userspace process. Ou= r eventual > target > > =A0 =A0 is to > > =A0 =A0 > run the network backend in the Snabbswitch ethernet swi= tch, while > > =A0 =A0 receiving > > =A0 =A0 > traffic from a guest inside QEMU/KVM which runs an unmo= dified > virtio-net > > =A0 =A0 > implementation. > > =A0 =A0 > > > =A0 =A0 > For this, we are working into extending vhost to allow = equivalent > > =A0 =A0 functionality > > =A0 =A0 > for userspace. Vhost already passes control of the data= plane of > > =A0 =A0 virtio-net to > > =A0 =A0 > the host kernel; we want to realize a similar model, bu= t for > userspace. > > =A0 =A0 > > > =A0 =A0 > In this patch series the concept of a vhost-backend is = introduced. > > =A0 =A0 > > > =A0 =A0 > We define two vhost backend types - vhost-kernel and vh= ost-user. > The > > =A0 =A0 former is > > =A0 =A0 > the interface to the current kernel module implementati= on. Its > control > > =A0 =A0 plane is > > =A0 =A0 > ioctl based. The data plane is the kernel directly acce= ssing the > QEMU > > =A0 =A0 allocated, > > =A0 =A0 > guest memory. > > =A0 =A0 > > > =A0 =A0 > In the new vhost-user backend, the control plane is bas= ed on > > =A0 =A0 communication > > =A0 =A0 > between QEMU and another userspace process using a unix= domain > socket. > > =A0 =A0 This > > =A0 =A0 > allows to implement a virtio backend for a guest runnin= g in QEMU, > inside > > =A0 =A0 the > > =A0 =A0 > other userspace process. > > =A0 =A0 > > > =A0 =A0 > We change -mem-path to QemuOpts and add prealloc, share= and unlink > as > > =A0 =A0 properties > > =A0 =A0 > to it. HugeTLBFS requirements of -mem-path are relaxed,= so any > valid path > > =A0 =A0 can > > =A0 =A0 > be used now. The new properties allow more fine grained= control > over the > > =A0 =A0 guest > > =A0 =A0 > RAM backing store. > > =A0 =A0 > > > =A0 =A0 > The data path is realized by directly accessing the vri= ngs and the > buffer > > =A0 =A0 data > > =A0 =A0 > off the guest's memory. > > =A0 =A0 > > > =A0 =A0 > The current user of vhost-user is only vhost-net. We ad= d new netdev > > =A0 =A0 backend > > =A0 =A0 > that is intended to initialize vhost-net with vhost-use= r backend. > > > > =A0 =A0 Some meta comments. > > > > =A0 =A0 Something that makes this patch harder to review is how i= t's > > =A0 =A0 split up. Generally IMHO it's not a good idea to repeated= ly > > =A0 =A0 edit same part of file adding stuff in patch after patch, > > =A0 =A0 it's only making things harder to read if you add stubs, = then fill > them up. > > =A0 =A0 (we do this sometimes when we are changing existing code,= but > > =A0 =A0 it is generally not needed when adding new code) > > > > =A0 =A0 Instead, split it like this: > > > > =A0 =A0 1. general refactoring, split out linux specific and gene= ric parts > > =A0 =A0 =A0 =A0and add the ops indirection > > =A0 =A0 2. add new files for vhost-user with complete implementat= ion. > > =A0 =A0 =A0 =A0without command line to support it, there will be = no way to use > it, > > =A0 =A0 =A0 =A0but should build fine. > > =A0 =A0 3. tie it all up with option parsing > > > > > > =A0 =A0 Generic vhost and vhost net files should be kept separate. > > =A0 =A0 Don't let vhost net stuff seep back into generic files, > > =A0 =A0 we have vhost-scsi too. > > =A0 =A0 I would also prefer that userspace vhost has its own file= s. > > > > > > Ok, we'll keep this into account. > > =A0 > > > > > > =A0 =A0 We need a small test server qemu can talk to, to verify t= hings > > =A0 =A0 actually work. > > > > > > We have implemented such a test app: https://github.com/ > virtualopensystems/vapp > > > > We use it for testing, and also as a reference implementation. A = client > is also > > included. > > >=20 > Sounds good. Can we include this in qemu and tie > it into the qtest framework? > >From a brief look, it merely needs to be tweaked for portability, > unless >=20 > > > > =A0 =A0 Already commented on: reuse the chardev syntax and prefer= ably code. > > =A0 =A0 We already support a bunch of options there for > > =A0 =A0 domain sockets that will be useful here, they should > > =A0 =A0 work here as well. > > > > > > We adapted the syntax for this to be consistent with chardev. Wha= t we > didn't > > use, it is not obvious at all to us on how they should be used; a= lot of > the > > chardev options just don't apply to us. > > =A0 >=20 > Well server option should work at least. > nowait can work too? >=20 > Also, if reconnect is useful it should be for chardevs too, so if w= e don't > share code, need to code it in two places to stay consistent. >=20 > Overall sharing some code might be better ... >=20 >=20 >=20 > What you have in mind is to use the functions chardev uses from qemu-so= ckets.c > right? Chardev itself doesn't look to have anything else that can be sh= ared. Yes. > The problem with reconnect is that it is implemented at the protocol le= vel; we > are not just transparently reconnecting the socket. So the same approac= h would > most likely not apply for chardev. Chardev mostly just could use transparent reconnect. vhost-user could use that and get a callback to reconfigure everything after reconnect. Once you write up the protocol in some text file we can discuss this in more detail. For example I wonder how would feature negotiation work with reconnect: new connection could be from another application that does not support same features, but virtio assumes that device features never change. >=20 >=20 >=20 > > =A0 =A0 In particular you shouldn't require filesystem access by = qemu, > > =A0 =A0 passing fd for domain socket should work. > > > > > > We can add an option to pass an fd for the domain socket if neede= d. > However as > > far as we understand, chardev doesn't do that either (at least fo= rm > looking at > > the man page). Maybe we misunderstand what you mean. >=20 > Sorry. I got confused with e.g. tap which has this. This might be > useful but does not have to block this patch. >=20 > > > > > > =A0 =A0 > Example usage: > > =A0 =A0 > > > =A0 =A0 > qemu -m 1024 -mem-path /hugetlbfs,prealloc=3Don,share=3D= on \ > > =A0 =A0 > =A0 =A0 =A0-netdev type=3Dvhost-user,id=3Dnet0,path=3D/= path/to/sock,poll_time=3D > 2500 \ > > =A0 =A0 > =A0 =A0 =A0-device virtio-net-pci,netdev=3Dnet0 > > > > =A0 =A0 It's not clear which parts of -mem-path are required for = vhost-user. > > =A0 =A0 It should be documented somewhere, made clear in -help > > =A0 =A0 and should fail gracefully if misconfigured. > > > > > > > > Ok. > > =A0 > > > > > > =A0 =A0 > > > =A0 =A0 > Changes from v5: > > =A0 =A0 > =A0- Split -mem-path unlink option to a separate patch > > =A0 =A0 > =A0- Fds are passed only in the ancillary data > > =A0 =A0 > =A0- Stricter message size checks on receive/send > > =A0 =A0 > =A0- Netdev vhost-user now includes path and poll_time = options > > =A0 =A0 > =A0- The connection probing interval is configurable > > =A0 =A0 > > > =A0 =A0 > Changes from v4: > > =A0 =A0 > =A0- Use error_report for errors > > =A0 =A0 > =A0- VhostUserMsg has new field `size` indicating the f= ollowing > payload > > =A0 =A0 length. > > =A0 =A0 > =A0 =A0Field `flags` now has version and reply bits. Th= e structure is > packed. > > =A0 =A0 > =A0- Send data is of variable length (`size` field in m= essage) > > =A0 =A0 > =A0- Receive in 2 steps, header and payload > > =A0 =A0 > =A0- Add new message type VHOST_USER_ECHO, to check con= nection status > > =A0 =A0 > > > =A0 =A0 > Changes from v3: > > =A0 =A0 > =A0- Convert -mem-path to QemuOpts with prealloc, share= and unlink > > =A0 =A0 properties > > =A0 =A0 > =A0- Set 1 sec timeout when read/write to the unix doma= in socket > > =A0 =A0 > =A0- Fix file descriptor leak > > =A0 =A0 > > > =A0 =A0 > Changes from v2: > > =A0 =A0 > =A0- Reconnect when the backend disappears > > =A0 =A0 > > > =A0 =A0 > Changes from v1: > > =A0 =A0 > =A0- Implementation of vhost-user netdev backend > > =A0 =A0 > =A0- Code improvements > > =A0 =A0 > > > =A0 =A0 > Antonios Motakis (8): > > =A0 =A0 > =A0 Convert -mem-path to QemuOpts and add prealloc and = share > properties > > =A0 =A0 > =A0 New -mem-path option - unlink. > > =A0 =A0 > =A0 Decouple vhost from kernel interface > > =A0 =A0 > =A0 Add vhost-user skeleton > > =A0 =A0 > =A0 Add domain socket communication for vhost-user back= end > > =A0 =A0 > =A0 Add vhost-user calls implementation > > =A0 =A0 > =A0 Add new vhost-user netdev backend > > =A0 =A0 > =A0 Add vhost-user reconnection > > =A0 =A0 > > > =A0 =A0 > =A0exec.c =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0| =A057 +++- > > =A0 =A0 > =A0hmp-commands.hx =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = | =A0 4 +- > > =A0 =A0 > =A0hw/net/vhost_net.c =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0| = 144 +++++++--- > > =A0 =A0 > =A0hw/net/virtio-net.c =A0 =A0 =A0 =A0 =A0 =A0 =A0 | =A0= 42 ++- > > =A0 =A0 > =A0hw/scsi/vhost-scsi.c =A0 =A0 =A0 =A0 =A0 =A0 =A0| =A0= 13 +- > > =A0 =A0 > =A0hw/virtio/Makefile.objs =A0 =A0 =A0 =A0 =A0 | =A0 2 = +- > > =A0 =A0 > =A0hw/virtio/vhost-backend.c =A0 =A0 =A0 =A0 | 556 > > =A0 =A0 ++++++++++++++++++++++++++++++++++++++ > > =A0 =A0 > =A0hw/virtio/vhost.c =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 | = =A046 ++-- > > =A0 =A0 > =A0include/exec/cpu-all.h =A0 =A0 =A0 =A0 =A0 =A0| =A0 = 3 - > > =A0 =A0 > =A0include/hw/virtio/vhost-backend.h | =A040 +++ > > =A0 =A0 > =A0include/hw/virtio/vhost.h =A0 =A0 =A0 =A0 | =A0 4 +- > > =A0 =A0 > =A0include/net/vhost-user.h =A0 =A0 =A0 =A0 =A0| =A017 = ++ > > =A0 =A0 > =A0include/net/vhost_net.h =A0 =A0 =A0 =A0 =A0 | =A015 = +- > > =A0 =A0 > =A0net/Makefile.objs =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 | = =A0 2 +- > > =A0 =A0 > =A0net/clients.h =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= | =A0 3 + > > =A0 =A0 > =A0net/hub.c =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 | =A0 1 + > > =A0 =A0 > =A0net/net.c =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 | =A0 2 + > > =A0 =A0 > =A0net/tap.c =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 | =A016 +- > > =A0 =A0 > =A0net/vhost-user.c =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= | 177 ++++++++++++ > > =A0 =A0 > =A0qapi-schema.json =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= | =A021 +- > > =A0 =A0 > =A0qemu-options.hx =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = | =A024 +- > > =A0 =A0 > =A0vl.c =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 =A0| =A041 ++- > > =A0 =A0 > =A022 files changed, 1106 insertions(+), 124 deletions(= -) > > =A0 =A0 > =A0create mode 100644 hw/virtio/vhost-backend.c > > =A0 =A0 > =A0create mode 100644 include/hw/virtio/vhost-backend.h > > =A0 =A0 > =A0create mode 100644 include/net/vhost-user.h > > =A0 =A0 > =A0create mode 100644 net/vhost-user.c > > =A0 =A0 > > > =A0 =A0 > -- > > =A0 =A0 > 1.8.3.2 > > =A0 =A0 > > > > > >=20 >=20