Date: Wed, 29 May 2013 09:49:29 +0200
From: Stefan Hajnoczi
Message-ID: <20130529074929.GC20199@stefanha-thinkpad.redhat.com>
References: <20130527093409.GH21969@stefanha-thinkpad.redhat.com> <51A496C4.1020602@os.inf.tu-dresden.de> <87r4grca4p.fsf@codemonkey.ws> <20130528171742.GB30296@redhat.com>
In-Reply-To: <20130528171742.GB30296@redhat.com>
Subject: Re: [Qemu-devel] snabbswitch integration with QEMU for userspace ethernet I/O
To: "Michael S. Tsirkin"
Cc: "snabb-devel@googlegroups.com", qemu-devel@nongnu.org, Anthony Liguori, Julian Stecklina

On Tue, May 28, 2013 at 08:17:42PM +0300, Michael S. Tsirkin wrote:
> On Tue, May 28, 2013 at 12:00:38PM -0500, Anthony Liguori wrote:
> > Julian Stecklina writes:
> >
> > > On 05/28/2013 12:10 PM, Luke Gorrie wrote:
> > >> On 27 May 2013 11:34, Stefan Hajnoczi wrote:
> > >>
> > >> vhost_net is about connecting a virtio-net speaking process to a
> > >> tun-like device. The problem you are trying to solve is connecting a
> > >> virtio-net speaking process to Snabb Switch.
> > >>
> > >>
> > >> Yep!
> > >
> > > Since I am on a similar path as Luke, let me share another idea.
> > >
> > > What about extending qemu in a way to allow PCI device models to be
> > > implemented in another process.
> >
> > We aren't going to support any interface that enables out of tree
> > devices. This is just plugins in a different form with even more
> > downsides. You cannot easily keep track of dirty info, the guest
> > physical address translation to host is difficult to keep in sync
> > (imagine the complexity of memory hotplug).
> >
> > Basically, it's easy to hack up but extremely hard to do something that
> > works correctly overall.
> >
> > There isn't a compelling reason to implement something like this other
> > than avoiding getting code into QEMU. Best to just submit your device
> > to QEMU for inclusion.
> >
> > If you want to avoid copying in a vswitch, better to use something like
> > vmsplice as I outlined in another thread.
> >
> > > This is not as hard as it may sound.
> > > qemu would open a domain socket to this process and map VM memory over
> > > to the other side. This can be accomplished by having file descriptors
> > > in qemu to VM memory (reusing -mem-path code) and passing those over the
> > > domain socket. The other side can then just mmap them. The socket would
> > > also be used for configuration and I/O by the guest on the PCI
> > > I/O/memory regions. You could also use this to do IRQs or use eventfds,
> > > whatever works better.
> > >
> > > To have a zero copy userspace switch, the switch would offer virtio-net
> > > devices to any qemu that wants to connect to it and implement the
> > > complete device logic itself. Since it has access to all guest memory,
> > > it can just do memcpy for packet data. Of course, this only works for
> > > 64-bit systems, because you need vast amounts of virtual address space.
> > > In my experience, doing this in userspace is _way less painful_.
> > >
> > > If you can get away with polling in the switch the overhead of doing all
> > > this in userspace is zero. And as long as you can rate-limit explicit
> > > notifications over the socket even that overhead should be okay.
> > >
> > > Opinions?
> >
> > I don't see any compelling reason to do something like this. It's
> > jumping through a tremendous number of hoops to avoid putting code that
> > belongs in QEMU in tree.
> >
> > Regards,
> >
> > Anthony Liguori
> >
> > >
> > > Julian
>
> OTOH an in-tree device that runs in a separate process would
> be useful e.g. for security.
> For example, we could limit a virtio-net device process
> to only access tap and vhost files.

For tap or vhost files only this is good for security. I'm not sure it
has many advantages over a QEMU process under SELinux though.

Obviously when the switch process has shared memory access to multiple
guests' RAM, the security is worse than a QEMU process solution but
better than a vhost kernel solution. So the security story is not a
clear win.

Stefan
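
For readers who want to see the mechanism Julian describes in the quoted text (file descriptors for VM memory passed over a domain socket, then mmap'ed on the other side), here is a rough sketch. It is not QEMU or Snabb Switch code. It assumes Python 3.9+ on a Unix host, uses a temporary file as a stand-in for -mem-path backed guest RAM, and every name in it (GUEST_RAM_SIZE, share_memory, map_shared_memory, demo) is made up for illustration.

```python
import mmap
import os
import socket
import tempfile

GUEST_RAM_SIZE = 4096  # hypothetical: one page standing in for guest RAM


def share_memory(sock: socket.socket, fd: int, size: int) -> None:
    """'QEMU' side: pass the memory file descriptor (SCM_RIGHTS) plus its size."""
    socket.send_fds(sock, [size.to_bytes(8, "little")], [fd])


def map_shared_memory(sock: socket.socket) -> mmap.mmap:
    """'Switch' side: receive the descriptor and map the same pages."""
    msg, fds, _flags, _addr = socket.recv_fds(sock, 8, 1)
    size = int.from_bytes(msg, "little")
    try:
        return mmap.mmap(fds[0], size)  # shared mapping by default on Unix
    finally:
        os.close(fds[0])  # the mapping stays valid after the fd is closed


def demo() -> bytes:
    """Write through one mapping and read the bytes through the other."""
    qemu_end, switch_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    backing = tempfile.TemporaryFile()  # stand-in for a -mem-path file
    backing.truncate(GUEST_RAM_SIZE)
    guest_view = mmap.mmap(backing.fileno(), GUEST_RAM_SIZE)
    try:
        share_memory(qemu_end, backing.fileno(), GUEST_RAM_SIZE)
        switch_view = map_shared_memory(switch_end)
        guest_view[0:6] = b"packet"      # the "guest" fills a buffer
        data = bytes(switch_view[0:6])   # the "switch" sees it without a socket copy
        switch_view.close()
        return data
    finally:
        guest_view.close()
        backing.close()
        qemu_end.close()
        switch_end.close()
```

Calling demo() runs both ends in one process over a socketpair, purely to keep the sketch short. In the proposal the two ends are separate processes, and the same socket would also carry configuration and notifications. None of this touches the dirty tracking and memory hotplug problems Anthony raises.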