qemu-devel.nongnu.org archive mirror
From: "Michael S. Tsirkin" <mst@redhat.com>
To: Anthony Liguori <anthony@codemonkey.ws>
Cc: Luke Gorrie <lukego@gmail.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"snabb-devel@googlegroups.com" <snabb-devel@googlegroups.com>,
	qemu-devel@nongnu.org, Stefan Hajnoczi <stefanha@redhat.com>
Subject: Re: [Qemu-devel] snabbswitch integration with QEMU for userspace ethernet I/O
Date: Mon, 27 May 2013 20:13:40 +0300
Message-ID: <20130527171339.GB18800@redhat.com>
In-Reply-To: <87wqqk8ii4.fsf@codemonkey.ws>

On Mon, May 27, 2013 at 12:01:07PM -0500, Anthony Liguori wrote:
> Paolo Bonzini <pbonzini@redhat.com> writes:
> 
> > On 27/05/2013 18:18, Anthony Liguori wrote:
> >> Paolo Bonzini <pbonzini@redhat.com> writes:
> >> 
> >>> On 27/05/2013 11:34, Stefan Hajnoczi wrote:
> >>>> On Sun, May 26, 2013 at 11:32:49AM +0200, Luke Gorrie wrote:
> >>>>> Stefan put us onto the highly promising track of vhost/virtio. We have
> >>>>> implemented this between Snabb Switch and the Linux kernel, but not
> >>>>> directly between Snabb Switch and QEMU guests. The "roadblock" we have hit
> >>>>> is embarrassingly basic: QEMU is using user-to-kernel system calls to set up
> >>>>> vhost (open /dev/net/tun and /dev/vhost-net, ioctl()s) and I haven't found
> >>>>> a good way to map these towards Snabb Switch instead of the kernel.
> >>>>
> >>>> vhost_net is about connecting a virtio-net speaking process to a
> >>>> tun-like device.  The problem you are trying to solve is connecting a
> >>>> virtio-net speaking process to Snabb Switch.
> >>>>
> >>>> Either you need to replace vhost or you need a tun-like device
> >>>> interface.
> >>>>
> >>>> How does your switch talk to hardware?
> >>>
> >>> And also, is your switch monolithic or does it consist of different
> >>> processes?
> >>>
> >>> If you already have processes talking to each other, the first thing
> >>> that came to my mind was a new network backend, similar to net/vde.c but
> >>> more featureful (so that you support the virtio headers for offloading,
> >>> for example).  Then you would use "-netdev snabb,id=net0 -device
> >>> e1000,netdev=net0".
> >> 
> >> It would be very interesting to combine this with vmsplice/splice.
> >
> > Was zero-copy vmsplice/splice actually ever implemented?  I thought it
> > was reverted.
> 
> Not sure what context you're talking about re: zero copy...  a pipe can
> store references to pages instead of having a buffer that stores data.
> That certainly is there today--otherwise the interface is pointless.
> 
> When splicing from pipe to pipe, you can move those references without
> copying the data.
> 
> When vmsplicing from a userspace region to a pipe, the kernel just
> stores references to the pages.  vmsplicing from a pipe to userspace
> OTOH will copy the data.  This is fixable at least when dealing with
> GIFT'd pages.  For guest-to-guest traffic, you wouldn't be gifting the
> pages I don't think.
> 
> For implementing guest-to-guest traffic, the source QEMU can vmsplice
> the packet to a pipe that is shared with the vswitch.  The vswitch can
> tee(2) the first N bytes to a second pipe such that it can read the
> info needed for routing decisions.
> 
> Once the decision is made, if it's a local guest, it can splice() the
> packet to the appropriate destination QEMU process or another vswitch
> daemon (no data copy here).
> 
> Finally, the destination QEMU process can vmsplice() from the pipe which
> will copy the data (this is the only copy).

AFAIK splice is mostly useless for networking as there's no way to
get notified when a packet has been sent.

> If vswitch needs to route externally, then it would need to splice() to
> a macvtap.
> 
> macvtap should be able to send the packet without copying the data.  Not
> sure that this last step will work as expected but if it doesn't, that's
> a bug that can/should be fixed.
> 
> The kernel cannot do better than the above modulo any overhead from
> userspace context switching[*].

Also modulo scheduler latency - the kernel processes packets
in interrupt context. There's a reason e.g. OVS runs its data path in
the kernel.

>  Guest-to-guest requires a copy.
> Normally macvtap is undesirable because it's tightly connected to a
> network adapter but that is a desirable trait in this case.
> 
> N.B., I'm not advocating making all switching decisions in
> userspace. Just pointing out how it can be done efficiently.
> 
> [*] in theory the kernel could do zero copy receive but I'm not sure
> it's feasible in practice.
> 
> Regards,
> 
> Anthony Liguori
> 
> >
> > Paolo
> >
> >>> It would be slower than vhost-net, for example no zero-copy
> >>> transmission.
> >> 
> >> With splice, I think you could at least get single copy guest-to-guest
> >> networking which is about as good as can be done.
> >> 
> >> Regards,
> >> 
> >> Anthony Liguori
> >> 
> >>>> 3. Use the kernel as a middle-man. Create a double-ended "veth"
> >>>> interface and have Snabb Switch and QEMU each open a PF_PACKET
> >>>> socket and accelerate it with VHOST_NET.
> >>>
> >>> As Michael mentioned, this could be macvtap on the interface that you
> >>> have already created in the switch and passed to vhost-net.  Then you do
> >>> not have to do anything in QEMU.
> >>>
> >>> Paolo
> >>>
> >>>> If you are using the Linux network stack then it might be better to
> >>>> integrate with vhost maybe as a tun-like device driver.
> >>>>
> >>>> Stefan
> >>>>
> >>>>
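
The pipe-based guest-to-guest flow quoted above (vmsplice the packet into a
pipe, tee the first N bytes for the routing decision, splice the packet on to
the destination) can be sketched roughly as follows. This is only an
illustration under stated assumptions, not code from QEMU or Snabb Switch: it
is Linux-only, it calls the syscalls through ctypes, and the helper names
(send_packet, peek_header, forward, demo), the 64-byte stand-in frame and the
14-byte header length are all made up for the example. The destination side
uses a plain read() in place of vmsplice-from-pipe; both copy the data.

```python
import ctypes
import os

# Linux-only sketch: reach vmsplice(2), tee(2) and splice(2) through libc.
libc = ctypes.CDLL(None, use_errno=True)


class IoVec(ctypes.Structure):
    """Mirror of struct iovec, as expected by vmsplice(2)."""
    _fields_ = [("iov_base", ctypes.c_void_p), ("iov_len", ctypes.c_size_t)]


libc.vmsplice.argtypes = [ctypes.c_int, ctypes.POINTER(IoVec),
                          ctypes.c_ulong, ctypes.c_uint]
libc.vmsplice.restype = ctypes.c_ssize_t
libc.tee.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_size_t, ctypes.c_uint]
libc.tee.restype = ctypes.c_ssize_t
libc.splice.argtypes = [ctypes.c_int, ctypes.c_void_p, ctypes.c_int,
                        ctypes.c_void_p, ctypes.c_size_t, ctypes.c_uint]
libc.splice.restype = ctypes.c_ssize_t


def _check(n):
    """Turn a -1 syscall return into an OSError carrying errno."""
    if n < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return n


def send_packet(pipe_w, packet):
    """Source side: vmsplice the packet's pages into the pipe (no gifting)."""
    buf = ctypes.create_string_buffer(packet, len(packet))
    iov = IoVec(ctypes.cast(buf, ctypes.c_void_p), len(packet))
    _check(libc.vmsplice(pipe_w, ctypes.byref(iov), 1, 0))
    return buf  # must stay alive and unmodified until the data is consumed


def peek_header(src_r, n):
    """Switch side: tee the first n bytes to a scratch pipe and read them."""
    scratch_r, scratch_w = os.pipe()
    try:
        got = _check(libc.tee(src_r, scratch_w, n, 0))
        return os.read(scratch_r, got)
    finally:
        os.close(scratch_r)
        os.close(scratch_w)


def forward(src_r, dst_w, length):
    """Switch side: splice moves the page references from pipe to pipe."""
    moved = 0
    while moved < length:
        n = _check(libc.splice(src_r, None, dst_w, None, length - moved, 0))
        if n == 0:
            break
        moved += n
    return moved


def demo():
    src_r, src_w = os.pipe()   # source QEMU -> switch (hypothetical roles)
    dst_r, dst_w = os.pipe()   # switch -> destination QEMU
    packet = bytes(range(64))  # stand-in for an Ethernet frame
    keep = send_packet(src_w, packet)
    header = peek_header(src_r, 14)          # header bytes stay in src pipe
    moved = forward(src_r, dst_w, len(packet))
    received = os.read(dst_r, len(packet))   # the single data copy
    del keep                                 # safe only after the read above
    for fd in (src_r, src_w, dst_r, dst_w):
        os.close(fd)
    return header, moved, received
```

Note that the sender has to keep its buffer untouched until the receiver has
read the data, and nothing in this flow tells it when that has happened. That
is the missing "packet has been sent" notification raised in the reply above.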
