From: "Michael S. Tsirkin" <mst@redhat.com>
To: Jan Kiszka <jan.kiszka@siemens.com>
Cc: Rik van Riel <riel@redhat.com>, Andrew Jones <drjones@redhat.com>,
Linux Virtualization <virtualization@lists.linux-foundation.org>,
Luke Gorrie <luke@snabb.co>,
Stefan Hajnoczi <stefanha@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>,
"Dr. David Alan Gilbert" <dgilbert@redhat.com>
Subject: Re: [virtio-dev] Zerocopy VM-to-VM networking using virtio-net
Date: Mon, 27 Apr 2015 16:40:51 +0200
Message-ID: <20150427163920-mutt-send-email-mst@redhat.com>
In-Reply-To: <553E480B.1020502@siemens.com>

On Mon, Apr 27, 2015 at 04:30:35PM +0200, Jan Kiszka wrote:
> Am 2015-04-27 um 15:01 schrieb Stefan Hajnoczi:
> > On Mon, Apr 27, 2015 at 1:55 PM, Jan Kiszka <jan.kiszka@siemens.com> wrote:
> >> Am 2015-04-27 um 14:35 schrieb Jan Kiszka:
> >>> Am 2015-04-27 um 12:17 schrieb Stefan Hajnoczi:
> >>>> On Sun, Apr 26, 2015 at 2:24 PM, Luke Gorrie <luke@snabb.co> wrote:
> >>>>> On 24 April 2015 at 15:22, Stefan Hajnoczi <stefanha@gmail.com> wrote:
> >>>>>>
> >>>>>> The motivation for making VM-to-VM fast is that while software
> >>>>>> switches on the host are efficient today (thanks to vhost-user), there
> >>>>>> is no efficient solution if the software switch is a VM.
> >>>>>
> >>>>>
> >>>>> I see. This sounds like a noble goal indeed. I would love to run the
> >>>>> software switch as just another VM in the long term. It would make it much
> >>>>> easier for the various software switches to coexist in the world.
> >>>>>
> >>>>> The main technical risk I see in this proposal is that eliminating the
> >>>>> memory copies might not have the desired effect. I might be tempted to keep
> >>>>> the copies but prevent the kernel from having to inspect the vrings (more
> >>>>> like vhost-user). But that is just a hunch and I suppose the first step
> >>>>> would be a prototype to check the performance anyway.
> >>>>>
> >>>>> For what it is worth here is my view of networking performance on x86 in the
> >>>>> Haswell+ era:
> >>>>> https://groups.google.com/forum/#!topic/snabb-devel/aez4pEnd4ow
> >>>>
> >>>> Thanks.
> >>>>
> >>>> I've been thinking about how to eliminate the VM <-> host <-> VM
> >>>> switching and instead achieve just VM <-> VM.
> >>>>
> >>>> The holy grail of VM-to-VM networking is an exitless I/O path. In
> >>>> other words, packets can be transferred between VMs without any
> >>>> vmexits (this requires a polling driver).
> >>>>
> >>>> Here is how it works. QEMU gets "-device vhost-user" so that a VM can
> >>>> act as the vhost-user server:
> >>>>
> >>>> VM1 (virtio-net guest driver) <-> VM2 (vhost-user device)
> >>>>
> >>>> VM1 has a regular virtio-net PCI device. VM2 has a vhost-user device
> >>>> and plays the host role instead of the normal virtio-net guest driver
> >>>> role.
> >>>>
> >>>> The ugly thing about this is that VM2 needs to map all of VM1's guest
> >>>> RAM so it can access the vrings and packet data. The solution to this
> >>>> is something like the Shared Buffers BAR but this time it contains not
> >>>> just the packet data but also the vring, let's call it the Shared
> >>>> Virtqueues BAR.
> >>>>
> >>>> The Shared Virtqueues BAR eliminates the need for vhost-net on the
> >>>> host because VM1 and VM2 communicate directly using virtqueue notify
> >>>> or polling vring memory. Virtqueue notify works by connecting an
> >>>> eventfd as ioeventfd in VM1 and irqfd in VM2. And VM2 would also have
> >>>> an ioeventfd that is irqfd for VM1 to signal completions.
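The notification path described here (one eventfd registered as ioeventfd on VM1's side and as irqfd on VM2's side) depends on ordinary Linux eventfd counter semantics. The minimal sketch below shows only those semantics inside a single userspace process. In a real setup KVM would bind the same file descriptor through the KVM_IOEVENTFD and KVM_IRQFD ioctls, which are not shown. The helper names `kick` and `consume_kicks` are hypothetical.

```c
#include <stdint.h>
#include <unistd.h>
#include <sys/eventfd.h>

/* Sender side (hypothetical helper): stands in for VM1 writing the
 * virtqueue notify register, which KVM would turn into a signal on
 * the eventfd registered as ioeventfd. Adds 1 to the eventfd counter. */
static int kick(int efd)
{
    uint64_t one = 1;
    return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Receiver side (hypothetical helper): stands in for whoever consumes
 * the same eventfd. A read returns the number of kicks accumulated
 * since the last read and resets the counter to zero. Returns -1 if
 * nothing is pending on a nonblocking eventfd, or on error. */
static int64_t consume_kicks(int efd)
{
    uint64_t count;
    if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count))
        return -1;
    return (int64_t)count;
}
```

On the userspace side, several back-to-back kicks are observed as one read that returns their total, so a consumer that wakes up late sees one event rather than many.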
> >>>
> >>> We had such a discussion before:
> >>> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/123014/focus=279658
> >>>
> >>> Would be great to get this ball rolling again.
> >>>
> >>> Jan
> >>>
> >>
> >> But one challenge would remain even then (unless both sides only poll):
> >> exit-free inter-VM signaling, no? But that's a hardware issue first of all.
> >
> > To start with ioeventfd<->irqfd can be used. It incurs a light-weight
> > exit in VM1 and interrupt injection in VM2.
> >
> > For networking the cost is mitigated by NAPI drivers which switch
> > between interrupts and polling. During notification-heavy periods the
> > guests would use polling anyway.
> >
> > A hardware solution would be some kind of inter-guest interrupt
> > injection. I don't know VMX well enough to know whether that is
> > possible on Intel CPUs.
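The NAPI behaviour mentioned above can be illustrated with a small simulation. The consumer polls up to a budget and stays in polling mode while the budget is exhausted. It re-enables notifications only once the queue drains, then re-checks so that a packet arriving in between is not missed. This is a hypothetical model (`struct sim_queue`, `napi_round` and `produce` are made up for illustration), not the Linux NAPI API or the virtio-net driver.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical queue model: 'pending' counts packets waiting in the
 * virtqueue; 'notify_enabled' stands in for the interrupt-suppression
 * flag; 'interrupts' counts notifications actually delivered. */
struct sim_queue {
    size_t pending;
    bool notify_enabled;
    size_t interrupts;
};

/* Consume up to 'budget' packets; returns how many were processed. */
static size_t poll_queue(struct sim_queue *q, size_t budget)
{
    size_t done = 0;
    while (done < budget && q->pending > 0) {
        q->pending--;
        done++;
    }
    return done;
}

/* One NAPI-style poll round. Returns true if the consumer should keep
 * polling, false if it went idle and re-armed notifications. */
static bool napi_round(struct sim_queue *q, size_t budget)
{
    if (poll_queue(q, budget) == budget)
        return true;              /* busy: notifications stay off */
    q->notify_enabled = true;     /* drained: re-arm notifications */
    /* Re-check to close the race with a packet that arrived between
     * the last poll and re-arming. */
    if (q->pending > 0) {
        q->notify_enabled = false;
        return true;
    }
    return false;
}

/* Producer: enqueue packets and notify only if the consumer asked.
 * Simplification: the consumer's "disable notifications on interrupt"
 * step is folded in here. */
static void produce(struct sim_queue *q, size_t n)
{
    q->pending += n;
    if (q->notify_enabled) {
        q->interrupts++;
        q->notify_enabled = false;
    }
}
```

In this model, a burst of 150 packets with a budget of 64 costs a single notification followed by three poll rounds, which is the amortization effect described above.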
>
> Today, we have posted interrupts to avoid the vm-exit on the target CPU,
> but there is nothing yet (to the best of my knowledge) to avoid the exit on the
> sender side (unless we ignore security). That's the same problem with
> intra-guest IPIs, BTW.
>
> For throughput and given NAPI patterns, that's probably not an issue as
> you noted. It may be for latency, though, when almost every cycle counts.
>
> Jan

If you are counting cycles, you likely can't afford the
interrupt latency under Linux, so you have to poll
memory.

> --
> Siemens AG, Corporate Technology, CT RTC ITP SES-DE
> Corporate Competence Center Embedded Linux
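The point that latency-critical setups end up polling memory can be sketched as a single-producer/single-consumer ring in shared memory. The consumer spins on an index instead of waiting for an interrupt. The layout below is a simplified, hypothetical ring, not the virtio split-virtqueue format, of the kind that could live in a region such as the proposed Shared Virtqueues BAR. Here it runs inside one process for illustration, and `shm_ring`, `ring_push`, `ring_pop` and `poll_pop` are made-up names.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 256u  /* power of two so index masking works */

/* Hypothetical SPSC ring placed in memory shared by both sides. */
struct shm_ring {
    _Atomic uint32_t head;      /* written only by the producer */
    _Atomic uint32_t tail;      /* written only by the consumer */
    uint64_t slots[RING_SIZE];  /* e.g. offsets of packet buffers */
};

static bool ring_push(struct shm_ring *r, uint64_t v)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    /* acquire: the consumer has finished reading any slot it released */
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SIZE)
        return false;                           /* full */
    r->slots[head & (RING_SIZE - 1)] = v;
    /* release: the slot write is visible before the new head */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

static bool ring_pop(struct shm_ring *r, uint64_t *out)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    /* acquire: pairs with the producer's release store of head */
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return false;                           /* empty */
    *out = r->slots[tail & (RING_SIZE - 1)];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}

/* Busy-poll consumer: spins on shared memory rather than sleeping on
 * an interrupt. 'max_spins' bounds the loop so the sketch terminates. */
static bool poll_pop(struct shm_ring *r, uint64_t *out, unsigned long max_spins)
{
    for (unsigned long i = 0; i < max_spins; i++) {
        if (ring_pop(r, out))
            return true;
    }
    return false;
}
```

Across two VMs, the same code would additionally require both sides to map the region and the atomics to be lock-free on the target, which `atomic_is_lock_free` can check. The trade-off is a CPU spinning continuously in exchange for no exit or interrupt injection on the fast path.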