From: Gregory Haskins
Subject: [RFC PATCH v2 00/19] virtual-bus
Date: Thu, 09 Apr 2009 12:30:41 -0400
Message-ID: <20090409155200.32740.19358.stgit@dev.haskins.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
To: linux-kernel@vger.kernel.org
Cc: agraf@suse.de, pmullaney@novell.com, pmorreale@novell.com,
    anthony@codemonkey.ws, rusty@rustcorp.com.au, netdev@vger.kernel.org,
    kvm@vger.kernel.org, avi@redhat.com, bhutchings@solarflare.com,
    andi@firstfloor.org, gregkh@suse.de, herber@gondor.apana.org.au,
    chrisw@sous-sol.org, shemminger@vyatta.com
Sender: kvm-owner@vger.kernel.org
List-Id: netdev.vger.kernel.org

This is release v2.

Changes since v1:

*) Incorporated review feedback from Stephen Hemminger on the vbus-enet
   driver
*) Added support for connecting to vbus devices from userspace
*) Added support for a virtio-vbus transport to allow virtio drivers to
   work with vbus (needs testing and backend models)

(Avi, I know I still owe you a reply re the PCI debate)

Todo:

*) Develop some kind of hypercall registration mechanism for KVM so that
   we can use that as an integration point instead of directly hooking
   kvm hypercalls
*) Beef up the userspace event-channel ABI to support different event
   types
*) Add memory-registration support
*) Integrate with the qemu PCI device model to render vbus objects as PCI
*) Develop some virtio backend devices
*) Support ethtool_ops for venet

---------------------------------------

RFC: Virtual-bus

Applies to v2.6.29 (will port to git HEAD soon)

FIRST OFF: let me state that this is _not_ a KVM- or networking-specific
technology.  Virtual-Bus is a mechanism for defining and deploying
software "devices" directly in a Linux kernel.  These devices are designed
to be directly accessed from a variety of environments in an arbitrarily
nested fashion.  The goal is to provide the potential for maximum IO
performance by providing the shortest and most efficient path to the
"bare metal" kernel, and thus to the actual IO resources.  For instance,
an application can be written to run the same on bare metal as it does in
guest userspace nested 10 levels deep, all the while retaining direct
access to the resource, thus reducing latency and boosting throughput.  A
good way to think of this is perhaps as software-based SR-IOV that
supports nesting of the pass-through.

Because it is designed as an in-kernel resource, it also provides strong
notions of protection and isolation, so it does not introduce a security
compromise compared to traditional/alternative models where such
guarantees are provided by something like userspace or hardware.

The example use-case we have provided supports a "virtual-ethernet"
device being utilized in a KVM guest environment, so comparisons to
virtio-net will be natural.  However, please note that this is but one
use-case of many we have planned for the future (such as userspace bypass
and RT guest support).  The goal for right now is to describe what a
virtual-bus is and why we believe it is useful.

We intend to get this core technology merged, even if the networking
components are not accepted as is.  It should be noted that, in many
ways, virtio could be considered complementary to this technology.
We could, in fact, have implemented the virtual-ethernet using a
virtio-ring, but it would have required ABI changes that we didn't want
to propose before the general concept had been vetted and accepted by the
community.

[Update: this release includes a virtio-vbus transport, so virtio-net and
other such drivers can now run over vbus in addition to the venet system
provided]

To cut to the chase, we recently measured our virtual-ethernet on v2.6.29
on two 8-core x86_64 boxes with Chelsio T3 10GE connected back to back
via cross-over.  We measured bare-metal performance, as well as a kvm
guest (running the same kernel) connected to the T3 via a
linux-bridge+tap configuration with a 1500 MTU.  The results are as
follows:

Bare metal: tput = 4078Mb/s, round-trip = 25593pps (39us rtt)
Virtio-net: tput = 4003Mb/s, round-trip = 320pps (3125us rtt)
Venet:      tput = 4050Mb/s, round-trip = 15255pps (65us rtt)

(The rtt figure is simply the inverse of the round-trip packet rate, e.g.
1/320pps = 3125us.)

As you can see, all three technologies can achieve (MTU limited)
line-rate, but the virtio-net solution is severely limited on the latency
front (by a factor of 48:1).

Note that the 320pps figure is technically artificially low for
virtio-net, caused by a known design limitation: the use of a timer for
tx-mitigation.  However, note that even when removing the timer from the
path, the best we could achieve was 350us-450us of latency, and doing so
causes the tput to drop to 1300Mb/s.  So even in this case, I think the
in-kernel results present a compelling argument for the new model.

[Update: Anthony Liguori is currently working on this userspace
implementation problem and has obtained significant performance gains by
utilizing some of the techniques we use in this patch set as well.  More
details to come.]

When we jump to a 9000 byte MTU, the situation looks similar:

Bare metal: tput = 9717Mb/s, round-trip = 30396pps (33us rtt)
Virtio-net: tput = 4578Mb/s, round-trip = 249pps (4016us rtt)
Venet:      tput = 5802Mb/s, round-trip = 15127pps (66us rtt)

Note that in this test even the throughput was slightly better for venet,
though neither venet nor virtio-net could achieve line-rate.  I suspect
some tuning may allow these numbers to improve; TBD.

So with that said, let's jump into the description:

Virtual-Bus: What is it?
------------------------

Virtual-Bus is a kernel-based IO resource container technology.  It is
modeled on a concept similar to the Linux Device-Model (LDM), where we
have buses, devices, and drivers as the primary actors.  However, VBUS
has several distinctions when contrasted with LDM:

  1) "Buses" in LDM are relatively static and global to the kernel (e.g.
     "PCI", "USB", etc).  VBUS buses are arbitrarily created and
     destroyed dynamically, and are not globally visible.  Instead they
     are defined as visible only to a specific subset of the system (the
     contained context).

  2) "Devices" in LDM are typically tangible physical (or sometimes
     logical) devices.  VBUS devices are purely software abstractions
     (which may or may not have one or more physical devices behind
     them).  Devices may also be arbitrarily created or destroyed by
     software/administrative action as opposed to by a hardware discovery
     mechanism.

  3) "Drivers" in LDM sit within the same kernel context as the buses and
     devices they interact with.  VBUS drivers live in a foreign context
     (such as userspace, or a virtual-machine guest).

The idea is that a vbus is created to contain access to some IO services.
Virtual devices are then instantiated and linked to a bus to grant access
to drivers actively present on the bus.  Drivers will only have
visibility to devices present on their respective bus, and nothing else.

Virtual devices are defined by modules which register a deviceclass with
the system.  A deviceclass simply represents a type of device that _may_
be instantiated into a device, should an administrator wish to do so.
Once this has happened, the device may be associated with one or more
buses, where it will become visible to all clients of those respective
buses.

Why do we need this?
--------------------

There are various reasons why such a construct may be useful.  One of the
most interesting use cases is virtualization, such as KVM.  Hypervisors
today provide virtualized IO resources to a guest, but this often comes
at a cost in both latency and throughput compared to bare-metal
performance.  Utilizing para-virtual resources instead of emulated
devices helps to mitigate this penalty, but even these techniques have,
to date, not fully realized the potential of the underlying bare-metal
hardware.

Some of the performance differential is unavoidable, given the extra
processing that occurs due to the deeper stack (guest+host).  However,
some of this overhead is a direct result of the rather indirect path most
hypervisors use to route IO.  For instance, KVM uses PIO faults from the
guest to trigger a guest->host-kernel->host-userspace->host-kernel
sequence of events.  Contrast this to a typical userspace application on
the host, which must only traverse app->kernel for most IO.

The fact is that the Linux kernel is already great at managing access to
IO resources.  Therefore, if you have a hypervisor that is based on the
Linux kernel, is there some way that we can allow the hypervisor to
manage IO directly instead of forcing this convoluted path?

The short answer is: "not yet" ;)

In order to realize such a concept, we need some new facilities.  For
one, we need to be able to define containers with their corresponding
access-control so that guests do not have unmitigated access to anything
they wish.  Second, we also need to define forms of memory access that
are uniform in the face of various clients (e.g. "copy_to_user()" cannot
be assumed to work for, say, a KVM vcpu context).  Lastly, we need to
provide access to these resources in a way that makes sense for the
application, such as asynchronous communication paths and minimizing
context switches.
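
To make the deviceclass/device model above a little more concrete, below
is a minimal, self-contained C sketch of what a backend module might look
like.  All of the names used here (vbus_devclass_sketch,
vbus_memctx_sketch, the *_sketch registration helpers, and
venet_tap_create()) are hypothetical stand-ins invented purely for
illustration; the real interfaces are defined by the patches themselves
(include/linux/vbus_device.h and friends), so treat this as a sketch of
the concept rather than the actual API.

/*
 * Illustrative sketch only: the types and registration calls below are
 * hypothetical stand-ins for the real vbus interfaces in this series.
 */
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/types.h>

/*
 * A "memory context" hides *how* client memory is reached, since
 * copy_to_user() cannot be assumed to work for, e.g., a KVM vcpu.
 */
struct vbus_memctx_sketch {
	int (*copy_to)(struct vbus_memctx_sketch *ctx,
		       void *dst, const void *src, size_t len);
	int (*copy_from)(struct vbus_memctx_sketch *ctx,
			 void *dst, const void *src, size_t len);
};

/*
 * A deviceclass: a template that an administrator may instantiate into
 * concrete devices, which then become visible on one or more buses.
 */
struct vbus_devclass_sketch {
	const char *name;
	int (*create)(struct vbus_devclass_sketch *dc,
		      struct vbus_memctx_sketch *ctx);
};

/* hypothetical registration entry points */
static int vbus_devclass_register_sketch(struct vbus_devclass_sketch *dc)
{
	pr_info("vbus sketch: deviceclass '%s' registered\n", dc->name);
	return 0;
}

static void vbus_devclass_unregister_sketch(struct vbus_devclass_sketch *dc)
{
	pr_info("vbus sketch: deviceclass '%s' unregistered\n", dc->name);
}

static int venet_tap_create(struct vbus_devclass_sketch *dc,
			    struct vbus_memctx_sketch *ctx)
{
	/*
	 * A real backend would allocate per-device state here, set up its
	 * shared-memory rings, and signal readiness to drivers on the bus.
	 * All data movement would go through ctx->copy_to()/copy_from()
	 * rather than assuming copy_to_user() semantics.
	 */
	return 0;
}

static struct vbus_devclass_sketch venet_tap_class = {
	.name	= "venet-tap",
	.create	= venet_tap_create,
};

static int __init venet_tap_sketch_init(void)
{
	return vbus_devclass_register_sketch(&venet_tap_class);
}

static void __exit venet_tap_sketch_exit(void)
{
	vbus_devclass_unregister_sketch(&venet_tap_class);
}

module_init(venet_tap_sketch_init);
module_exit(venet_tap_sketch_exit);
MODULE_LICENSE("GPL");

The key properties the sketch is trying to show are that the deviceclass
is only a template (instantiation is an administrative action, not
hardware discovery), and that the backend never touches client memory
directly but always goes through an ops table supplied by the container.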
For more details, please visit our wiki at:

http://developer.novell.com/wiki/index.php/Virtual-bus

Regards,
-Greg

---

Gregory Haskins (19):
      virtio: add a vbus transport
      vbus: add a userspace connector
      kvm: Add guest-side support for VBUS
      kvm: Add VBUS support to the host
      kvm: add dynamic IRQ support
      kvm: add a reset capability
      x86: allow the irq->vector translation to be determined outside of ioapic
      venettap: add scatter-gather support
      venet: add scatter-gather support
      venet-tap: Adds a "venet" compatible "tap" device to VBUS
      net: Add vbus_enet driver
      venet: add the ABI definitions for an 802.x packet interface
      ioq: add vbus helpers
      ioq: Add basic definitions for a shared-memory, lockless queue
      vbus: add a "vbus-proxy" bus model for vbus_driver objects
      vbus: add bus-registration notifiers
      vbus: add connection-client helper infrastructure
      vbus: add virtual-bus definitions
      shm-signal: shared-memory signals

 Documentation/vbus.txt           |  386 +++++++++
 arch/x86/Kconfig                 |   16
 arch/x86/Makefile                |    3
 arch/x86/include/asm/irq.h       |    6
 arch/x86/include/asm/kvm_host.h  |    9
 arch/x86/include/asm/kvm_para.h  |   12
 arch/x86/kernel/io_apic.c        |   25 +
 arch/x86/kvm/Kconfig             |    9
 arch/x86/kvm/Makefile            |    6
 arch/x86/kvm/dynirq.c            |  329 ++++++++
 arch/x86/kvm/guest/Makefile      |    2
 arch/x86/kvm/guest/dynirq.c      |   95 ++
 arch/x86/kvm/x86.c               |   13
 arch/x86/kvm/x86.h               |   12
 drivers/Makefile                 |    2
 drivers/net/Kconfig              |   13
 drivers/net/Makefile             |    1
 drivers/net/vbus-enet.c          |  907 +++++++++++++++
 drivers/vbus/devices/Kconfig     |   17
 drivers/vbus/devices/Makefile    |    1
 drivers/vbus/devices/venet-tap.c | 1609 ++++++++++++++++++++++++++++++++
 drivers/vbus/proxy/Makefile      |    2
 drivers/vbus/proxy/kvm.c         |  726 +++++++++++++++++
 drivers/virtio/Kconfig           |   15
 drivers/virtio/Makefile          |    1
 drivers/virtio/virtio_vbus.c     |  496 ++++++++++++
 fs/proc/base.c                   |   96 ++
 include/linux/ioq.h              |  410 ++++++++++
 include/linux/kvm.h              |    4
 include/linux/kvm_guest.h        |    7
 include/linux/kvm_host.h         |   27 +
 include/linux/kvm_para.h         |   60 +
 include/linux/sched.h            |    4
 include/linux/shm_signal.h       |  188 ++++
 include/linux/vbus.h             |  166 ++++
 include/linux/vbus_client.h      |  115 +++
 include/linux/vbus_device.h      |  424 ++++++++++
 include/linux/vbus_driver.h      |   80 ++
 include/linux/vbus_userspace.h   |   48 +
 include/linux/venet.h            |   82 ++
 include/linux/virtio_vbus.h      |  163 ++++
 kernel/Makefile                  |    1
 kernel/exit.c                    |    2
 kernel/fork.c                    |    2
 kernel/vbus/Kconfig              |   55 +
 kernel/vbus/Makefile             |   11
 kernel/vbus/attribute.c          |   52 +
 kernel/vbus/client.c             |  543 +++++++++++++
 kernel/vbus/config.c             |  275 ++++++
 kernel/vbus/core.c               |  626 +++++++++++++++
 kernel/vbus/devclass.c           |  124 +++
 kernel/vbus/map.c                |   72 ++
 kernel/vbus/map.h                |   41 +
 kernel/vbus/proxy.c              |  216 +++++
 kernel/vbus/shm-ioq.c            |   89 ++
 kernel/vbus/userspace-client.c   |  485 +++++++++++
 kernel/vbus/vbus.h               |  117 +++
 kernel/vbus/virtio.c             |  628 +++++++++++++++
 lib/Kconfig                      |   22 +
 lib/Makefile                     |    2
 lib/ioq.c                        |  298 +++++++
 lib/shm_signal.c                 |  186 ++++
 virt/kvm/kvm_main.c              |   37 +
 virt/kvm/vbus.c                  | 1307 ++++++++++++++++++++++++++++++
 64 files changed, 11777 insertions(+), 1 deletions(-)
 create mode 100644 Documentation/vbus.txt
 create mode 100644 arch/x86/kvm/dynirq.c
 create mode 100644 arch/x86/kvm/guest/Makefile
 create mode 100644 arch/x86/kvm/guest/dynirq.c
 create mode 100644 drivers/net/vbus-enet.c
 create mode 100644 drivers/vbus/devices/Kconfig
 create mode 100644 drivers/vbus/devices/Makefile
 create mode 100644 drivers/vbus/devices/venet-tap.c
 create mode 100644 drivers/vbus/proxy/Makefile
 create mode 100644 drivers/vbus/proxy/kvm.c
 create mode 100644 drivers/virtio/virtio_vbus.c
 create mode 100644 include/linux/ioq.h
 create mode 100644 include/linux/kvm_guest.h
 create mode 100644 include/linux/shm_signal.h
 create mode 100644 include/linux/vbus.h
 create mode 100644 include/linux/vbus_client.h
 create mode 100644 include/linux/vbus_device.h
 create mode 100644 include/linux/vbus_driver.h
 create mode 100644 include/linux/vbus_userspace.h
 create mode 100644 include/linux/venet.h
 create mode 100644 include/linux/virtio_vbus.h
 create mode 100644 kernel/vbus/Kconfig
 create mode 100644 kernel/vbus/Makefile
 create mode 100644 kernel/vbus/attribute.c
 create mode 100644 kernel/vbus/client.c
 create mode 100644 kernel/vbus/config.c
 create mode 100644 kernel/vbus/core.c
 create mode 100644 kernel/vbus/devclass.c
 create mode 100644 kernel/vbus/map.c
 create mode 100644 kernel/vbus/map.h
 create mode 100644 kernel/vbus/proxy.c
 create mode 100644 kernel/vbus/shm-ioq.c
 create mode 100644 kernel/vbus/userspace-client.c
 create mode 100644 kernel/vbus/vbus.h
 create mode 100644 kernel/vbus/virtio.c
 create mode 100644 lib/ioq.c
 create mode 100644 lib/shm_signal.c
 create mode 100644 virt/kvm/vbus.c

--