Date: Tue, 1 Sep 2015 11:30:02 +0300
From: "Michael S. Tsirkin"
To: Varun Sethi
Cc: "virtio-dev@lists.oasis-open.org", Jan Kiszka, "Claudio.Fontana@huawei.com",
	"qemu-devel@nongnu.org", Linux Virtualization, "Nakajima, Jun",
	"opnfv-tech-discuss@lists.opnfv.org"
Subject: Re: [Qemu-devel] rfc: vhost user enhancements for vm2vm communication
Message-ID: <20150901111738-mutt-send-email-mst@redhat.com>
References: <20150831160655-mutt-send-email-mst@redhat.com>

On Tue, Sep 01, 2015 at 03:03:12AM +0000, Varun Sethi wrote:
> Hi Michael,
> When you talk about VFIO in guest, is it with a purely emulated IOMMU in Qemu?

This can use the emulated IOMMU in Qemu. That's probably fast enough if
mappings are mostly static. We can also add a PV-IOMMU if necessary.

> Also, I am not clear on the following points:
> 1. How transient memory would be mapped using BAR in the backend VM

The simplest way is that each update sends a vhost-user message. The
backend gets it, mmaps the region into backend QEMU, and makes it part
of a RAM memory slot.

Alternatively, backend QEMU could detect a page fault on access and
fetch the IOMMU mapping from frontend QEMU, either through vhost-user
messages or from shared memory.

> 2. How would the backend VM update the dirty page bitmap for the frontend VM
>
> Regards
> Varun

The easiest way to implement this is probably for backend QEMU to set up
dirty tracking for the relevant slot (upon getting the vhost-user
message from the frontend), then retrieve the dirty map from KVM and
record it in a shared memory region.  (When to do it?  We could have an
eventfd and/or a vhost-user message to trigger this from the frontend
QEMU, or just use a timer.)

An alternative is for the backend VM to get access to the dirty log
(e.g. map it within a BAR) and update it directly in shared memory.
That seems like more work.

Marc-André Lureau recently sent patches to support passing the dirty
log around; these would be useful.
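
To make the transient-mapping flow above a little more concrete, here is
a rough POSIX-level sketch of what the backend side could do when such an
update arrives.  This is not existing QEMU or vhost-user code: the message
layout and all of the names below are invented purely for illustration.

/*
 * Rough illustration only - not QEMU code.  A hypothetical backend
 * receives a "map this region" request over a vhost-user-style unix
 * socket; the region itself arrives as a file descriptor in ancillary
 * data and is then mmapped into the backend process.
 */
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

struct map_request {        /* hypothetical payload from the frontend */
    uint64_t guest_addr;    /* location in the frontend guest's memory */
    uint64_t size;
    uint64_t mmap_offset;   /* offset of the region within the fd */
};

static void *map_frontend_region(int sock, struct map_request *req)
{
    struct iovec iov = { .iov_base = req, .iov_len = sizeof(*req) };
    char ctl[CMSG_SPACE(sizeof(int))];
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = ctl, .msg_controllen = sizeof(ctl),
    };

    if (recvmsg(sock, &msg, 0) <= 0)
        return NULL;

    /* The memory itself is passed as an fd in ancillary data. */
    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return NULL;

    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));

    /* Map the region into the backend.  A backend QEMU would wrap this
     * pointer in a RAM memory slot so it becomes visible to its guest,
     * e.g. through the proposed vhost-pci BAR. */
    void *ptr = mmap(NULL, req->size, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, (off_t)req->mmap_offset);
    close(fd);
    return ptr == MAP_FAILED ? NULL : ptr;
}

For the dirty-bitmap side, the same backend QEMU would enable dirty
logging on that slot and copy the KVM dirty bitmap for it into a shared
region the frontend can read, triggered by an eventfd, a vhost-user
message, or a timer as mentioned above.
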
> > -----Original Message-----
> > From: qemu-devel-bounces+varun.sethi=freescale.com@nongnu.org
> > [mailto:qemu-devel-bounces+varun.sethi=freescale.com@nongnu.org] On
> > Behalf Of Nakajima, Jun
> > Sent: Monday, August 31, 2015 1:36 PM
> > To: Michael S. Tsirkin
> > Cc: virtio-dev@lists.oasis-open.org; Jan Kiszka;
> > Claudio.Fontana@huawei.com; qemu-devel@nongnu.org; Linux
> > Virtualization; opnfv-tech-discuss@lists.opnfv.org
> > Subject: Re: [Qemu-devel] rfc: vhost user enhancements for vm2vm
> > communication
> >
> > On Mon, Aug 31, 2015 at 7:11 AM, Michael S. Tsirkin wrote:
> > > Hello!
> > > During the KVM forum, we discussed supporting virtio on top of
> > > ivshmem. I have considered it, and came up with an alternative that
> > > has several advantages over that - please see below.
> > > Comments welcome.
> >
> > Hi Michael,
> >
> > I like this, and it should be able to achieve what I presented at KVM Forum
> > (vhost-user-shmem).
> > Comments below.
> >
> > >
> > > -----
> > >
> > > Existing solutions to userspace switching between VMs on the same host
> > > are vhost-user and ivshmem.
> > >
> > > vhost-user works by mapping the memory of all VMs being bridged into
> > > the switch memory space.
> > >
> > > By comparison, ivshmem works by exposing a shared region of memory to
> > > all VMs.  VMs are required to use this region to store packets.  The
> > > switch only needs access to this region.
> > >
> > > Another difference between vhost-user and ivshmem surfaces when
> > > polling is used.  With vhost-user, the switch is required to handle
> > > data movement between VMs; if polling is used, this means that one
> > > host CPU needs to be sacrificed for this task.
> > >
> > > This is easiest to understand when one of the VMs is used with VF
> > > pass-through.  This can be schematically shown below:
> > >
> > > +-- VM1 --------------+            +---VM2------------+
> > > | virtio-pci          +-vhost-user-+ virtio-pci -- VF | -- VFIO -- IOMMU -- NIC
> > > +---------------------+            +------------------+
> > >
> > >
> > > With ivshmem, in theory, communication can happen directly, with the
> > > two VMs polling the shared memory region.
> > >
> > >
> > > I won't spend time listing advantages of vhost-user over ivshmem.
> > > Instead, having identified two advantages of ivshmem over vhost-user,
> > > below is a proposal to extend vhost-user to gain the advantages of
> > > ivshmem.
> > >
> > >
> > > 1: virtio in guest can be extended to allow support for IOMMUs.  This
> > > provides the guest with full flexibility about which memory is
> > > readable or writable by each device.
> >
> > I assume that you meant VFIO only for virtio by "use of VFIO".  To get VFIO
> > working for general direct-I/O (including VFs) in guests, as you know, we
> > need to virtualize the IOMMU (e.g. VT-d) and the interrupt remapping table
> > on x86 (i.e. nested VT-d).
> >
> > > By setting up a virtio device for each other VM we need to communicate
> > > to, the guest gets full control of its security: from mapping all memory
> > > (like with current vhost-user), to only mapping buffers used for
> > > networking (like ivshmem), to transient mappings for the duration of
> > > data transfer only.
> >
> > And I think that we can use VMFUNC to have such transient mappings.
> >
> > > This also allows use of VFIO within guests, for improved security.
> > >
> > > vhost-user would need to be extended to send the mappings programmed
> > > by the guest IOMMU.
> >
> > Right. We need to think about cases where other VMs (VM3, etc.) join the
> > group or some existing VM leaves.
> > PCI hot-plug should work there (as you point out at "Advantages over
> > ivshmem" below).
> >
> > >
> > > 2. qemu can be extended to serve as a vhost-user client: it would
> > > receive remote VM mappings over the vhost-user protocol and map them
> > > into another VM's memory.
> > > This mapping can take, for example, the form of a BAR of a PCI device,
> > > which I'll call here vhost-pci - with bus addresses allowed by VM1's
> > > IOMMU mappings being translated into offsets within this BAR within
> > > VM2's physical memory space.
> >
> > I think it's sensible.
> >
> > >
> > > Since the translation can be a simple one, VM2 can perform it within
> > > its vhost-pci device driver.
> > >
> > > While this setup would be the most useful with polling, VM1's
> > > ioeventfd can also be mapped to another VM2's irqfd, and vice versa,
> > > such that VMs can trigger interrupts to each other without need for a
> > > helper thread on the host.
> > >
> > >
> > > The resulting channel might look something like the following:
> > >
> > > +-- VM1 --------------+  +---VM2-----------+
> > > | virtio-pci -- iommu +--+ vhost-pci -- VF | -- VFIO -- IOMMU -- NIC
> > > +---------------------+  +-----------------+
> > >
> > > Comparing the two diagrams, a vhost-user thread on the host is no
> > > longer required, reducing the host CPU utilization when polling is
> > > active.  At the same time, VM2 cannot access all of VM1's memory - it
> > > is limited by the IOMMU configuration set up by VM1.
> > >
> > >
> > > Advantages over ivshmem:
> > >
> > > - more flexibility: endpoint VMs do not have to place data at any
> > >   specific locations to use the device, which in practice likely
> > >   means fewer data copies.
> > > - better standardization/code reuse:
> > >   virtio changes within guests would be fairly easy to implement
> > >   and would also benefit other backends besides vhost-user.
> > >   Standard hotplug interfaces can be used to add and remove these
> > >   channels as VMs are added or removed.
> > > - migration support:
> > >   It's easy to implement since ownership of memory is well defined.
> > >   For example, during migration VM2 can notify the hypervisor of VM1
> > >   by updating the dirty bitmap each time it writes into VM1 memory.
> >
> > Also, the ivshmem functionality could be implemented by this proposal:
> > - vswitch (or some VM) allocates memory regions in its address space, and
> > - it sets things up so that IOMMU mappings on the VMs are translated into
> >   those regions
> >
> > >
> > > Thanks,
> > >
> > > --
> > > MST
> > > _______________________________________________
> > > Virtualization mailing list
> > > Virtualization@lists.linux-foundation.org
> > > https://lists.linuxfoundation.org/mailman/listinfo/virtualization
> >
> >
> > --
> > Jun
> > Intel Open Source Technology Center
>