From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 12 Feb 2014 23:51:02 +0100
From: Benoît Canet
Message-ID: <20140212225102.GB16184@irqsave.net>
References: <20140212181027.GB4225@irqsave.net> <1392233665.15608.299.camel@ul30vt.home>
In-Reply-To: <1392233665.15608.299.camel@ul30vt.home>
Subject: Re: [Qemu-devel] Guest IOMMU and Cisco usnic
To: Alex Williamson
Cc: Benoît Canet, qemu-devel@nongnu.org

On Wednesday 12 Feb 2014 at 12:34:25 (-0700), Alex Williamson wrote:
> On Wed, 2014-02-12 at 19:10 +0100, Benoît Canet wrote:
> > Hi Alex,
> > 
> > After the IRC conversation we had a few days ago I understood that a
> > guest IOMMU was not implemented.
> > 
> > I have a real use case for it:
> > 
> > Cisco usnic allows writing MPI applications that drive the network card
> > in userspace in order to optimize latency. It's made for compute
> > clusters.
> > 
> > The typical cloud provider doesn't offer bare-metal access, only VMs on
> > top of Cisco's hardware. Hence VFIO uses the IOMMU to pass the NIC
> > through to the guest, and no IOMMU is present in the guest.
> > 
> > Questions: would writing a performant guest IOMMU implementation be
> > possible? How complex does this project look to someone who knows
> > IOMMU issues?
> > 
> > The ideal implementation would forward the IOMMU work to the host
> > hardware for speed.
> > 
> > I can devote time to writing the feature if it's doable.
> 
> Hi Benoît,
> 
> I imagine it's doable, but it's certainly not trivial; beyond that I
> haven't put much thought into it.
> 
> VFIO running in a guest would need an IOMMU that implements both the
> IOMMU API and IOMMU groups. Whether that comes from an emulated
> physical IOMMU (like VT-d) or from a new paravirt IOMMU would be for you
> to decide. VT-d would imply using a PCIe chipset like Q35 and trying to
> bandage on VT-d, or updating Q35 to something that natively supports
> VT-d. Getting a sufficiently similar PCIe hierarchy between host and
> guest would also be required.

This Cisco usnic thing (drivers/infiniband/hw/usnic) does not seem to use
VFIO at all and seems to be hardcoded to use an Intel IOMMU.

I don't know if that's a good thing or not.

> 
> The current model of putting all guest devices in a single IOMMU domain
> on the host is likely not what you would want, and might imply a new VFIO
> IOMMU backend that is better tuned for separate domains, sparse
> mappings, and low latency. VFIO has a modular IOMMU design, so this
> isn't architecturally a problem. The VFIO user (QEMU) is able to select
> which backend to use, and the code is written with supporting multiple
> backends in mind.
> 
> A complication you'll have is that the granularity of IOMMU operations
> through VFIO is at the IOMMU group level, so the guest would not be able
> to easily split devices grouped together on the host between separate
> users in the guest. That could be modeled as a conventional PCI bridge
> masking the requester ID of devices in the guest such that host groups
> are mirrored as guest groups.
I think users would be happy with only one Palo UCS VF wrapped by usnic in
the guest. I definitely need to check this point.

> 
> There might also be simpler "punch-through" ways to do it; for
> instance, what if instead of trying to make it work like it does on the
> host, we invented a paravirt VFIO interface, and the vfio-pv driver in
> the guest populated /dev/vfio as slightly modified passthroughs to the
> host fds? The guest OS may not even really need to be aware of the
> device.
> 

As I am not really interested in nesting VFIO but in using the Intel IOMMU
directly in the guest, a "punch-through" method would be fine.

> 
> It's an interesting project and certainly a valid use case. I'd also
> like to see things like Intel's DPDK move to using VFIO, but the current
> UIO-based DPDK is often used in guests. Thanks,

I'll ask Thomas Monjalon, the DPDK maintainer, about this.

Best regards

Benoît

> 
> Alex