From: Alex Williamson <alex.williamson@redhat.com>
To: "Benoît Canet" <benoit.canet@irqsave.net>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] Guest IOMMU and Cisco usnic
Date: Wed, 12 Feb 2014 17:03:33 -0700
Message-ID: <1392249813.15608.351.camel@ul30vt.home>
In-Reply-To: <20140212225102.GB16184@irqsave.net>

On Wed, 2014-02-12 at 23:51 +0100, Benoît Canet wrote:
> On Wednesday 12 Feb 2014 at 12:34:25 (-0700), Alex Williamson wrote:
> > On Wed, 2014-02-12 at 19:10 +0100, Benoît Canet wrote:
> > > Hi Alex,
> > >
> > > After the IRC conversation we had a few days ago I understood that guest IOMMU
> > > was not implemented.
> > >
> > > I have a real use case for it:
> > >
> > > Cisco usnic allows writing MPI applications that drive the network card from
> > > userspace in order to optimize latency. It's made for compute clusters.
> > >
> > > The typical cloud provider doesn't provide bare metal access but only VMs on top
> > > of Cisco's hardware, hence VFIO uses the IOMMU to pass the NIC through to the
> > > guest and no IOMMU is present in the guest.
> > >
> > > Questions: Would writing a well-performing guest IOMMU implementation be
> > > possible? How complex does this project look to someone who knows IOMMU issues?
> > >
> > > The ideal implementation would forward the IOMMU work to the host hardware for
> > > speed.
> > >
> > > I can devote time writing the feature if it's doable.
> >
> > Hi Benoît,
> >
> > I imagine it's doable, but it's certainly not trivial. Beyond that I
> > haven't put much thought into it.
> >
> > VFIO running in a guest would need an IOMMU that implements both the
> > IOMMU API and IOMMU groups. Whether that comes from an emulated
> > physical IOMMU (like VT-d) or from a new paravirt IOMMU would be for you
> > to decide. VT-d would imply using a PCIe chipset like Q35 and trying to
> > bandage on VT-d or updating Q35 to something that natively supports
> > VT-d. Getting a sufficiently similar PCIe hierarchy between host and
> > guest would also be required.
>
> This Cisco usnic thing (drivers/infiniband/hw/usnic) does not seem to use VFIO
> at all and seems to be hardcoded to make use of an Intel IOMMU.
>
> I don't know if it's a good thing or not.

Sorry, I got a little off track assuming usnic was a VFIO userspace
driver. Peeking quickly at it, it looks like it also uses the IOMMU
API, so unless I missed the VT-d specific parts, a pv IOMMU in the guest
might allow some simplification if you don't care about non-Linux
support.

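To make the shape of that idea a bit more concrete, here is a minimal toy
model of the domain/attach/map/unmap surface that a pv IOMMU in the guest
would have to forward to the host. It is only a sketch under stated
assumptions: ToyDomain, its methods, and the "vf0" device name are all
hypothetical, it uses a fixed 4 KiB page size, and it is not the Linux IOMMU
API, usnic code, or any QEMU interface.

```python
"""Toy model of the operations a paravirtual guest IOMMU might forward.

All names here are hypothetical illustrations, not a real kernel or QEMU API.
"""

PAGE_SIZE = 4096  # assumption: a single fixed page size


class ToyDomain:
    """One I/O address space mapping IOVA pages to physical pages."""

    def __init__(self):
        self.devices = set()   # devices attached to this domain
        self.page_table = {}   # IOVA page base -> physical page base

    def attach(self, device):
        """Attach a device so its DMA is translated by this domain."""
        self.devices.add(device)

    def map(self, iova, paddr, size):
        """Map [iova, iova + size) to [paddr, paddr + size)."""
        if size <= 0 or iova % PAGE_SIZE or paddr % PAGE_SIZE or size % PAGE_SIZE:
            raise ValueError("iova, paddr and size must be page aligned")
        offsets = range(0, size, PAGE_SIZE)
        if any(iova + off in self.page_table for off in offsets):
            raise ValueError("range overlaps an existing mapping")
        for off in offsets:
            self.page_table[iova + off] = paddr + off

    def unmap(self, iova, size):
        """Remove any page mappings inside [iova, iova + size)."""
        for off in range(0, size, PAGE_SIZE):
            self.page_table.pop(iova + off, None)

    def translate(self, device, iova):
        """Translate a DMA address, failing the way an IOMMU fault would."""
        if device not in self.devices:
            raise PermissionError("device is not attached to this domain")
        base = iova - iova % PAGE_SIZE
        if base not in self.page_table:
            raise LookupError("IOMMU fault: unmapped IOVA")
        return self.page_table[base] + iova % PAGE_SIZE
```

In a real pv design each map() and unmap() call would turn into a request to
the host rather than a dictionary update, which is where the latency question
comes in.
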
> > The current model of putting all guest devices in a single IOMMU domain
> > on the host is likely not what you would want and might imply a new VFIO
> > IOMMU backend that is better tuned for separate domains, sparse
> > mappings, and low-latency. VFIO has a modular IOMMU design, so this
> > isn't architecturally a problem. The VFIO user (QEMU) is able to select
> > which backend to use and the code is written with supporting multiple
> > backends in mind.
> >
> > A complication you'll have is that the granularity of IOMMU operations
> > through VFIO is at the IOMMU group level, so the guest would not be able
> > to easily split devices grouped together on the host between separate
> > users in the guest. That could be modeled as a conventional PCI bridge
> > masking the requester ID of devices in the guest such that host groups
> > are mirrored as guest groups.
>
> I think that users would be happy with only one Palo UCS VF wrapped by usnic
> in the guest. I definitely need to check this point.

The solution should support multiple devices though; it may just require
multiple guest IOMMUs and fairly strict configuration constraints.

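As a rough sketch of the kind of constraint I mean, the snippet below mirrors
host IOMMU groups into guest groups, so that devices sharing a host group
cannot be handed to separate users in the guest. The function names, group
numbers, and PCI addresses are hypothetical and only illustrate the rule; this
is not VFIO or QEMU code.

```python
def mirror_groups(host_groups, assigned):
    """Return guest groups that mirror the host grouping.

    host_groups maps a host group id to the list of devices in that group;
    assigned is the set of devices passed through to the guest.
    """
    guest_groups = []
    for group_id in sorted(host_groups):
        # Devices that share a host group stay together in one guest group.
        members = [dev for dev in host_groups[group_id] if dev in assigned]
        if members:
            guest_groups.append(members)
    return guest_groups


def can_separate(guest_groups, dev_a, dev_b):
    """True if two devices may be given to different users in the guest."""
    return not any(dev_a in group and dev_b in group for group in guest_groups)


# Hypothetical host layout: two VFs share group 7, a third device is alone.
host = {
    7: ["0000:06:00.1", "0000:06:00.2"],
    9: ["0000:07:00.0"],
}
guest = mirror_groups(host, {"0000:06:00.1", "0000:06:00.2", "0000:07:00.0"})
print(guest)
print(can_separate(guest, "0000:06:00.1", "0000:06:00.2"))
print(can_separate(guest, "0000:06:00.1", "0000:07:00.0"))
```
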
> > There might also be simpler "punch-through" ways to do it. For
> > instance, what if instead of trying to make it work like it does on the
> > host we invented a paravirt VFIO interface and the vfio-pv driver in the
> > guest populated /dev/vfio as slightly modified passthroughs to the host
> > fds? The guest OS may not even really need to be aware of the device.
> >
>
> As I am not really interested in nesting VFIO but in using the Intel IOMMU
> directly in the guest, a "punch-through" method would be fine.

I was doing a lot of hand-waving for a vfio-pv punch-through, but I don't
even have a vague idea of what an IOMMU API punch-through would look
like. Seems like you need to evaluate whether the pain of emulating VT-d is
greater than the pain of creating a new pv IOMMU, and which is likely to
perform better. Thanks,

Alex

> > It's an interesting project and certainly a valid use case. I'd also
> > like to see things like Intel's DPDK move to using VFIO, but the current
> > UIO DPDK is often used in guests. Thanks,
>
> I will ask Thomas Monjalon, the DPDK maintainer, about this.
>
> Thanks,
>
> Best regards
>
> Benoît
>
> >
> > Alex
> >
> >