From: Tom Lyon <pugs@lyon-about.com>
To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: linux-pci@vger.kernel.org, mbranton@gmail.com,
alexey.zaytsev@gmail.com, jbarnes@virtuousgeek.org,
linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
randy.dunlap@oracle.com, arnd@arndb.de, joro@8bytes.org,
hjk@linutronix.de, avi@redhat.com, gregkh@suse.de,
chrisw@sous-sol.org, alex.williamson@redhat.com, mst@redhat.com
Subject: Re: [ANNOUNCE] VFIO V6 & public VFIO repositories
Date: Tue, 21 Dec 2010 11:48:43 -0800
Message-ID: <201012211148.43941.pugs@lyon-about.com>
In-Reply-To: <1292909853.16694.726.camel@pasglop>
On Monday, December 20, 2010 09:37:33 pm Benjamin Herrenschmidt wrote:
> Hi Tom, just wrote that to linux-pci in reply to your VFIO announcement,
> but your email bounced. Alex gave me your ieee one instead, I'm sending
> this copy to you, please feel free to reply on the list !
>
> Cheers,
> Ben.
>
> On Tue, 2010-12-21 at 16:29 +1100, Benjamin Herrenschmidt wrote:
> > On Mon, 2010-11-22 at 15:21 -0800, Tom Lyon wrote:
> > > VFIO "driver" development has moved to a publicly accessible
> > > repository on github:
> > > git://github.com/pugs/vfio-linux-2.6.git
> > >
> > > This is a clone of the Linux-2.6 tree with all VFIO changes on the vfio
> > > branch (which is the default). There is a tag 'vfio-v6' marking the
> > > latest "release" of VFIO.
> > >
> > > In addition, I am open-sourcing my user level code which uses VFIO.
> > > It is a simple UDP/IP/Ethernet stack supporting 3 different VFIO based
> > > hardware drivers. This code is available at:
> > > git://github.com/pugs/vfio-user-level-drivers.git
> >
> > So I do have some concerns about this...
> >
> > So first, before I go into the meat of my issues, let's just drop a
> > quick one about the interface: why netlink ? I find it horrible
> > myself... Just confuses everything and adds overhead. ioctls would have
> > been a better choice imho.
> >
> > Now, my actual issues, which in fact extend to the whole "generic" iommu
> > APIs that have been added to drivers/pci for "domains", and that in
> > turn "stain" VFIO in ways that I'm not sure I can use on POWER...
> >
> > I would appreciate your input on what you think is the best way for me to
> > solve some of these "mismatches" between our HW and this design.
> >
> > Basically, the whole iommu domain stuff has been entirely designed
> > around the idea that you can create those "domains" which are each an
> > entire address space, and put devices in there.
> >
> > This is sadly not how the IBM iommus work on POWER today...
> >
> > I currently have one "shared" DMA address space (per host bridge), but I
> > can assign regions of it to different devices (and I have limited
> > filtering capabilities, so basically a bus per region, a device per
> > region or a function per region).
> >
> > That means essentially that I cannot just create a mapping for the DMA
> > addresses I want, but instead, need to have some kind of "allocator" for
> > DMA translations (which we have in the kernel, ie, dma_map/unmap use a
> > bitmap allocator).
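The per-window "allocator" described above can be sketched as a first-fit search over a bitmap with one bit per IOMMU page. This is a minimal illustration, not the powerpc kernel's actual code: the names `dma_window`, `window_alloc` and `window_free`, the 4K page size and the first-fit policy are all assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4K IOMMU pages; real hardware may use other sizes. */
#define IOMMU_PAGE_SHIFT 12
/* Sentinel for "no space": bus address 0 can be a valid DMA address. */
#define DMA_ALLOC_FAIL UINT64_MAX

/* Hypothetical descriptor for one DMA window carved out of the shared space. */
struct dma_window {
    uint64_t base;     /* first bus (DMA) address of the window */
    size_t npages;     /* window size in IOMMU pages */
    uint64_t *bitmap;  /* bit i set => page i is already translated */
};

static bool bm_test(const uint64_t *map, size_t i)
{
    return (map[i / 64] >> (i % 64)) & 1;
}

static void bm_set(uint64_t *map, size_t i)
{
    map[i / 64] |= (uint64_t)1 << (i % 64);
}

static void bm_clear(uint64_t *map, size_t i)
{
    map[i / 64] &= ~((uint64_t)1 << (i % 64));
}

/* First-fit: reserve npages consecutive free pages and return their bus
 * address, or DMA_ALLOC_FAIL if no such run exists. */
static uint64_t window_alloc(struct dma_window *w, size_t npages)
{
    size_t run = 0;

    if (npages == 0 || npages > w->npages)
        return DMA_ALLOC_FAIL;
    for (size_t i = 0; i < w->npages; i++) {
        run = bm_test(w->bitmap, i) ? 0 : run + 1;
        if (run == npages) {
            size_t start = i + 1 - npages;
            for (size_t j = start; j <= i; j++)
                bm_set(w->bitmap, j);
            return w->base + ((uint64_t)start << IOMMU_PAGE_SHIFT);
        }
    }
    return DMA_ALLOC_FAIL;
}

/* Release a range previously returned by window_alloc(). */
static void window_free(struct dma_window *w, uint64_t addr, size_t npages)
{
    size_t start = (size_t)((addr - w->base) >> IOMMU_PAGE_SHIFT);

    assert(addr >= w->base && start + npages <= w->npages);
    for (size_t j = start; j < start + npages; j++)
        bm_clear(w->bitmap, j);
}
```

For scale: a 128M window with 4K pages needs 32768 bits, i.e. a 4 KiB bitmap. A real allocator would also need locking and alignment handling, which are omitted here.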
> >
> > I generally have 2 regions per device: one in 32-bit space of quite
> > limited size (sometimes as small as a 128M window) and one in 64-bit
> > space that I can make quite large if I need to (enough to map all of
> > memory if that's really desired, using large pages or something like
> > that).
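With two windows per device, a driver has to choose the one its DMA mask can actually reach. A minimal sketch, assuming a hypothetical `dev_dma_windows` descriptor; the 64-bit window base used in the usage example is an arbitrary illustrative value, not what POWER firmware assigns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one DMA window (bus address range). */
struct dma_window_desc {
    uint64_t base;  /* first bus address */
    uint64_t size;  /* size in bytes; 0 means "no window" */
};

/* The two windows a device typically gets, per the mail above. */
struct dev_dma_windows {
    struct dma_window_desc win32;  /* small window in 32-bit space */
    struct dma_window_desc win64;  /* large window in 64-bit space */
};

/* Usable only if every bus address in the window fits in the device's mask.
 * Written to avoid overflow in base + size. */
static int window_fits_mask(const struct dma_window_desc *w, uint64_t dma_mask)
{
    return w->size != 0 && w->base <= dma_mask &&
           w->size - 1 <= dma_mask - w->base;
}

/* Prefer the large 64-bit window, fall back to the 32-bit one. */
static const struct dma_window_desc *
pick_window(const struct dev_dma_windows *d, uint64_t dma_mask)
{
    assert(d != NULL);
    if (window_fits_mask(&d->win64, dma_mask))
        return &d->win64;
    if (window_fits_mask(&d->win32, dma_mask))
        return &d->win32;
    return NULL;  /* device cannot reach either window */
}
```

Preferring the 64-bit window keeps the scarce 32-bit space for devices that cannot address anything else.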
> >
> > Now that has various consequences for the interfaces between iommu
> > domains and qemu, and VFIO:
> > - I don't quite see how I can translate the concept of domains and
> > attaching devices to such domains. The basic idea won't work. The
> > domains in my case are essentially pre-existing, not created on-the-fly,
> > and may contain multiple devices, though I suppose I can assume for now
> > that we only support KVM pass-through with 1 device == 1 domain.
> >
> > I don't know how to sort that one out if the userspace or kvm code
> > assumes it can put multiple devices in one domain and they start to
> > magically share the translations...
> >
> > Not sure what the right approach here is. I could make the "Linux"
> > domain some artificial SW construct that contains a list of the real
> > iommus it's "bound" to and establish translations in all of them... but
> > that isn't very efficient. If the guest kernel explicitly uses some
> > iommu PV ops targeting a device, I need to set up translations only for
> > -that- device, not everything in the "domain".
> >
> > - The code in virt/kvm/iommu.c that assumes it can map the entire guest
> > memory 1:1 in the IOMMU is just not usable for us that way. We -might-
> > be able to do that for 64-bit capable devices as we can create quite
> > large regions in the 64-bit space, but at the very least we need some
> > kind of offset, and the guest must know about it...
> >
> > - Similar deal with all the code that currently assumes it can pick up a
> > "virtual" address and create a mapping from that. Either we provide an
> > allocator, or if we want to keep the flexibility of userspace/kvm
> > choosing the virtual addresses (preferable), we need to convey some
> > "ranges" information down to the user.
> >
> > - Finally, my guests are always paravirt. There are well-defined H-calls
> > for inserting/removing DMA translations and we're implementing these
> > since existing kernels already know how to use them. That means that
> > overall, I might simply not need to use any of the above.
> >
> > IE. I could have my own infrastructure for iommu, my H-calls populating
> > the target iommu directly from the kernel (kvm) or qemu (via ioctls in
> > the non-kvm case). Might be the best option ... but that would mean
> > somewhat disentangling VFIO from uiommu...
> >
> > Any suggestions ? Great ideas ?
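The offset and "ranges" points can be made concrete: if the kernel reports which bus-address ranges a device owns, userspace or qemu can keep choosing IOVAs itself and only has to stay inside them, and a 1:1 guest mapping becomes `iova = range start + guest physical address`. A minimal sketch under those assumptions; `dma_range`, `iova_in_ranges` and `guest_phys_to_iova` are hypothetical names, not an existing VFIO or KVM interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range descriptor the kernel could report to userspace. */
struct dma_range {
    uint64_t start;  /* first usable bus address */
    uint64_t size;   /* bytes */
};

/* True if [iova, iova + len) lies entirely inside one reported range.
 * The comparisons are arranged to avoid 64-bit overflow. */
static bool iova_in_ranges(const struct dma_range *r, size_t n,
                           uint64_t iova, uint64_t len)
{
    for (size_t i = 0; i < n; i++) {
        if (len != 0 && iova >= r[i].start && len <= r[i].size &&
            iova - r[i].start <= r[i].size - len)
            return true;
    }
    return false;
}

/* 1:1 guest mapping shifted by the range start: the guest must be told
 * about r->start so it applies the same offset when programming devices. */
static bool guest_phys_to_iova(const struct dma_range *r, uint64_t gpa,
                               uint64_t *iova)
{
    assert(r != NULL && iova != NULL);
    if (gpa >= r->size)
        return false;  /* guest memory larger than the window */
    *iova = r->start + gpa;
    return true;
}
```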
Ben - I don't have any good news for you.

DMA remappers like those on Power and Sparc have been around forever; the new
thing about Intel/AMD iommus is the per-device address spaces and the
protection inherent in having separate mappings for each device. If one is to
trust a user level app or virtual machine to program DMA registers directly,
then you really need per-device translation.

That said, early versions of VFIO had a mapping mode that used the normal DMA
API instead of the iommu/uiommu API and assumed that the user was trusted, but
that wasn't interesting for the long term.

So if you want safe device assignment, you're going to need hardware help.
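The protection argument can be shown with a toy model contrasting a single translation table shared by all devices (with no per-device filtering) against per-device tables, where the requesting device selects the table. A minimal sketch; the table sizes, the "0 means unmapped" convention and all names are hypothetical and do not model any real IOMMU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NR_DEVICES        2
#define NR_ENTRIES        8   /* IOVA pages per table (toy size) */
#define XLATE_PAGE_SHIFT  12

/* One translation table: entry[i] holds a physical page address, 0 = unmapped. */
struct xlate_table {
    uint64_t entry[NR_ENTRIES];
};

/* Translate an IOVA through one table; false means the DMA would fault. */
static bool xlate(const struct xlate_table *t, uint64_t iova, uint64_t *phys)
{
    uint64_t idx = iova >> XLATE_PAGE_SHIFT;

    if (idx >= NR_ENTRIES || t->entry[idx] == 0)
        return false;
    *phys = t->entry[idx] | (iova & ((1ULL << XLATE_PAGE_SHIFT) - 1));
    return true;
}

/* Shared model: one table for everyone, the requesting device is ignored. */
static bool shared_dma(const struct xlate_table *t, unsigned dev,
                       uint64_t iova, uint64_t *phys)
{
    (void)dev;
    return xlate(t, iova, phys);
}

/* Per-device model: the requesting device selects its own table. */
struct per_device_iommu {
    struct xlate_table table[NR_DEVICES];
};

static bool per_device_dma(const struct per_device_iommu *m, unsigned dev,
                           uint64_t iova, uint64_t *phys)
{
    assert(dev < NR_DEVICES);
    return xlate(&m->table[dev], iova, phys);
}
```

In the shared model, device 1 reaches device 0's buffer simply by issuing the same IOVA; in the per-device model the same access faults.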
> >
> > Cheers,
> > Ben.