From: Alex Williamson <alex.williamson@redhat.com>
To: Aaron Fabbri <aafabbri@cisco.com>
Cc: Alexey Kardashevskiy <aik@au1.ibm.com>,
kvm@vger.kernel.org, Paul Mackerras <pmac@au1.ibm.com>,
qemu-devel <qemu-devel@nongnu.org>, chrisw <chrisw@sous-sol.org>,
iommu <iommu@lists.linux-foundation.org>,
Avi Kivity <avi@redhat.com>,
"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
benve@cisco.com
Subject: Re: [Qemu-devel] kvm PCI assignment & VFIO ramblings
Date: Tue, 23 Aug 2011 12:01:14 -0600
Message-ID: <1314122475.2859.76.camel@bling.home>
In-Reply-To: <CA79326A.FB97%aafabbri@cisco.com>

On Tue, 2011-08-23 at 10:33 -0700, Aaron Fabbri wrote:
>
>
> On 8/23/11 10:01 AM, "Alex Williamson" <alex.williamson@redhat.com> wrote:
>
> > On Tue, 2011-08-23 at 16:54 +1000, Benjamin Herrenschmidt wrote:
> >> On Mon, 2011-08-22 at 17:52 -0700, aafabbri wrote:
> >>
> >>> I'm not following you.
> >>>
> >>> You have to enforce group/iommu domain assignment whether you have the
> >>> existing uiommu API, or if you change it to your proposed
> >>> ioctl(inherit_iommu) API.
> >>>
> >>> The only change needed to VFIO here should be to make uiommu fd assignment
> >>> happen on the groups instead of on device fds. That operation fails or
> >>> succeeds according to the group semantics (all-or-none assignment/same
> >>> uiommu).
> >>
> >> Ok, so I missed that part where you change uiommu to operate on group
> >> fd's rather than device fd's, my apologies if you actually wrote that
> >> down :-) It might be obvious ... bear with me, I just flew back from the
> >> US and I am badly jet lagged ...
> >
> > I missed it too, the model I'm proposing entirely removes the uiommu
> > concept.
> >
> >> So I see what you mean, however...
> >>
> >>> I think the question is: do we force 1:1 iommu/group mapping, or do we allow
> >>> arbitrary mapping (satisfying group constraints) as we do today.
> >>>
> >>> I'm saying I'm an existing user who wants the arbitrary iommu/group mapping
> >>> ability and definitely think the uiommu approach is cleaner than the
> >>> ioctl(inherit_iommu) approach. We considered that approach before but it
> >>> seemed less clean so we went with the explicit uiommu context.
> >>
> >> Possibly, the question that interests me the most is what interface will
> >> KVM end up using. I'm also not terribly fond of the (perceived)
> >> discrepancy between using uiommu to create groups but using the group fd
> >> to actually do the mappings, at least if that is still the plan.
> >
> > Current code: uiommu creates the domain, we bind a vfio device to that
> > domain via a SET_UIOMMU_DOMAIN ioctl on the vfio device, then do
> > mappings via MAP_DMA on the vfio device (affecting all the vfio devices
> > bound to the domain)
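
Roughly, from userspace (only the SET_UIOMMU_DOMAIN and MAP_DMA names
come from the description above; the device paths and the map struct
are illustrative, not the exact ABI):

    int uiommu_fd = open("/dev/uiommu", O_RDWR);  /* new iommu domain */
    int vfio_fd = open("/dev/vfio0", O_RDWR);     /* one assigned device */

    /* bind the device's DMA to the uiommu domain */
    ioctl(vfio_fd, SET_UIOMMU_DOMAIN, &uiommu_fd);

    /* map a buffer; the mapping is visible to every device bound to
     * the same domain */
    struct dma_map { __u64 vaddr, iova, size; } map = {
        .vaddr = (__u64)buf, .iova = 0, .size = 4096,
    };
    ioctl(vfio_fd, MAP_DMA, &map);
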
> >
> > My current proposal: "groups" are predefined. groups ~= iommu domain.
>
> This is my main objection. I'd rather not lose the ability to have multiple
> devices (which are all predefined as singleton groups on x86 w/o PCI
> bridges) share IOMMU resources. Otherwise, 20 devices sharing buffers would
> require 20x the IOMMU/ioTLB resources. KVM doesn't care about this case?

We do care; I just wasn't prioritizing it as heavily since I think the
typical model is probably closer to one device per guest.
> > The iommu domain would probably be allocated when the first device is
> > bound to vfio. As each device is bound, it gets attached to the group.
> > DMAs are done via an ioctl on the group.
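
From userspace that collapses to something like (all names
hypothetical):

    int group_fd = open("/dev/vfio/group0", O_RDWR);  /* hypothetical node */

    /* the iommu domain already exists (allocated at first bind), so a
     * single ioctl maps for every device in the group */
    ioctl(group_fd, GROUP_MAP_DMA, &map);
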
> >
> > I think group + uiommu leads to effectively reliving most of the
> > problems with the current code. The only benefit is the group
> > assignment to enforce hardware restrictions. We still have the problem
> > that uiommu open() = iommu_domain_alloc(), whose properties are
> > meaningless without attached devices (groups), which I think leads to
> > the same awkward model of attaching groups to define the domain, then
> > doing mappings via the group to enforce ordering.
>
> Is there a better way to allow groups to share an IOMMU domain?
>
> Maybe, instead of having an ioctl to allow a group A to inherit the same
> iommu domain as group B, we could have an ioctl to fully merge two groups
> (could be what Ben was thinking):
>
> A.ioctl(MERGE_TO_GROUP, B)
>
> The group A now goes away and its devices join group B. If A ever had an
> iommu domain assigned (and buffers mapped?) we fail.
>
> Groups cannot get smaller (they are defined as minimum granularity of an
> IOMMU, initially). They can get bigger if you want to share IOMMU
> resources, though.
>
> Any downsides to this approach?

That's sort of the way I'm picturing it. When groups are bound
together, they effectively form a pool, where all the groups are peers.
When the MERGE/BIND ioctl is called on group A and passed the group B
fd, A can check compatibility of the domain associated with B, unbind
devices from the B domain and attach them to the A domain. The B domain
would then be freed and the refcnt on the A domain bumped. If we need
to remove A from the pool, we call UNMERGE/UNBIND on B with the A fd;
it will remove the A devices from the shared object, disassociate A
from the shared object, re-alloc a domain for A, and rebind the A devices to
that domain.
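
In rough pseudo-C (the vfio_group layout and helpers here are
hypothetical; iommu_attach_device/iommu_detach_device/iommu_domain_free
are the existing IOMMU API):

    static int vfio_group_merge(struct vfio_group *a, struct vfio_group *b)
    {
        struct vfio_device *vdev;

        if (!domains_compatible(a->domain, b->domain))  /* hypothetical */
            return -EINVAL;

        list_for_each_entry(vdev, &b->devices, next) {
            iommu_detach_device(b->domain, vdev->dev);
            iommu_attach_device(a->domain, vdev->dev);  /* unwind omitted */
        }
        iommu_domain_free(b->domain);
        b->domain = a->domain;        /* B joins the pool... */
        kref_get(&a->domain_kref);    /* ...holding a ref on A's domain */
        return 0;
    }
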
This is where it seems like it might be helpful to add a GET_IOMMU_FD
ioctl so that an iommu object is ubiquitous and persistent across the
pool. Operations on any group fd work on the pool as a whole.
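
Something like (only the GET_IOMMU_FD and MAP_DMA names come from
above; the rest is hypothetical):

    /* any group fd in the pool hands back the same iommu object */
    int iommu_fd = ioctl(group_a_fd, GET_IOMMU_FD);

    /* mappings then clearly target the shared domain, not one group */
    ioctl(iommu_fd, MAP_DMA, &map);
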
Thanks,
Alex