Re: [Qemu-devel] VFIO v2 design plan

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: David Gibson <dwg@au1.ibm.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: chrisw <chrisw@sous-sol.org>,
	Alexey Kardashevskiy <aik@au1.ibm.com>,
	kvm@vger.kernel.org, Paul Mackerras <pmac@au1.ibm.com>,
	"Roedel, Joerg" <Joerg.Roedel@amd.com>,
	agraf@suse.de, qemu-devel <qemu-devel@nongnu.org>,
	aafabbri <aafabbri@cisco.com>,
	iommu <iommu@lists.linux-foundation.org>,
	Avi Kivity <avi@redhat.com>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
	benve@cisco.com
Subject: Re: [Qemu-devel] VFIO v2 design plan
Date: Fri, 2 Sep 2011 15:07:48 +1000	[thread overview]
Message-ID: <20110902050748.GZ11906@yookeroo.fritz.box> (raw)
In-Reply-To: <1314908821.2859.531.camel@bling.home>

On Thu, Sep 01, 2011 at 02:27:00PM -0600, Alex Williamson wrote:
> On Thu, 2011-09-01 at 14:10 +1000, David Gibson wrote:
> > On Tue, Aug 30, 2011 at 08:51:38AM -0600, Alex Williamson wrote:
[snip]
> > > > If you open a group, merge in a bunch of other groups, then re-open
> > > > /dev/vfio/NNN for one of the groups mergeed, presumably the new fd
> > > > must also see the merged group?  So presumably you must only unmerge
> > > > everything when all the fds are closed.
> > > 
> > > The device fds for the group to be unmerged must be closed before an
> > > unmerge.  The iommu fd is tricky.  The iommu fd is really the iommu for
> > > the merged group, not the individual groups, so it's context stays with
> > > the remaining group.  Therefore I don't enforce a refcnt on the iommu
> > > fd.  The usage model I expect is that if a merge works, the user will
> > > probably use a single iommu fd for the whole merged group.  Maybe that
> > > should be enforced?
> > 
> > I thought I recalled you saying earlier that the iommu fd could not be
> > open when merging new groups in.  I would expect that also to be true
> > when unmerging in that case.
> 
> We have to support hotplug.  The group-to-be-merged can't be in use (no
> open device or iommu fds).  To unmerge a group, we only require that no
> device fds are in use as the merged-group-iommu may still be in use by
> the remaining members.

I'm not entirely clear how this relates to hotplug.  But I guess the
crucial point is that the group-to-be-merged may not have open device
or iommu fds, but the group-to-be-merged-into can?

But couldn't either a merge or an unmerge cause a change in the
effective capabilities of the IOMMU?

> > > > If you open groups a and b, then merge a (disjoint) bunch of things
> > > > into each, then merge b into a, what are the semantics?  Wheat about
> > > > if you then unmerge b from a - does it just remove the atomic group b,
> > > > or everything you merged into b earlier?  Or, what happens if you open
> > > > group a, merge in some things, then attempt to unmerge a from the
> > > > merged group?
> > > 
> > > Simple, don't allow merging and unmerging of merged groups.  Merge and
> > > unmerge only work on singleton groups.
> > 
> > Ok, in that case I think we should call it "add" and "remove" rather
> > than merge and unmerge.
> > 
> > >  The last case we must support.
> > > In that case you just use:
> > > 
> > > ioctl(a.fd, VFIO_GROUP_MERGE, b.fd)
> > > ioctl(b.fd, VFIO_GROUP_UNMERGE, a.fd)
> > > 
> > > The groups are peers when merged, so b can remove a as easily as a can
> > > remove b.  Group b is left with any iommu context setup while
> > > merged.
> > 
> > Um *goes cross-eyed*.  So, if you open (atomic) groups a and b, then
> > add group b to a, are the two open fds now functionally identical?
> 
> Yes.
> 
> > And likewise if you then open either a or b again straight from from
> > /dev/vfio?
> 
> Yes.
> 
> > Except, that the b fd must then retain the fact that it was originally
> > for atomic group (b), so that it can be used as a handle for an
> > unmerge/remove.
> 
> Right.

Ugh.  Having the file handle represent the meta-group for most
purposes, but also represent (invisibly) an atomic group is just
horrible.  Especially when - using one of the examples mentioned
above, it's actually possible to remove the atomic group represented
by an fd from the meta-group it's also representing.

> > The more I dig into the details of these semantics the more I dislike
> > them.
> 
> Suggest something better.  I spent half a day thinking about what vfio
> would look like in configfs, it has some very appealing aspects, but
> since it doesn't support ioctls we'd still have a chardev interface and
> it gets ugly again.

Well, again, I prefer a persistent group interface, where the
meta-group is not bound to the lifetime of a file handle.  Instead you
use a different interface to create a meta-group (which has an ID
disjoint from the atomic groups), then you can open
/dev/vfio/<metagroup-id>.  The constituent atomic group devices are
still visible, and their enumeration interface works, but are otherwise
unusable (like a group which still has kernel drivers bound to some
constituent devices).


Hrm.  In the interests of making forward progress here, can I suggest
we implement the other APIs without group-binding/metagrouping for
now.  It doesn't look as if any of the suggested approaches for this
so far are fundamentally incompatible with the rest of the interface.

[snip]
> > > > > > I'd be more comfortable with a model where there was a distinction
> > > > > > between a "soft" and "hard" remove.  The soft would either simply
> > > > > > fail, if the device is in use by vfio, or block indefinitely.  The
> > > > > > hard would kill the user process without delay.  This effectively
> > > > > > allows your semantics to be implemented in userspace (soft remove,
> > > > > > wait, hard remove) - where it's easier to tweak the policy of how long
> > > > > > to wait.
> > > > > 
> > > > > Your first example is essentially what current vfio does now, request
> > > > > remove, wait indefinitely and qemu triggers an abort if the guest
> > > > > doesn't respond.  The trouble with moving this policy to userspace is
> > > > > that we're not protecting the host.
> > > > 
> > > > How is the host not protected?  Bear in mind that when I say
> > > > "userspace" I'm not thinking qemu, I'm thinking the admin equipped
> > > > with whatever tools he uses for moving devices between guests.  So
> > > > they go:
> > > > 	- Please remove this group from the guest
> > > > 	- Waits for an amount of time of their choice
> > > > 	- Decide, crap, the guest is broken
> > > > 	- Hard remove the group from the guest, killing the guest
> > > > 
> > > > It's basic in perfect analogy to the old:
> > > > 	- kill -15
> > > > 	- *drum fingers*
> > > > 	- Damn, it's stuck
> > > > 	- kill -9
> > > 
> > > And what if the remove is initiated by a hardware admin that walks over
> > > to the system, and presses the PCI device hot unplug doorbell?  It just
> > > looks like a driver hang.  Thanks,
> > 
> > Hm, true.  How is this case handled on the host side?
> 
> Same as if you attempt to unbind the device from the driver, release
> callback, iirc.  Thanks,

Roght, I guess my point is whether there's some kind of userspace
notification or not.  If there is, then it's reasonable to do a soft
unbind, and the userspace callback can do a hard kill after a delay.
If not, then it does need to be a hard unbind / kill.

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

     prev parent reply	other threads:[~2011-09-02  5:08 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-08-26 17:05 [Qemu-devel] VFIO v2 design plan Alex Williamson
2011-08-30  3:04 ` David Gibson
2011-08-30  4:24   ` Alex Williamson
2011-08-30  7:48     ` David Gibson
2011-08-30 14:51       ` Alex Williamson
2011-09-01  4:10         ` David Gibson
2011-09-01 20:27           ` Alex Williamson
2011-09-02  5:07             ` David Gibson [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20110902050748.GZ11906@yookeroo.fritz.box \
    --to=dwg@au1.ibm.com \
    --cc=Joerg.Roedel@amd.com \
    --cc=aafabbri@cisco.com \
    --cc=agraf@suse.de \
    --cc=aik@au1.ibm.com \
    --cc=alex.williamson@redhat.com \
    --cc=avi@redhat.com \
    --cc=benve@cisco.com \
    --cc=chrisw@sous-sol.org \
    --cc=iommu@lists.linux-foundation.org \
    --cc=kvm@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=pmac@au1.ibm.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).