From: Benjamin Herrenschmidt <benh@kernel.crashing.org>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: David Gibson <dwg@au1.ibm.com>,
	joerg.roedel@amd.com, dwmw2@infradead.org,
	iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
	chrisw@redhat.com, agraf@suse.de, scottwood@freescale.com,
	B08248@freescale.com
Subject: Re: [PATCH 1/4] iommu: Add iommu_device_group callback and iommu_group sysfs entry
Date: Wed, 30 Nov 2011 20:23:48 +1100	[thread overview]
Message-ID: <1322645028.21641.71.camel@pasglop> (raw)
In-Reply-To: <1322630751.19120.222.camel@bling.home>

On Tue, 2011-11-29 at 22:25 -0700, Alex Williamson wrote:

> Note that iommu drivers are registered per bus_type, so the unique pair
> is {bus_type, groupid}, which seems sufficient for vfio.
> 
> > Don't forget that to keep sanity, we really want to expose the groups
> > via sysfs (per-group dir with symlinks to the devices).
> > 
> > I'm working with Alexey on providing an in-kernel powerpc specific API
> > to expose the PE stuff to whatever's going to interface to VFIO to
> > create the groups, though we can eventually collapse that. The idea is
> > that on non-PE capable bridges (old style), I would make a single group
> > per host bridge.
> 
> If your non-PE capable bridges aren't actually providing isolation, they
> should return -ENODEV for the group_device() callback, then vfio will
> ignore them.

Why ignore them? It's perfectly fine to consider everything below the
host bridge as one group. There is isolation ... at the host bridge
level.
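
To illustrate, a minimal sketch of that alternative (hedged: the
device_group() signature is as I read it from patch 1/4, and using the
PCI domain number as the group number is purely my assumption to stand
in for "one group per host bridge"):

	#include <linux/pci.h>

	/*
	 * Sketch only: instead of returning -ENODEV on a non-PE-capable
	 * bridge, report a single group per host bridge.  Assumes this
	 * callback is registered on pci_bus_type, so dev is always a
	 * PCI device.
	 */
	static int example_pci_device_group(struct device *dev,
					    unsigned int *groupid)
	{
		struct pci_dev *pdev = to_pci_dev(dev);

		/* Everything below one host bridge shares one group. */
		*groupid = pci_domain_nr(pdev->bus);
		return 0;
	}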

Really, groups should be a structure, not a magic number. We want to
iterate them and their content, represent them via an API, etc... and so
magic numbers mean that anything under the hood will constantly have to
convert between them and some kind of internal data structure.
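
As a hypothetical sketch of the difference (none of these names exist
in the patch series; they are made up purely to illustrate the
argument):

	#include <linux/device.h>
	#include <linux/kobject.h>
	#include <linux/list.h>

	struct iommu_group {			/* hypothetical */
		struct list_head devices;	/* iommu_group_dev entries */
		struct kobject kobj;		/* backs the sysfs group dir */
		void *iommu_data;		/* e.g. a PE / DMA window */
	};

	struct iommu_group_dev {		/* hypothetical */
		struct list_head list;
		struct device *dev;
	};

	/* With a structure, iterating a group's content is trivial: */
	static void example_group_for_each_dev(struct iommu_group *group,
					       void (*fn)(struct device *dev))
	{
		struct iommu_group_dev *gdev;

		list_for_each_entry(gdev, &group->devices, list)
			fn(gdev->dev);
	}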

I also somewhat dislike the bus_type as the anchor to the grouping
system, but that's not necessarily as bad an issue for us to deal with.

Eventually what will happen on my side is that I will have a powerpc
"generic" API (ie. across platforms) that allows enumerating groups and
retrieving the dma windows associated with them etc...

That API will use underlying function pointers provided by the PCI host
bridge (for which we do have a data structure, struct pci_controller,
like many other archs except I think x86 :-)

Any host platform that doesn't provide those pointers (ie. all of them
initially) will get a default behaviour, which is to group everything
below a host bridge (host bridges still have independent iommu
windows; at least for us they all do).
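
Roughly like this (struct pci_controller is the real powerpc host
bridge structure and global_number its domain number; the ops hook is a
hypothetical addition sketching the proposal, not an existing field):

	#include <linux/device.h>

	struct pci_controller;

	/* Hypothetical backend ops a host bridge may provide: */
	struct pci_iommu_group_ops {
		int (*device_group)(struct pci_controller *hose,
				    struct device *dev,
				    unsigned int *groupid);
	};

	/* Reduced sketch of the real struct, plus the hypothetical hook: */
	struct pci_controller {
		int global_number;			  /* domain number */
		const struct pci_iommu_group_ops *group_ops; /* hypothetical */
		/* ... */
	};

	static int ppc_device_group(struct pci_controller *hose,
				    struct device *dev, unsigned int *groupid)
	{
		/* Fine-grained backends (e.g. powernv/p7ioc PEs) override. */
		if (hose->group_ops && hose->group_ops->device_group)
			return hose->group_ops->device_group(hose, dev,
							     groupid);

		/* Default: one group per host bridge. */
		*groupid = hose->global_number;
		return 0;
	}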

On top of that we can implement a "backend" that provides those pointers
for the p7ioc bridge used on the powernv platform, which will expose
more fine grained groups based on our "partitionable endpoint"
mechanism.

The grouping will have been decided early at boot time based on a mix of
HW resources and bus topology, plus things like whether there is a PCI-X
bridge etc... and will be initially immutable.

Ideally, we need to expose a subset of this API as a "generic" interface
to allow generic code to iterate the groups and their content, and to
construct the appropriate sysfs representation.
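
Using the hypothetical group structure sketched above, the sysfs side
(the per-group dir with symlinks to the devices mentioned earlier)
could be little more than:

	#include <linux/sysfs.h>

	static int example_group_populate_sysfs(struct iommu_group *group)
	{
		struct iommu_group_dev *gdev;
		int ret;

		/* One symlink per member device in the group's dir. */
		list_for_each_entry(gdev, &group->devices, list) {
			ret = sysfs_create_link(&group->kobj,
						&gdev->dev->kobj,
						dev_name(gdev->dev));
			if (ret)
				return ret;
		}
		return 0;
	}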

> > In addition, Alex, I noticed that you still have the domain stuff there,
> > which is fine I suppose, we could make it a requirement on power that
> > you only put a single group in a domain... but the API is still to put
> > individual devices in a domain, not groups, and that somewhat sucks.
> > 
> > You could "fix" that by having some kind of ->domain_enable() or
> > whatever that's used to "activate" the domain and verifies that it
> > contains entire groups but that looks like a pointless way to complicate
> > both the API and the implementation.
> 
> Right, groups are currently just a way to identify dependent sets, not a
> unit of work.  We can also have group membership change dynamically
> (hotplug slot behind a PCIe-to-PCI bridge), so there are cases where we
> might need to formally attach/detach a group element to a domain at some
> later point.  This really hasn't felt like a stumbling point for vfio,
> at least on x86.  Thanks,

It doesn't matter much as long as we have a way to know that a group is
"complete", ie that all devices of a group have been taken over by vfio
and put into a domain, and a way to block them from being lost. Only
then can we actually "use" the group and start reconfiguring the iommu
etc... for use by the guest.

Note that groups -will- contain bridges eventually. We need to take
that into account, since bridges -usually- don't have an ordinary driver
attached to them, so there may be issues with tracking whether they have
been taken over by vfio...
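
A sketch of what such a "completeness" test might look like (again on
the hypothetical structure above; "vfio-pci" is a placeholder driver
name, and skipping driverless devices is exactly the open bridge
question raised here):

	#include <linux/device.h>
	#include <linux/string.h>
	#include <linux/types.h>

	static bool example_group_complete(struct iommu_group *group)
	{
		struct iommu_group_dev *gdev;

		list_for_each_entry(gdev, &group->devices, list) {
			struct device *dev = gdev->dev;

			/*
			 * Bridges usually have no driver bound; treat
			 * that as acceptable here, though tracking their
			 * ownership is precisely the issue noted above.
			 */
			if (!dev->driver)
				continue;

			if (strcmp(dev->driver->name, "vfio-pci") != 0)
				return false;
		}
		return true;
	}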

Cheers,
Ben.




Thread overview: 37+ messages
2011-10-21 19:55 [PATCH 0/4] iommu: iommu_ops group interface Alex Williamson
2011-10-21 19:56 ` [PATCH 1/4] iommu: Add iommu_device_group callback and iommu_group sysfs entry Alex Williamson
2011-11-30  2:42   ` David Gibson
2011-11-30  4:51     ` Benjamin Herrenschmidt
2011-11-30  5:25       ` Alex Williamson
2011-11-30  9:23         ` Benjamin Herrenschmidt [this message]
2011-12-01  0:06           ` David Gibson
2011-12-01  6:20           ` Alex Williamson
2011-12-01  0:03         ` David Gibson
2011-12-01  0:52           ` Chris Wright
2011-12-01  0:57             ` David Gibson
2011-12-01  1:04               ` Chris Wright
2011-12-01  1:50                 ` Benjamin Herrenschmidt
2011-12-01  2:00                   ` David Gibson
2011-12-01  2:05                   ` Chris Wright
2011-12-01  7:28                     ` Alex Williamson
2011-12-01 14:02                     ` Yoder Stuart-B08248
2011-12-01  6:48           ` Alex Williamson
2011-12-01 10:33             ` David Woodhouse
2011-12-01 14:34               ` Alex Williamson
2011-12-01 21:46                 ` Benjamin Herrenschmidt
2011-12-01 22:37                   ` Alex Williamson
2011-12-01 23:14                 ` David Woodhouse
2011-12-07  6:20                   ` Benjamin Herrenschmidt
2011-12-01 21:32             ` Benjamin Herrenschmidt
2011-10-21 19:56 ` [PATCH 2/4] intel-iommu: Implement iommu_device_group Alex Williamson
2011-11-08 17:23   ` Roedel, Joerg
2011-11-10 15:22     ` David Woodhouse
2011-10-21 19:56 ` [PATCH 3/4] amd-iommu: " Alex Williamson
2011-10-21 19:56 ` [PATCH 4/4] iommu: Add option to group multi-function devices Alex Williamson
2011-12-01  0:11   ` David Gibson
2011-10-21 20:34 ` [PATCH 0/4] iommu: iommu_ops group interface Woodhouse, David
2011-10-21 21:16   ` Alex Williamson
2011-10-21 22:39     ` Woodhouse, David
2011-10-21 22:34   ` Alex Williamson
2011-10-27 16:31 ` Alex Williamson
2011-11-15 15:51 ` Roedel, Joerg
