Re: [RFC] Device isolation infrastructure v2

All of lore.kernel.org
 help / color / mirror / Atom feed

From: David Gibson <david@gibson.dropbear.id.au>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: aik@ozlabs.ru, benh@kernel.crashing.org, joerg.roedel@amd.com,
	dwmw2@infradead.org, chrisw@redhat.com, agraf@suse.de,
	scottwood@freescale.com, B08248@freescale.com,
	rusty@rustcorp.com.au, iommu@lists.linux-foundation.org,
	qemu-devel@nongnu.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC] Device isolation infrastructure v2
Date: Fri, 16 Dec 2011 12:40:26 +1100	[thread overview]
Message-ID: <20111216014026.GA4921@truffala.fritz.box> (raw)
In-Reply-To: <1323972307.2437.19.camel@x201.home>

On Thu, Dec 15, 2011 at 11:05:07AM -0700, Alex Williamson wrote:
> On Thu, 2011-12-15 at 17:25 +1100, David Gibson wrote:
> > Here's the second spin of my preferred approach to handling grouping
> > of devices for safe assignment to guests.
> > 
> > Changes since v1:
> >  * Many name changes and file moves for improved consistency
> >  * Bugfixes and cleanups
> >  * The interface to the next layer up is considerably fleshed out,
> >    although it still needs work.
> >  * Example initialization of groups for p5ioc2 and p7ioc.
> > 
> > TODO:
> >  * Need sample initialization of groups for intel and/or amd iommus
> 
> I think this very well might imposed significant bloat for those
> implementations.  On POWER you typically don't have singleton groups,
> while it's the norm on x86.  I don't know that either intel or amd iommu
> code have existing structures that they can simply tack the group
> pointer to.

Actually, I think they can probably just use the group pointer in the
struct device.  Each PCI function will typically allocate a new group
and put the pointer in the struct device and no-where else.  Devices
hidden under bridges copy the pointer from the bridge parent instead.
I will have to check the unplug path to ensure we can manage the group
lifetime properly, of course.

>  Again, this is one of the reasons that I think the current
> vfio implementation is the right starting point.  We keep groups within
> vfio, imposing zero overhead for systems not making use of it and only
> require iommu drivers to implement a trivial function to opt-in.  As we
> start to make groups more pervasive in the dma layer, independent of
> userspace driver exposure, we can offload pieces to the core.  Starting
> with it in the core and hand waving some future use that we don't plan
> to implement right now seems like the wrong direction.

Well, I think we must agree to disagree here; I think treating groups
as identifiable objects is worthwhile.  That said, I am looking for
ways to whittle down the overhead when they're not in use.

> >  * Use of sysfs attributes to control group permission is probably a
> >    mistake.  Although it seems a bit odd, registering a chardev for
> >    each group is probably better, because perms can be set from udev
> >    rules, just like everything else.
> 
> I agree, this is a horrible mistake.  Reinventing file permissions via
> sysfs is bound to be broken and doesn't account for selinux permissions.
> Again, I know you don't like aspects of the vfio group management, but
> it gets this right imho.

Yeah.  I came up with this because I was trying to avoid registering a
device whose only purpose was to act as a permissioned "handle" on the
group.  But it is a better approach, despite that.  I just wanted to
send out the new patches for comment without waiting to do that
rework.

> >  * Need more details of what the binder structure will need to
> >    contain.
> >  * Handle complete removal of groups.
> >  * Clarify what will need to happen on the hot unplug path.
> 
> We're still also removing devices from the driver model, this means
> drivers like vfio need to re-implement a lot of the driver core for
> driving each individual device in the group,

Ah, so, that's a bit I've yet to flesh out.  The subtle distinction is
that we prevent driver _matching_ not driver _binding.  It's
intentionally still possible to explicitly bind drivers to devices in
the group, by bypassing the automatic match mechanism.

I'm intending that when a group is bound, the binder struct will
(optionally) specify a driver (or several, for different bus types)
which will be "force bound" to all the devices in the group.

> and as you indicate, it's
> unclear what happens in the hotplug path

It's clear enough in concept.  I just have to work out exactly where
we need to call d_i_dev_remove(), and whether we'll need some sort of
callback to the bridge / iommu code.

> and I wonder if things like
> suspend/resume will also require non-standard support.  I really prefer
> attaching individual bus drivers to devices using the standard
> bind/unbind mechanisms.  I have a hard time seeing how this is an
> improvement from vfio.  Thanks,
> 
> Alex
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

WARNING: multiple messages have this Message-ID (diff)

From: David Gibson <david@gibson.dropbear.id.au>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: chrisw@redhat.com, aik@ozlabs.ru, rusty@rustcorp.com.au,
	agraf@suse.de, qemu-devel@nongnu.org, B08248@freescale.com,
	iommu@lists.linux-foundation.org, joerg.roedel@amd.com,
	scottwood@freescale.com, dwmw2@infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [Qemu-devel] [RFC] Device isolation infrastructure v2
Date: Fri, 16 Dec 2011 12:40:26 +1100	[thread overview]
Message-ID: <20111216014026.GA4921@truffala.fritz.box> (raw)
In-Reply-To: <1323972307.2437.19.camel@x201.home>

On Thu, Dec 15, 2011 at 11:05:07AM -0700, Alex Williamson wrote:
> On Thu, 2011-12-15 at 17:25 +1100, David Gibson wrote:
> > Here's the second spin of my preferred approach to handling grouping
> > of devices for safe assignment to guests.
> > 
> > Changes since v1:
> >  * Many name changes and file moves for improved consistency
> >  * Bugfixes and cleanups
> >  * The interface to the next layer up is considerably fleshed out,
> >    although it still needs work.
> >  * Example initialization of groups for p5ioc2 and p7ioc.
> > 
> > TODO:
> >  * Need sample initialization of groups for intel and/or amd iommus
> 
> I think this very well might imposed significant bloat for those
> implementations.  On POWER you typically don't have singleton groups,
> while it's the norm on x86.  I don't know that either intel or amd iommu
> code have existing structures that they can simply tack the group
> pointer to.

Actually, I think they can probably just use the group pointer in the
struct device.  Each PCI function will typically allocate a new group
and put the pointer in the struct device and no-where else.  Devices
hidden under bridges copy the pointer from the bridge parent instead.
I will have to check the unplug path to ensure we can manage the group
lifetime properly, of course.

>  Again, this is one of the reasons that I think the current
> vfio implementation is the right starting point.  We keep groups within
> vfio, imposing zero overhead for systems not making use of it and only
> require iommu drivers to implement a trivial function to opt-in.  As we
> start to make groups more pervasive in the dma layer, independent of
> userspace driver exposure, we can offload pieces to the core.  Starting
> with it in the core and hand waving some future use that we don't plan
> to implement right now seems like the wrong direction.

Well, I think we must agree to disagree here; I think treating groups
as identifiable objects is worthwhile.  That said, I am looking for
ways to whittle down the overhead when they're not in use.

> >  * Use of sysfs attributes to control group permission is probably a
> >    mistake.  Although it seems a bit odd, registering a chardev for
> >    each group is probably better, because perms can be set from udev
> >    rules, just like everything else.
> 
> I agree, this is a horrible mistake.  Reinventing file permissions via
> sysfs is bound to be broken and doesn't account for selinux permissions.
> Again, I know you don't like aspects of the vfio group management, but
> it gets this right imho.

Yeah.  I came up with this because I was trying to avoid registering a
device whose only purpose was to act as a permissioned "handle" on the
group.  But it is a better approach, despite that.  I just wanted to
send out the new patches for comment without waiting to do that
rework.

> >  * Need more details of what the binder structure will need to
> >    contain.
> >  * Handle complete removal of groups.
> >  * Clarify what will need to happen on the hot unplug path.
> 
> We're still also removing devices from the driver model, this means
> drivers like vfio need to re-implement a lot of the driver core for
> driving each individual device in the group,

Ah, so, that's a bit I've yet to flesh out.  The subtle distinction is
that we prevent driver _matching_ not driver _binding.  It's
intentionally still possible to explicitly bind drivers to devices in
the group, by bypassing the automatic match mechanism.

I'm intending that when a group is bound, the binder struct will
(optionally) specify a driver (or several, for different bus types)
which will be "force bound" to all the devices in the group.

> and as you indicate, it's
> unclear what happens in the hotplug path

It's clear enough in concept.  I just have to work out exactly where
we need to call d_i_dev_remove(), and whether we'll need some sort of
callback to the bridge / iommu code.

> and I wonder if things like
> suspend/resume will also require non-standard support.  I really prefer
> attaching individual bus drivers to devices using the standard
> bind/unbind mechanisms.  I have a hard time seeing how this is an
> improvement from vfio.  Thanks,
> 
> Alex
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson

next prev parent reply	other threads:[~2011-12-16  1:40 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-12-15  6:25 [RFC] Device isolation infrastructure v2 David Gibson
2011-12-15  6:25 ` [Qemu-devel] " David Gibson
2011-12-15  6:25 ` [PATCH 1/3] device_isolation: Infrastructure for managing device isolation groups David Gibson
2011-12-15  6:25   ` [Qemu-devel] " David Gibson
2011-12-15  6:25 ` [PATCH 2/3] device_isolation: Support isolation on POWER p5ioc2 bridges David Gibson
2011-12-15  6:25   ` [Qemu-devel] " David Gibson
2011-12-15  6:25 ` [PATCH 3/3] device_isolation: Support isolation on POWER p7ioc (IODA) bridges David Gibson
2011-12-15  6:25   ` [Qemu-devel] " David Gibson
2011-12-15 18:05 ` [RFC] Device isolation infrastructure v2 Alex Williamson
2011-12-15 18:05   ` [Qemu-devel] " Alex Williamson
2011-12-15 22:39   ` Alex Williamson
2011-12-15 22:39     ` [Qemu-devel] " Alex Williamson
2011-12-16  1:40   ` David Gibson [this message]
2011-12-16  1:40     ` David Gibson
2011-12-16  4:49     ` Alex Williamson
2011-12-16  4:49       ` [Qemu-devel] " Alex Williamson
2011-12-16  6:00       ` David Gibson
2011-12-16  6:00         ` [Qemu-devel] " David Gibson
2011-12-16 14:53   ` Joerg Roedel
2011-12-16 14:53     ` [Qemu-devel] " Joerg Roedel
2011-12-19  0:11     ` David Gibson
2011-12-19  0:11       ` [Qemu-devel] " David Gibson
2011-12-19 15:41       ` Joerg Roedel
2011-12-19 15:41         ` [Qemu-devel] " Joerg Roedel
2011-12-21  3:32         ` David Gibson
2011-12-21  3:32           ` David Gibson
2011-12-21  4:30           ` Alex Williamson
2011-12-21  4:30             ` Alex Williamson
2011-12-21  6:12             ` Aaron Fabbri
2011-12-21  6:12               ` Aaron Fabbri
     [not found]             ` <1324441837.29699.38.camel-xdHQ/5r00wBBDLzU/O5InQ@public.gmane.org>
2012-01-25  3:13               ` David Gibson
2012-01-25  3:13                 ` David Gibson
2012-01-25  3:13                 ` David Gibson
2012-01-25 23:44                 ` Alex Williamson
2012-01-25 23:44                   ` [Qemu-devel] " Alex Williamson
2012-01-25 23:44                   ` Alex Williamson
     [not found]                   ` <1327535093.26484.190.camel-xdHQ/5r00wBBDLzU/O5InQ@public.gmane.org>
2012-01-30 23:22                     ` David Gibson
2012-01-30 23:22                       ` David Gibson
2012-01-30 23:22                       ` David Gibson
2011-12-21 16:46           ` Joerg Roedel
2011-12-21 16:46             ` Joerg Roedel
2011-12-19 15:46       ` David Woodhouse
2011-12-19 15:46         ` [Qemu-devel] " David Woodhouse
2011-12-19 22:31         ` David Gibson
2011-12-19 22:31           ` [Qemu-devel] " David Gibson
2011-12-19 22:56           ` David Woodhouse
2011-12-19 22:56             ` [Qemu-devel] " David Woodhouse
2011-12-20  0:25             ` David Gibson
2011-12-20  0:25               ` [Qemu-devel] " David Gibson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20111216014026.GA4921@truffala.fritz.box \
    --to=david@gibson.dropbear.id.au \
    --cc=B08248@freescale.com \
    --cc=agraf@suse.de \
    --cc=aik@ozlabs.ru \
    --cc=alex.williamson@redhat.com \
    --cc=benh@kernel.crashing.org \
    --cc=chrisw@redhat.com \
    --cc=dwmw2@infradead.org \
    --cc=iommu@lists.linux-foundation.org \
    --cc=joerg.roedel@amd.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=qemu-devel@nongnu.org \
    --cc=rusty@rustcorp.com.au \
    --cc=scottwood@freescale.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.