* Re: kvm PCI assignment & VFIO ramblings
From: Alex Williamson @ 2011-08-22 19:17 UTC (permalink / raw)
To: Joerg Roedel
Cc: Alexey Kardashevskiy, kvm@vger.kernel.org, Paul Mackerras,
qemu-devel, chrisw, iommu, Avi Kivity, Anthony Liguori,
linux-pci@vger.kernel.org, linuxppc-dev, benve@cisco.com
In-Reply-To: <20110822172508.GJ2079@amd.com>
On Mon, 2011-08-22 at 19:25 +0200, Joerg Roedel wrote:
> On Sat, Aug 20, 2011 at 12:51:39PM -0400, Alex Williamson wrote:
> > We had an extremely productive VFIO BoF on Monday. Here's my attempt to
> > capture the plan that I think we agreed to:
> >
> > We need to address both the description and enforcement of device
> > groups. Groups are formed any time the iommu does not have resolution
> > between a set of devices. On x86, this typically happens when a
> > PCI-to-PCI bridge exists between the set of devices and the iommu. For
> > Power, partitionable endpoints define a group. Grouping information
> > needs to be exposed for both userspace and kernel internal usage. This
> > will be a sysfs attribute setup by the iommu drivers. Perhaps:
> >
> > # cat /sys/devices/pci0000:00/0000:00:19.0/iommu_group
> > 42
>
> Right, that is mainly for libvirt to provide that information to the
> user in a meaningful way. So userspace is aware that other devices might
> not work anymore when it assigns one to a guest.
>
> >
> > (I use a PCI example here, but attribute should not be PCI specific)
> >
> > From there we have a few options. In the BoF we discussed a model where
> > binding a device to vfio creates a /dev/vfio$GROUP character device
> > file. This "group" fd provides dma mapping ioctls as well as
> > ioctls to enumerate and return a "device" fd for each attached member of
> > the group (similar to KVM_CREATE_VCPU). We enforce grouping by
> > returning an error on open() of the group fd if there are members of the
> > group not bound to the vfio driver. Each device fd would then support a
> > similar set of ioctls and mapping (mmio/pio/config) interface as current
> > vfio, except for the obvious domain and dma ioctls superseded by the
> > group fd.
> >
> > Another valid model might be that /dev/vfio/$GROUP is created for all
> > groups when the vfio module is loaded. The group fd would allow open()
> > and some set of iommu querying and device enumeration ioctls, but would
> > error on dma mapping and retrieving device fds until all of the group
> > devices are bound to the vfio driver.
>
> I am in favour of /dev/vfio/$GROUP. If multiple devices should be
> assigned to a guest, there can also be an ioctl to bind a group to an
> address-space of another group (certainly needs some care to not allow
> that both groups belong to different processes).
That's an interesting idea. Maybe an interface similar to the current
uiommu interface, where you open() the 2nd group fd and pass the fd via
ioctl to the primary group. IOMMUs that don't support this would fail
the attach device callback, which would fail the ioctl to bind them. It
will need to be designed so any group can be removed from the super-set
and the remaining group(s) still works. This feels like something that
can be added after we get an initial implementation.
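To make the bind idea a bit more concrete, here is a toy Python model (not
the kernel interface; Group, Domain, bind() and unbind() are names made up
for illustration). It assumes that binding moves the second group into the
primary group's domain, that an iommu which can't share a domain fails the
bind, and that any group, including the primary, can leave again without
disturbing the others:

```python
class BindError(Exception):
    """Raised when the (hypothetical) bind ioctl would fail."""


class Domain:
    """Toy stand-in for an IOMMU domain shared by one or more groups."""

    def __init__(self):
        self.groups = set()
        self.mappings = {}  # iova -> host address


class Group:
    """Toy stand-in for an open group fd."""

    def __init__(self, group_id, supports_sharing=True):
        self.group_id = group_id
        self.supports_sharing = supports_sharing
        self.domain = Domain()
        self.domain.groups.add(self)

    def bind(self, other):
        """Model of passing the 2nd group fd to the primary group via ioctl."""
        if not (self.supports_sharing and other.supports_sharing):
            # Mirrors the attach-device callback failing in the iommu driver.
            raise BindError("iommu cannot share a domain between these groups")
        other.domain.groups.discard(other)
        other.domain = self.domain
        self.domain.groups.add(other)

    def unbind(self):
        """Leave the shared domain; remaining groups keep their mappings."""
        if len(self.domain.groups) == 1:
            return  # already alone, nothing to do
        self.domain.groups.discard(self)
        self.domain = Domain()
        self.domain.groups.add(self)
```

In this model the mappings live with the shared domain rather than with
whichever group was opened first, which is one way to get the "remove any
group and the rest still works" property.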
> Btw, a problem we haven't talked about yet entirely is
> driver-deassignment. User space can decide to de-assign the device from
> vfio while a fd is open on it. With PCI there is no way to let this fail
> (the .release function returns void last time I checked). Is this a
> problem, and if yes, how do we handle that?
The current vfio has the same problem: we can't unbind a device from
vfio while it's attached to a guest. I think we'd use the same solution
too: send out a netlink packet for a device removal and have the .remove
call sleep on a wait_event(, refcnt == 0). We could also set a timeout
and SIGBUS the PIDs holding the device if they don't return it
willingly. Thanks,
Alex
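As a rough illustration of the "sleep until refcnt == 0, with a timeout"
behaviour, here is a userspace Python sketch. It only models the waiting
logic; DeviceRef and its methods are hypothetical names, the netlink
notification is reduced to a flag, and the SIGBUS escalation is left to the
caller:

```python
import threading


class DeviceRef:
    """Toy model of a device whose removal waits for users to let go."""

    def __init__(self):
        self._cond = threading.Condition()
        self._refcnt = 0
        self.removal_requested = False

    def get(self):
        """A user (e.g. a guest) takes a reference on the device."""
        with self._cond:
            self._refcnt += 1

    def put(self):
        """A user returns the device; wake up anyone waiting in remove()."""
        with self._cond:
            self._refcnt -= 1
            self._cond.notify_all()

    def remove(self, timeout=None):
        """Request removal and sleep until refcnt == 0.

        Returns True if all users let go, False if the timeout expired,
        which is the point where the caller could escalate.
        """
        with self._cond:
            self.removal_requested = True  # stands in for the netlink packet
            return self._cond.wait_for(lambda: self._refcnt == 0, timeout)
```

A caller would check the return value of remove(timeout=...) and only then
decide whether to force the issue.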
^ permalink raw reply
* Re: [PATCH] RapidIO: Fix use of non-compatible registers
From: Andrew Morton @ 2011-08-22 19:28 UTC (permalink / raw)
To: Alexandre Bounine
Cc: Chul Kim, linux-kernel, Thomas Moll, linuxppc-dev, stable
In-Reply-To: <1311703646-3453-1-git-send-email-alexandre.bounine@idt.com>
On Tue, 26 Jul 2011 14:07:26 -0400
Alexandre Bounine <alexandre.bounine@idt.com> wrote:
> Replace/remove use of RIO v.1.2 registers/bits that are not forward-compatible
> with newer versions of RapidIO specification.
>
> RapidIO specification v. 1.3 removed Write Port CSR, Doorbell CSR,
> Mailbox CSR and Mailbox and Doorbell bits of the PEF CAR.
>
> Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
> Cc: Kumar Gala <galak@kernel.crashing.org>
> Cc: Matt Porter <mporter@kernel.crashing.org>
> Cc: Li Yang <leoli@freescale.com>
> Cc: Thomas Moll <thomas.moll@sysgo.com>
> Cc: Chul Kim <chul.kim@idt.com>
> Cc: <stable@kernel.org>
You did a cc:stable but provided no reason (that I can understand) for
backporting the patch. Please explain why the problem is sufficiently
serious to warrant this action.
* Re: [PATCH] [v2] sound/soc/fsl/fsl_dma.c: add missing of_node_put
From: Liam Girdwood @ 2011-08-22 20:05 UTC (permalink / raw)
To: Timur Tabi
Cc: alsa-devel@alsa-project.org, tiwai@suse.de,
devicetree-discuss@lists.ozlabs.org,
broonie@opensource.wolfsonmicro.com,
kernel-janitors@vger.kernel.org, linux-kernel@vger.kernel.org,
perex@perex.cz, julia@diku.dk, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1314022961-27513-1-git-send-email-timur@freescale.com>
On 22/08/11 15:22, Timur Tabi wrote:
> of_parse_phandle increments the reference count of np, so this should be
> decremented before trying the next possibility.
>
> Since we don't actually use np, we can decrement the reference count
> immediately.
>
> Reported-by: Julia Lawall <julia@diku.dk>
> Signed-off-by: Timur Tabi <timur@freescale.com>
Acked-by: Liam Girdwood <lrg@ti.com>
> ---
> sound/soc/fsl/fsl_dma.c | 2 ++
> 1 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/sound/soc/fsl/fsl_dma.c b/sound/soc/fsl/fsl_dma.c
> index 6680c0b..b300f4b 100644
> --- a/sound/soc/fsl/fsl_dma.c
> +++ b/sound/soc/fsl/fsl_dma.c
> @@ -877,10 +877,12 @@ static struct device_node *find_ssi_node(struct device_node *dma_channel_np)
> * assume that device_node pointers are a valid comparison.
> */
> np = of_parse_phandle(ssi_np, "fsl,playback-dma", 0);
> + of_node_put(np);
> if (np == dma_channel_np)
> return ssi_np;
>
> np = of_parse_phandle(ssi_np, "fsl,capture-dma", 0);
> + of_node_put(np);
> if (np == dma_channel_np)
> return ssi_np;
> }
* Re: kvm PCI assignment & VFIO ramblings
From: aafabbri @ 2011-08-22 20:29 UTC (permalink / raw)
To: Alex Williamson, Benjamin Herrenschmidt
Cc: Alexey Kardashevskiy, kvm, Paul Mackerras,
linux-pci@vger.kernel.org, qemu-devel, chrisw, iommu, Avi Kivity,
Anthony Liguori, linuxppc-dev, benve
In-Reply-To: <1313859105.6866.192.camel@x201.home>
On 8/20/11 9:51 AM, "Alex Williamson" <alex.williamson@redhat.com> wrote:
> We had an extremely productive VFIO BoF on Monday. Here's my attempt to
> capture the plan that I think we agreed to:
>
> We need to address both the description and enforcement of device
> groups. Groups are formed any time the iommu does not have resolution
> between a set of devices. On x86, this typically happens when a
> PCI-to-PCI bridge exists between the set of devices and the iommu. For
> Power, partitionable endpoints define a group. Grouping information
> needs to be exposed for both userspace and kernel internal usage. This
> will be a sysfs attribute setup by the iommu drivers. Perhaps:
>
> # cat /sys/devices/pci0000:00/0000:00:19.0/iommu_group
> 42
>
> (I use a PCI example here, but attribute should not be PCI specific)
>
> From there we have a few options. In the BoF we discussed a model where
> binding a device to vfio creates a /dev/vfio$GROUP character device
> file. This "group" fd provides dma mapping ioctls as well as
> ioctls to enumerate and return a "device" fd for each attached member of
> the group (similar to KVM_CREATE_VCPU). We enforce grouping by
> returning an error on open() of the group fd if there are members of the
> group not bound to the vfio driver.
Sounds reasonable.
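For reference, the enforcement rule is small enough to model in a few lines
of Python. This is a sketch only: VfioGroup is a made-up class, the driver
names are examples, returning a list of members stands in for handing back a
group fd, and EBUSY is my assumption for the error code since the proposal
doesn't name one:

```python
import errno


class VfioGroup:
    """Toy model of a group whose open() enforces all-or-nothing binding."""

    def __init__(self, group_id, drivers):
        # drivers maps device name -> driver the device is currently bound to
        self.group_id = group_id
        self.drivers = dict(drivers)

    def unbound_members(self):
        """Devices in the group that are not bound to the vfio driver."""
        return sorted(d for d, drv in self.drivers.items() if drv != "vfio")

    def open(self):
        """Fail unless every member of the group is bound to vfio."""
        missing = self.unbound_members()
        if missing:
            # EBUSY is an assumed errno, not something the proposal specifies.
            raise OSError(errno.EBUSY,
                          "group %d has devices not bound to vfio: %s"
                          % (self.group_id, ", ".join(missing)))
        # Stand-in for returning a group fd: the members it would expose.
        return sorted(self.drivers)
```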
> Each device fd would then support a
> similar set of ioctls and mapping (mmio/pio/config) interface as current
> vfio, except for the obvious domain and dma ioctls superseded by the
> group fd.
>
> Another valid model might be that /dev/vfio/$GROUP is created for all
> groups when the vfio module is loaded. The group fd would allow open()
> and some set of iommu querying and device enumeration ioctls, but would
> error on dma mapping and retrieving device fds until all of the group
> devices are bound to the vfio driver.
>
> In either case, the uiommu interface is removed entirely since dma
> mapping is done via the group fd.
The loss in generality is unfortunate. I'd like to be able to support
arbitrary iommu domain <-> device assignment. One way to do this would be
to keep uiommu, but to return an error if someone tries to assign more than
one uiommu context to devices in the same group.
-Aaron
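A minimal sketch of that rule in Python, assuming a fixed device-to-group
table (Assigner and UiommuConflict are hypothetical names, and contexts are
just opaque labels here): several groups may share one uiommu context, but
two devices in the same group may never end up in different contexts.

```python
class UiommuConflict(Exception):
    """Raised when a group would end up split across two uiommu contexts."""


class Assigner:
    """Toy model of uiommu context assignment constrained by groups."""

    def __init__(self, device_to_group):
        self.device_to_group = dict(device_to_group)
        self.group_context = {}   # group id -> context used by that group
        self.device_context = {}  # device -> context it is assigned to

    def assign(self, device, context):
        group = self.device_to_group[device]
        current = self.group_context.get(group)
        if current is not None and current != context:
            raise UiommuConflict(
                "group %r already uses context %r" % (group, current))
        self.group_context[group] = context
        self.device_context[device] = context
```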
> As necessary in the future, we can
> define a more high performance dma mapping interface for streaming dma
> via the group fd. I expect we'll also include architecture specific
> group ioctls to describe features and capabilities of the iommu. The
> group fd will need to prevent concurrent open()s to maintain a 1:1 group
> to userspace process ownership model.
>
> Also on the table is supporting non-PCI devices with vfio. To do this,
> we need to generalize the read/write/mmap and irq eventfd interfaces.
> We could keep the same model of segmenting the device fd address space,
> perhaps adding ioctls to define the segment offset bit position or we
> could split each region into its own fd (VFIO_GET_PCI_BAR_FD(0),
> VFIO_GET_PCI_CONFIG_FD(), VFIO_GET_MMIO_FD(3)), though we're already
> suffering some degree of fd bloat (group fd, device fd(s), interrupt
> event fd(s), per resource fd, etc). For interrupts we can overload
> VFIO_SET_IRQ_EVENTFD to be either PCI INTx or non-PCI irq (do non-PCI
> devices support MSI?).
>
> For qemu, these changes imply we'd only support a model where we have a
> 1:1 group to iommu domain. The current vfio driver could probably
> become vfio-pci as we might end up with more target specific vfio
> drivers for non-pci. PCI should be able to maintain a simple -device
> vfio-pci,host=bb:dd.f to enable hotplug of individual devices. We'll
> need to come up with extra options when we need to expose groups to
> guest for pvdma.
>
> Hope that captures it, feel free to jump in with corrections and
> suggestions. Thanks,
>
> Alex
>
* Re: kvm PCI assignment & VFIO ramblings
From: Benjamin Herrenschmidt @ 2011-08-22 20:49 UTC (permalink / raw)
To: aafabbri
Cc: Alexey Kardashevskiy, kvm, Paul Mackerras,
linux-pci@vger.kernel.org, qemu-devel, iommu, chrisw,
Alex Williamson, Avi Kivity, Anthony Liguori, linuxppc-dev, benve
In-Reply-To: <CA780A2A.FB0B%aafabbri@cisco.com>
On Mon, 2011-08-22 at 13:29 -0700, aafabbri wrote:
> > Each device fd would then support a
> > similar set of ioctls and mapping (mmio/pio/config) interface as current
> > vfio, except for the obvious domain and dma ioctls superseded by the
> > group fd.
> >
> > Another valid model might be that /dev/vfio/$GROUP is created for all
> > groups when the vfio module is loaded. The group fd would allow open()
> > and some set of iommu querying and device enumeration ioctls, but would
> > error on dma mapping and retrieving device fds until all of the group
> > devices are bound to the vfio driver.
> >
> > In either case, the uiommu interface is removed entirely since dma
> > mapping is done via the group fd.
>
> The loss in generality is unfortunate. I'd like to be able to support
> arbitrary iommu domain <-> device assignment. One way to do this would be
> to keep uiommu, but to return an error if someone tries to assign more than
> one uiommu context to devices in the same group.
I wouldn't use uiommu for that. If the HW or underlying kernel drivers
support it, what I'd suggest is that you have an (optional) ioctl to
bind two groups (you have to have both opened already) or for one group
to "capture" another one.
The binding means under the hood the iommus get shared, with the
lifetime being that of the "owning" group.
Another option is to make that a static configuration API via special
ioctls (or even netlink if you really like it), to change the grouping
on architectures that allow it.
Cheers.
Ben.
>
> -Aaron
>
> > As necessary in the future, we can
> > define a more high performance dma mapping interface for streaming dma
> > via the group fd. I expect we'll also include architecture specific
> > group ioctls to describe features and capabilities of the iommu. The
> > group fd will need to prevent concurrent open()s to maintain a 1:1 group
> > to userspace process ownership model.
> >
> > Also on the table is supporting non-PCI devices with vfio. To do this,
> > we need to generalize the read/write/mmap and irq eventfd interfaces.
> > We could keep the same model of segmenting the device fd address space,
> > perhaps adding ioctls to define the segment offset bit position or we
> > could split each region into its own fd (VFIO_GET_PCI_BAR_FD(0),
> > VFIO_GET_PCI_CONFIG_FD(), VFIO_GET_MMIO_FD(3)), though we're already
> > suffering some degree of fd bloat (group fd, device fd(s), interrupt
> > event fd(s), per resource fd, etc). For interrupts we can overload
> > VFIO_SET_IRQ_EVENTFD to be either PCI INTx or non-PCI irq (do non-PCI
> > devices support MSI?).
> >
> > For qemu, these changes imply we'd only support a model where we have a
> > 1:1 group to iommu domain. The current vfio driver could probably
> > become vfio-pci as we might end up with more target specific vfio
> > drivers for non-pci. PCI should be able to maintain a simple -device
> > vfio-pci,host=bb:dd.f to enable hotplug of individual devices. We'll
> > need to come up with extra options when we need to expose groups to
> > guest for pvdma.
> >
> > Hope that captures it, feel free to jump in with corrections and
> > suggestions. Thanks,
> >
> > Alex
> >
* Re: kvm PCI assignment & VFIO ramblings
From: Benjamin Herrenschmidt @ 2011-08-22 20:53 UTC (permalink / raw)
To: Avi Kivity
Cc: chrisw, Alexey Kardashevskiy, kvm, Paul Mackerras,
linux-pci@vger.kernel.org, qemu-devel, iommu, aafabbri,
Alex Williamson, Anthony Liguori, linuxppc-dev, benve
In-Reply-To: <4E51F782.7060005@redhat.com>
On Mon, 2011-08-22 at 09:30 +0300, Avi Kivity wrote:
> On 08/20/2011 07:51 PM, Alex Williamson wrote:
> > We need to address both the description and enforcement of device
> > groups. Groups are formed any time the iommu does not have resolution
> > between a set of devices. On x86, this typically happens when a
> > PCI-to-PCI bridge exists between the set of devices and the iommu. For
> > Power, partitionable endpoints define a group. Grouping information
> > needs to be exposed for both userspace and kernel internal usage. This
> > will be a sysfs attribute setup by the iommu drivers. Perhaps:
> >
> > # cat /sys/devices/pci0000:00/0000:00:19.0/iommu_group
> > 42
> >
>
> $ readlink /sys/devices/pci0000:00/0000:00:19.0/iommu_group
> ../../../path/to/device/which/represents/the/resource/constraint
>
> (the pci-to-pci bridge on x86, or whatever node represents partitionable
> endpoints on power)
The constraint might not necessarily be a device.
The PCI bridge is just an example. There are other possible constraints.
On POWER for example, it could be a limit in how far I can segment the
DMA address space, forcing me to arbitrarily put devices together, or it
could be a similar constraint related to how the MMIO space is broken
up.
So either that remains a path, in which case we do have a separate set of
sysfs nodes representing the groups themselves, which may or may not
themselves contain a pointer to the "constraining" device, or we just make
that an arbitrary number (in my case the PE#).
Cheers,
Ben
* Re: kvm PCI assignment & VFIO ramblings
From: Benjamin Herrenschmidt @ 2011-08-22 21:01 UTC (permalink / raw)
To: Alex Williamson
Cc: aafabbri, Alexey Kardashevskiy, kvm, Paul Mackerras,
linux-pci@vger.kernel.org, qemu-devel, David Gibson, chrisw,
iommu, Avi Kivity, Anthony Liguori, linuxppc-dev, benve
In-Reply-To: <1314027950.6866.242.camel@x201.home>
On Mon, 2011-08-22 at 09:45 -0600, Alex Williamson wrote:
> Yes, that's the idea. An open question I have towards the configuration
> side is whether we might add iommu driver specific options to the
> groups. For instance on x86 where we typically have B:D.F granularity,
> should we have an option not to trust multi-function devices and use a
> B:D granularity for grouping?
Or even B or range of busses... if you want to enforce strict isolation
you really can't trust anything below a bus level :-)
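To illustrate what such an option would change, here is a small Python
sketch that groups PCI addresses at function, device or bus granularity.
The function names and the "granularity" parameter are invented for this
example, and addresses are assumed to be in bb:dd.f form with an optional
domain prefix:

```python
from collections import defaultdict


def group_key(bdf, granularity="function"):
    """Return the grouping key for a PCI address like 00:19.0 or 0000:00:19.0."""
    *domain, bus, devfn = bdf.split(":")
    dev, func = devfn.split(".")
    prefix = (":".join(domain), bus)
    if granularity == "function":
        return prefix + (dev, func)   # trust each function (B:D.F)
    if granularity == "device":
        return prefix + (dev,)        # don't trust multi-function devices (B:D)
    if granularity == "bus":
        return prefix                 # strict: everything on a bus together (B)
    raise ValueError("unknown granularity: %r" % (granularity,))


def build_groups(bdfs, granularity="function"):
    """Collect devices into groups according to the chosen granularity."""
    groups = defaultdict(list)
    for bdf in bdfs:
        groups[group_key(bdf, granularity)].append(bdf)
    return dict(groups)
```

The coarser the granularity, the fewer and larger the groups, so a stricter
setting directly reduces how finely devices can be handed out.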
> Right, we can also combine models. Binding a device to vfio
> creates /dev/vfio$GROUP, which only allows a subset of ioctls and no
> device access until all the group devices are also bound. I think
> the /dev/vfio/$GROUP might help provide an enumeration interface as well
> though, which could be useful.
Could be, though in what form? Returning sysfs paths?
> 1:1 group<->process is probably too strong. Not allowing concurrent
> open()s on the group file enforces a single userspace entity is
> responsible for that group. Device fds can be passed to other
> processes, but only retrieved via the group fd. I suppose we could even
> branch off the dma interface into a different fd, but it seems like we
> would logically want to serialize dma mappings at each iommu group
> anyway. I'm open to alternatives, this just seemed an easy way to do
> it. Restricting on UID implies that we require isolated qemu instances
> to run as different UIDs. I know that's a goal, but I don't know if we
> want to make it an assumption in the group security model.
1:1 process has the advantage of linking to an mm, which makes the whole
mmu notifier business doable. How do you want to track down mappings and
do the second level translation in the case of explicit map/unmap (like
on power) if you are not tied to an mm_struct?
> Yes. I'm not sure there's a good ROI to prioritize that model. We have
> to assume >1 device per guest is a typical model and that the iotlb is
> large enough that we might improve thrashing to see both a resource and
> performance benefit from it. I'm open to suggestions for how we could
> include it though.
Sharing may or may not be possible depending on setups so yes, it's a
bit tricky.
My preference is to have a static interface (and that's actually where
your pet netlink might make some sense :-) to create "synthetic" groups
made of other groups if the arch allows it. But that might not be the
best approach. In another email I also proposed an option for a group to
"capture" another one...
> > If that's
> > not what you're saying, how would the domains - now made up of a
> > user's selection of groups, rather than individual devices - be
> > configured?
> >
> > > Hope that captures it, feel free to jump in with corrections and
> > > suggestions. Thanks,
> >
Another aspect I don't see discussed is how we represent these things to
the guest.
On Power for example, I have a requirement that a given iommu domain is
represented by a single dma window property in the device-tree. What
that means is that that property needs to be either in the node of the
device itself if there's only one device in the group or in a parent
node (ie a bridge or host bridge) if there are multiple devices.
Now I do -not- want to go down the path of simulating P2P bridges,
besides we'll quickly run out of bus numbers if we go there.
For us the most simple and logical approach (which is also what pHyp
uses and what Linux handles well) is really to expose a given PCI host
bridge per group to the guest. Believe it or not, it makes things
easier :-)
Cheers,
Ben.
* Re: kvm PCI assignment & VFIO ramblings
From: Benjamin Herrenschmidt @ 2011-08-22 21:03 UTC (permalink / raw)
To: Joerg Roedel
Cc: Alexey Kardashevskiy, kvm@vger.kernel.org, Paul Mackerras,
linux-pci@vger.kernel.org, qemu-devel, iommu, chrisw,
Alex Williamson, Avi Kivity, Anthony Liguori, linuxppc-dev,
benve@cisco.com
In-Reply-To: <20110822172508.GJ2079@amd.com>
> I am in favour of /dev/vfio/$GROUP. If multiple devices should be
> assigned to a guest, there can also be an ioctl to bind a group to an
> address-space of another group (certainly needs some care to not allow
> that both groups belong to different processes).
>
> Btw, a problem we haven't talked about yet entirely is
> driver-deassignment. User space can decide to de-assign the device from
> vfio while a fd is open on it. With PCI there is no way to let this fail
> (the .release function returns void last time I checked). Is this a
> problem, and if yes, how do we handle that?
We can treat it as a hard unplug (like a cardbus gone away).
IE. Dispose of the direct mappings (switch to MMIO emulation) and return
all ff's from reads (& ignore writes).
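The emulated behaviour after a hard unplug is simple enough to sketch in
Python. AssignedRegion is a made-up class and the bytearray merely stands in
for the device's real MMIO region; the point is only the "all ff's on read,
drop writes" fallback:

```python
class AssignedRegion:
    """Toy model of a device region that falls back to emulation on unplug."""

    def __init__(self, size):
        self._backing = bytearray(size)  # stand-in for the directly mapped region
        self.present = True

    def read(self, offset, length):
        if not self.present:
            # Device is gone: behave like a removed card, all bits set.
            return b"\xff" * length
        return bytes(self._backing[offset:offset + length])

    def write(self, offset, data):
        if not self.present:
            return  # writes to a removed device are silently ignored
        self._backing[offset:offset + len(data)] = data

    def hard_unplug(self):
        """Drop the direct mapping and switch to the emulated behaviour."""
        self.present = False
```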
Then send an unplug event via whatever mechanism the platform provides
(ACPI hotplug controller on x86 for example, we haven't quite sorted out
what to do on power for hotplug yet).
Cheers,
Ben.
* Re: kvm PCI assignment & VFIO ramblings
From: aafabbri @ 2011-08-22 21:38 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Alexey Kardashevskiy, kvm, Paul Mackerras,
linux-pci@vger.kernel.org, qemu-devel, iommu, chrisw,
Alex Williamson, Avi Kivity, Anthony Liguori, linuxppc-dev, benve
In-Reply-To: <1314046171.7662.27.camel@pasglop>
On 8/22/11 1:49 PM, "Benjamin Herrenschmidt" <benh@kernel.crashing.org>
wrote:
> On Mon, 2011-08-22 at 13:29 -0700, aafabbri wrote:
>
>>> Each device fd would then support a
>>> similar set of ioctls and mapping (mmio/pio/config) interface as current
>>> vfio, except for the obvious domain and dma ioctls superseded by the
>>> group fd.
>>>
>>> Another valid model might be that /dev/vfio/$GROUP is created for all
>>> groups when the vfio module is loaded. The group fd would allow open()
>>> and some set of iommu querying and device enumeration ioctls, but would
>>> error on dma mapping and retrieving device fds until all of the group
>>> devices are bound to the vfio driver.
>>>
>>> In either case, the uiommu interface is removed entirely since dma
>>> mapping is done via the group fd.
>>
>> The loss in generality is unfortunate. I'd like to be able to support
>> arbitrary iommu domain <-> device assignment. One way to do this would be
>> to keep uiommu, but to return an error if someone tries to assign more than
>> one uiommu context to devices in the same group.
>
> I wouldn't use uiommu for that.
Any particular reason besides saving a file descriptor?
We use it today, and it seems like a cleaner API than what you propose
changing it to.
> If the HW or underlying kernel drivers
> support it, what I'd suggest is that you have an (optional) ioctl to
> bind two groups (you have to have both opened already) or for one group
> to "capture" another one.
You'll need other rules there too.. "both opened already, but zero mappings
performed yet as they would have instantiated a default IOMMU domain".
Keep in mind the only case I'm using is singleton groups, a.k.a. devices.
Since what I want is to specify which devices can do things like share
network buffers (in a way that conserves IOMMU hw resources), it seems
cleanest to expose this explicitly, versus some "inherit iommu domain from
another device" ioctl. What happens if I do something like this:
dev1_fd = open ("/dev/vfio0")
dev2_fd = open ("/dev/vfio1")
dev2_fd.inherit_iommu(dev1_fd)
error = close(dev1_fd)
There are other gross cases as well.
>
> The binding means under the hood the iommus get shared, with the
> lifetime being that of the "owning" group.
So what happens in the close() above? EINUSE? Reset all children? Still
seems less clean than having an explicit iommu fd. Without some benefit I'm
not sure why we'd want to change this API.
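To show what I mean by the explicit model being cleaner on lifetime, here is
a toy Python sketch (IommuContext and Device are illustrative names, not the
uiommu API, and in this simplified model only the devices hold references).
The context outlives any single device and goes away when its last user
does, so closing dev1 first is not a special case:

```python
class IommuContext:
    """Toy model of an explicit, reference-counted iommu context."""

    def __init__(self):
        self.users = set()
        self.mappings = {}  # iova -> host address
        self.alive = True

    def attach(self, name):
        self.users.add(name)

    def detach(self, name):
        self.users.discard(name)
        if not self.users:
            # Last user gone: tear down the mappings with the context.
            self.mappings.clear()
            self.alive = False


class Device:
    """Toy model of a device fd attached to an explicit context."""

    def __init__(self, name, ctx):
        self.name = name
        self.ctx = ctx
        ctx.attach(name)

    def close(self):
        self.ctx.detach(self.name)
```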
If we in singleton-group land were building our own "groups" which were sets
of devices sharing the IOMMU domains we wanted, I suppose we could do away
with uiommu fds, but it sounds like the current proposal would create 20
singleton groups (x86 iommu w/o PCI bridges => all devices are partitionable
endpoints). Asking me to ioctl(inherit) them together into a blob sounds
worse than the current explicit uiommu API.
Thanks,
Aaron
>
> Another option is to make that a static configuration API via special
> ioctls (or even netlink if you really like it), to change the grouping
> on architectures that allow it.
>
> Cheers.
> Ben.
>
>>
>> -Aaron
>>
>>> As necessary in the future, we can
>>> define a more high performance dma mapping interface for streaming dma
>>> via the group fd. I expect we'll also include architecture specific
>>> group ioctls to describe features and capabilities of the iommu. The
>>> group fd will need to prevent concurrent open()s to maintain a 1:1 group
>>> to userspace process ownership model.
>>>
>>> Also on the table is supporting non-PCI devices with vfio. To do this,
>>> we need to generalize the read/write/mmap and irq eventfd interfaces.
>>> We could keep the same model of segmenting the device fd address space,
>>> perhaps adding ioctls to define the segment offset bit position or we
>>> could split each region into its own fd (VFIO_GET_PCI_BAR_FD(0),
>>> VFIO_GET_PCI_CONFIG_FD(), VFIO_GET_MMIO_FD(3)), though we're already
>>> suffering some degree of fd bloat (group fd, device fd(s), interrupt
>>> event fd(s), per resource fd, etc). For interrupts we can overload
>>> VFIO_SET_IRQ_EVENTFD to be either PCI INTx or non-PCI irq (do non-PCI
>>> devices support MSI?).
>>>
>>> For qemu, these changes imply we'd only support a model where we have a
>>> 1:1 group to iommu domain. The current vfio driver could probably
>>> become vfio-pci as we might end up with more target specific vfio
>>> drivers for non-pci. PCI should be able to maintain a simple -device
>>> vfio-pci,host=bb:dd.f to enable hotplug of individual devices. We'll
>>> need to come up with extra options when we need to expose groups to
>>> guest for pvdma.
>>>
>>> Hope that captures it, feel free to jump in with corrections and
>>> suggestions. Thanks,
>>>
>>> Alex
>>>
>
>
* Re: kvm PCI assignment & VFIO ramblings
From: Benjamin Herrenschmidt @ 2011-08-22 21:49 UTC (permalink / raw)
To: aafabbri
Cc: Alexey Kardashevskiy, kvm, Paul Mackerras,
linux-pci@vger.kernel.org, qemu-devel, iommu, chrisw,
Alex Williamson, Avi Kivity, Anthony Liguori, linuxppc-dev, benve
In-Reply-To: <CA781A61.FB15%aafabbri@cisco.com>
> > I wouldn't use uiommu for that.
>
> Any particular reason besides saving a file descriptor?
>
> We use it today, and it seems like a cleaner API than what you propose
> changing it to.
Well for one, we are back to square one vs. grouping constraints.
.../...
> If we in singleton-group land were building our own "groups" which were sets
> of devices sharing the IOMMU domains we wanted, I suppose we could do away
> with uiommu fds, but it sounds like the current proposal would create 20
> singleton groups (x86 iommu w/o PCI bridges => all devices are partitionable
> endpoints). Asking me to ioctl(inherit) them together into a blob sounds
> worse than the current explicit uiommu API.
I'd rather have an API to create super-groups (groups of groups)
statically and then you can use such groups as normal groups using the
same interface. That create/management process could be done via a
simple command line utility or via sysfs banging, whatever...
Cheers,
Ben.
> Thanks,
> Aaron
>
> >
> > Another option is to make that a static configuration API via special
> > ioctls (or even netlink if you really like it), to change the grouping
> > on architectures that allow it.
> >
> > Cheers.
> > Ben.
> >
> >>
> >> -Aaron
> >>
> >>> As necessary in the future, we can
> >>> define a more high performance dma mapping interface for streaming dma
> >>> via the group fd. I expect we'll also include architecture specific
> >>> group ioctls to describe features and capabilities of the iommu. The
> >>> group fd will need to prevent concurrent open()s to maintain a 1:1 group
> >>> to userspace process ownership model.
> >>>
> >>> Also on the table is supporting non-PCI devices with vfio. To do this,
> >>> we need to generalize the read/write/mmap and irq eventfd interfaces.
> >>> We could keep the same model of segmenting the device fd address space,
> >>> perhaps adding ioctls to define the segment offset bit position or we
> > >>> could split each region into its own fd (VFIO_GET_PCI_BAR_FD(0),
> >>> VFIO_GET_PCI_CONFIG_FD(), VFIO_GET_MMIO_FD(3)), though we're already
> >>> suffering some degree of fd bloat (group fd, device fd(s), interrupt
> >>> event fd(s), per resource fd, etc). For interrupts we can overload
> >>> VFIO_SET_IRQ_EVENTFD to be either PCI INTx or non-PCI irq (do non-PCI
> >>> devices support MSI?).
> >>>
> >>> For qemu, these changes imply we'd only support a model where we have a
> >>> 1:1 group to iommu domain. The current vfio driver could probably
> >>> become vfio-pci as we might end up with more target specific vfio
> >>> drivers for non-pci. PCI should be able to maintain a simple -device
> >>> vfio-pci,host=bb:dd.f to enable hotplug of individual devices. We'll
> >>> need to come up with extra options when we need to expose groups to
> >>> guest for pvdma.
> >>>
> >>> Hope that captures it, feel free to jump in with corrections and
> >>> suggestions. Thanks,
> >>>
> >>> Alex
> >>>
> >
> >
* Re: [PATCH] [v2] sound/soc/fsl/fsl_dma.c: add missing of_node_put
From: Mark Brown @ 2011-08-22 22:27 UTC (permalink / raw)
To: Timur Tabi
Cc: alsa-devel, tiwai, devicetree-discuss, kernel-janitors,
linux-kernel, perex, julia, linuxppc-dev, lrg
In-Reply-To: <1314022961-27513-1-git-send-email-timur@freescale.com>
On Mon, Aug 22, 2011 at 09:22:41AM -0500, Timur Tabi wrote:
> of_parse_phandle increments the reference count of np, so this should be
> decremented before trying the next possibility.
>
> Since we don't actually use np, we can decrement the reference count
> immediately.
Applied, thanks.
* Re: [PATCH] [v2] sound/soc/fsl/fsl_dma.c: add missing of_node_put
From: Mark Brown @ 2011-08-22 22:30 UTC (permalink / raw)
To: Timur Tabi
Cc: alsa-devel, tiwai, devicetree-discuss, kernel-janitors,
linux-kernel, perex, julia, linuxppc-dev, lrg
In-Reply-To: <1314022961-27513-1-git-send-email-timur@freescale.com>
On Mon, Aug 22, 2011 at 09:22:41AM -0500, Timur Tabi wrote:
> of_parse_phandle increments the reference count of np, so this should be
> decremented before trying the next possibility.
>
> Since we don't actually use np, we can decrement the reference count
> immediately.
Applied, thanks.
* Re: [PATCH 2/2] sound/soc/fsl/mpc8610_hpcd.c: add missing of_node_put
From: Mark Brown @ 2011-08-22 22:30 UTC (permalink / raw)
To: Julia Lawall
Cc: alsa-devel, Liam Girdwood, Takashi Iwai, devicetree-discuss,
kernel-janitors, linux-kernel, Jaroslav Kysela, linuxppc-dev,
Timur Tabi
In-Reply-To: <1313823721-16930-2-git-send-email-julia@diku.dk>
On Sat, Aug 20, 2011 at 09:02:01AM +0200, Julia Lawall wrote:
> From: Julia Lawall <julia@diku.dk>
>
> The first change is to add an of_node_put, since codec_np has previously
> been allocated. The rest of the patch reorganizes the error handling code
> so the only code executed is that which is needed.
Applied, thanks.
* Re: kvm PCI assignment & VFIO ramblings
From: aafabbri @ 2011-08-23 0:52 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Alexey Kardashevskiy, kvm, Paul Mackerras,
linux-pci@vger.kernel.org, qemu-devel, iommu, chrisw,
Alex Williamson, Avi Kivity, Anthony Liguori, linuxppc-dev, benve
In-Reply-To: <1314049785.7662.44.camel@pasglop>
On 8/22/11 2:49 PM, "Benjamin Herrenschmidt" <benh@kernel.crashing.org>
wrote:
>
>>> I wouldn't use uiommu for that.
>>
>> Any particular reason besides saving a file descriptor?
>>
>> We use it today, and it seems like a cleaner API than what you propose
>> changing it to.
>
> Well for one, we are back to square one vs. grouping constraints.
I'm not following you.
You have to enforce group/iommu domain assignment whether you have the
existing uiommu API, or if you change it to your proposed
ioctl(inherit_iommu) API.
The only change needed to VFIO here should be to make uiommu fd assignment
happen on the groups instead of on device fds. That operation fails or
succeeds according to the group semantics (all-or-none assignment/same
uiommu).
I think the question is: do we force 1:1 iommu/group mapping, or do we allow
arbitrary mapping (satisfying group constraints) as we do today?
I'm saying I'm an existing user who wants the arbitrary iommu/group mapping
ability and definitely think the uiommu approach is cleaner than the
ioctl(inherit_iommu) approach. We considered that approach before but it
seemed less clean so we went with the explicit uiommu context.
> .../...
>
>> If we in singleton-group land were building our own "groups" which were sets
>> of devices sharing the IOMMU domains we wanted, I suppose we could do away
>> with uiommu fds, but it sounds like the current proposal would create 20
>> singleton groups (x86 iommu w/o PCI bridges => all devices are partitionable
>> endpoints). Asking me to ioctl(inherit) them together into a blob sounds
>> worse than the current explicit uiommu API.
>
> I'd rather have an API to create super-groups (groups of groups)
> statically and then you can use such groups as normal groups using the
> same interface. That create/management process could be done via a
> simple command line utility or via sysfs banging, whatever...
^ permalink raw reply
* Re: linux-next: boot test failure (net tree)
From: Stephen Rothwell @ 2011-08-23 1:40 UTC (permalink / raw)
To: David Miller
Cc: mikey, netdev, ppc-dev, linux-kernel, linux-next, Paul Mackerras,
jeffrey.t.kirsher, akpm, torvalds
In-Reply-To: <20110822113032.15087c2e190e2b0c3ee7dfb8@canb.auug.org.au>
Hi Dave,
On Mon, 22 Aug 2011 11:30:32 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> Here's what I am applying as a merge fixup to the net tree today so that
> my ppc64_defconfig builds actually build more or less the same set of
> drivers as before this rearrangement.
And this today:
From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Tue, 23 Aug 2011 11:23:40 +1000
Subject: [PATCH] sparc: update sparc32_defconfig for net device movement
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
---
arch/sparc/configs/sparc32_defconfig | 9 ++-------
1 files changed, 2 insertions(+), 7 deletions(-)
diff --git a/arch/sparc/configs/sparc32_defconfig b/arch/sparc/configs/sparc32_defconfig
index fb23fd6..9bc241a 100644
--- a/arch/sparc/configs/sparc32_defconfig
+++ b/arch/sparc/configs/sparc32_defconfig
@@ -2,9 +2,7 @@ CONFIG_EXPERIMENTAL=y
CONFIG_SYSVIPC=y
CONFIG_POSIX_MQUEUE=y
CONFIG_LOG_BUF_SHIFT=14
-CONFIG_SYSFS_DEPRECATED_V2=y
CONFIG_BLK_DEV_INITRD=y
-# CONFIG_CC_OPTIMIZE_FOR_SIZE is not set
CONFIG_SLAB=y
CONFIG_MODULES=y
CONFIG_MODULE_UNLOAD=y
@@ -42,9 +40,10 @@ CONFIG_SCSI_QLOGICPTI=m
CONFIG_SCSI_SUNESP=y
CONFIG_NETDEVICES=y
CONFIG_DUMMY=m
-CONFIG_NET_ETHERNET=y
CONFIG_MII=m
+CONFIG_NET_VENDOR_AMD=y
CONFIG_SUNLANCE=y
+CONFIG_NET_VENDOR_SUN=y
CONFIG_HAPPYMEAL=m
CONFIG_SUNBMAC=m
CONFIG_SUNQE=m
@@ -64,26 +63,22 @@ CONFIG_SERIAL_SUNSU=y
CONFIG_SERIAL_SUNSU_CONSOLE=y
CONFIG_SPI=y
CONFIG_SPI_XILINX=m
-CONFIG_SPI_XILINX_PLTFM=m
CONFIG_SUN_OPENPROMIO=m
CONFIG_EXT2_FS=y
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
CONFIG_EXT2_FS_SECURITY=y
-CONFIG_AUTOFS_FS=m
CONFIG_AUTOFS4_FS=m
CONFIG_ISO9660_FS=m
CONFIG_PROC_KCORE=y
CONFIG_ROMFS_FS=m
CONFIG_NFS_FS=y
CONFIG_ROOT_NFS=y
-CONFIG_RPCSEC_GSS_KRB5=m
CONFIG_NLS=y
# CONFIG_ENABLE_WARN_DEPRECATED is not set
CONFIG_DEBUG_KERNEL=y
CONFIG_DETECT_HUNG_TASK=y
# CONFIG_SCHED_DEBUG is not set
-# CONFIG_RCU_CPU_STALL_DETECTOR is not set
CONFIG_KGDB=y
CONFIG_KGDB_TESTS=y
CONFIG_CRYPTO_NULL=m
--
1.7.5.4
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
^ permalink raw reply related
* Re: linux-next: boot test failure (net tree)
From: Stephen Rothwell @ 2011-08-23 1:41 UTC (permalink / raw)
To: David Miller
Cc: mikey, netdev, ppc-dev, linux-kernel, linux-next, Paul Mackerras,
jeffrey.t.kirsher, akpm, torvalds
In-Reply-To: <20110823114011.a059aea0138b75bfa7eed1ce@canb.auug.org.au>
Hi Dave,
On Tue, 23 Aug 2011 11:40:11 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>
> On Mon, 22 Aug 2011 11:30:32 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> >
> > Here's what I am applying as a merge fixup to the net tree today so that
> > my ppc64_defconfig builds actually build more or less the same set of
> > drivers as before this rearrangement.
>
> And this today:
And this:
From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Tue, 23 Aug 2011 11:35:18 +1000
Subject: [PATCH] sparc: update sparc64_defconfig for net device movement
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
---
arch/sparc/configs/sparc64_defconfig | 20 +++++++++-----------
1 files changed, 9 insertions(+), 11 deletions(-)
diff --git a/arch/sparc/configs/sparc64_defconfig b/arch/sparc/configs/sparc64_defconfig
index 3c1e858..5732728 100644
--- a/arch/sparc/configs/sparc64_defconfig
+++ b/arch/sparc/configs/sparc64_defconfig
@@ -37,8 +37,6 @@ CONFIG_NET_KEY_MIGRATE=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
CONFIG_NET_IPIP=m
-CONFIG_NET_IPGRE=m
-CONFIG_NET_IPGRE_BROADCAST=y
CONFIG_IP_MROUTE=y
CONFIG_IP_PIMSM_V1=y
CONFIG_IP_PIMSM_V2=y
@@ -95,17 +93,19 @@ CONFIG_DM_SNAPSHOT=m
CONFIG_DM_MIRROR=m
CONFIG_DM_ZERO=m
CONFIG_NETDEVICES=y
-CONFIG_NET_ETHERNET=y
CONFIG_MII=m
+CONFIG_NET_VENDOR_AMD=y
CONFIG_SUNLANCE=m
+CONFIG_NET_VENDOR_BROADCOM=y
+CONFIG_BNX2=m
+CONFIG_TIGON3=m
+CONFIG_NET_VENDOR_INTEL=y
+CONFIG_E1000=m
+CONFIG_E1000E=m
+CONFIG_NET_VENDOR_SUN=y
CONFIG_HAPPYMEAL=m
CONFIG_SUNGEM=m
CONFIG_SUNVNET=m
-CONFIG_NET_PCI=y
-CONFIG_E1000=m
-CONFIG_E1000E=m
-CONFIG_TIGON3=m
-CONFIG_BNX2=m
CONFIG_NIU=m
# CONFIG_WLAN is not set
CONFIG_PPP=m
@@ -126,13 +126,13 @@ CONFIG_INPUT_SPARCSPKR=y
# CONFIG_SERIO_SERPORT is not set
CONFIG_SERIO_PCIPS2=m
CONFIG_SERIO_RAW=m
+# CONFIG_LEGACY_PTYS is not set
# CONFIG_DEVKMEM is not set
CONFIG_SERIAL_SUNSU=y
CONFIG_SERIAL_SUNSU_CONSOLE=y
CONFIG_SERIAL_SUNSAB=y
CONFIG_SERIAL_SUNSAB_CONSOLE=y
CONFIG_SERIAL_SUNHV=y
-# CONFIG_LEGACY_PTYS is not set
CONFIG_FB=y
CONFIG_FB_TILEBLITTING=y
CONFIG_FB_SBUS=y
@@ -206,10 +206,8 @@ CONFIG_PRINTK_TIME=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_KERNEL=y
CONFIG_LOCKUP_DETECTOR=y
-CONFIG_DETECT_HUNG_TASK=y
# CONFIG_SCHED_DEBUG is not set
CONFIG_SCHEDSTATS=y
-# CONFIG_RCU_CPU_STALL_DETECTOR is not set
CONFIG_SYSCTL_SYSCALL_CHECK=y
CONFIG_BLK_DEV_IO_TRACE=y
CONFIG_KEYS=y
--
1.7.5.4
--
Cheers,
Stephen Rothwell sfr@canb.auug.org.au
http://www.canb.auug.org.au/~sfr/
^ permalink raw reply related
* Re: linux-next: boot test failure (net tree)
From: David Miller @ 2011-08-23 2:13 UTC (permalink / raw)
To: sfr
Cc: mikey, netdev, linuxppc-dev, linux-kernel, linux-next, paulus,
jeffrey.t.kirsher, akpm, torvalds
In-Reply-To: <20110823114129.ceb18da164bf7df3c145941b@canb.auug.org.au>
From: Stephen Rothwell <sfr@canb.auug.org.au>
Date: Tue, 23 Aug 2011 11:41:29 +1000
> On Tue, 23 Aug 2011 11:40:11 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>>
>> On Mon, 22 Aug 2011 11:30:32 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>> >
>> > Here's what I am applying as a merge fixup to the net tree today so that
>> > my ppc64_defconfig builds actually build more or less the same set of
>> > drivers as before this rearrangement.
>>
>> And this today:
>
> And this:
I'm starting to get uncomfortable with this whole situation, and I
feel more and more that these new kconfig guards are not tenable.
Changing defconfig files might fix the "automated test boot with
defconfig" case but it won't fix the case of someone trying to
automate a build and boot using a different, existing, config file.
It ought to work too, and I do know people really do this.
And just the fact that we would have to merge all of these defconfig changes
through the networking tree is evidence of how it's really not reasonable
to be doing things this way.
Jeff, I think we need to revert the dependencies back to what they were
before the drivers/net moves. Could you prepare a patch which does that?
^ permalink raw reply
* Re: linux-next: boot test failure (net tree)
From: Jeff Kirsher @ 2011-08-23 2:26 UTC (permalink / raw)
To: David Miller
Cc: sfr@canb.auug.org.au, mikey@neuling.org, netdev@vger.kernel.org,
linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
linux-next@vger.kernel.org, paulus@samba.org,
akpm@linux-foundation.org, torvalds@linux-foundation.org
In-Reply-To: <20110822.191348.2099822249437201579.davem@davemloft.net>
[-- Attachment #1: Type: text/plain, Size: 1586 bytes --]
On Mon, 2011-08-22 at 19:13 -0700, David Miller wrote:
> From: Stephen Rothwell <sfr@canb.auug.org.au>
> Date: Tue, 23 Aug 2011 11:41:29 +1000
>
> > On Tue, 23 Aug 2011 11:40:11 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> >>
> >> On Mon, 22 Aug 2011 11:30:32 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
> >> >
> >> > Here's what I am applying as a merge fixup to the net tree today so that
> >> > my ppc64_defconfig builds actually build more or less the same set of
> >> > drivers as before this rearrangement.
> >>
> >> And this today:
> >
> > And this:
>
> I'm starting to get uncomfortable with this whole situation, and I
> feel more and more that these new kconfig guards are not tenable.
>
> Changing defconfig files might fix the "automated test boot with
> defconfig" case but it won't fix the case of someone trying to
> automate a build and boot using a different, existing, config file.
> It ought to work too, and I do know people really do this.
>
> And just the fact that we would have to merge all of these defconfig changes
> through the networking tree is evidence of how it's really not reasonable
> to be doing things this way.
>
> Jeff, I think we need to revert the dependencies back to what they were
> before the drivers/net moves. Could you prepare a patch which does that?
>
I was just finishing up those patches (not including any defconfig
changes) and started looking at a patch to fix/resolve the issues that
Stephen is seeing.
Let me see what I can come up with tonight to resolve this.
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 490 bytes --]
^ permalink raw reply
* Re: kvm PCI assignment & VFIO ramblings
From: David Gibson @ 2011-08-23 2:38 UTC (permalink / raw)
To: Alex Williamson
Cc: chrisw, Alexey Kardashevskiy, kvm, Paul Mackerras,
linux-pci@vger.kernel.org, qemu-devel, aafabbri, iommu,
Avi Kivity, Anthony Liguori, linuxppc-dev, benve
In-Reply-To: <1314027950.6866.242.camel@x201.home>
On Mon, Aug 22, 2011 at 09:45:48AM -0600, Alex Williamson wrote:
> On Mon, 2011-08-22 at 15:55 +1000, David Gibson wrote:
> > On Sat, Aug 20, 2011 at 09:51:39AM -0700, Alex Williamson wrote:
> > > We had an extremely productive VFIO BoF on Monday. Here's my attempt to
> > > capture the plan that I think we agreed to:
> > >
> > > We need to address both the description and enforcement of device
> > > groups. Groups are formed any time the iommu does not have resolution
> > > between a set of devices. On x86, this typically happens when a
> > > PCI-to-PCI bridge exists between the set of devices and the iommu. For
> > > Power, partitionable endpoints define a group. Grouping information
> > > needs to be exposed for both userspace and kernel internal usage. This
> > > will be a sysfs attribute setup by the iommu drivers. Perhaps:
> > >
> > > # cat /sys/devices/pci0000:00/0000:00:19.0/iommu_group
> > > 42
> > >
> > > (I use a PCI example here, but attribute should not be PCI specific)
> >
> > Ok. Am I correct in thinking these group IDs are representing the
> > minimum granularity, and are therefore always static, defined only by
> > the connected hardware, not by configuration?
>
> Yes, that's the idea. An open question I have towards the configuration
> side is whether we might add iommu driver specific options to the
> groups. For instance on x86 where we typically have B:D.F granularity,
> should we have an option not to trust multi-function devices and use a
> B:D granularity for grouping?
Right. And likewise I can see a place for configuration parameters
like the present 'allow_unsafe_irqs'. But these would be more-or-less
global options which affected the overall granularity, rather than
detailed configuration such as explicitly binding some devices into a
group, yes?
> > > From there we have a few options. In the BoF we discussed a model where
> > > binding a device to vfio creates a /dev/vfio$GROUP character device
> > > file. This "group" fd provides provides dma mapping ioctls as well as
> > > ioctls to enumerate and return a "device" fd for each attached member of
> > > the group (similar to KVM_CREATE_VCPU). We enforce grouping by
> > > returning an error on open() of the group fd if there are members of the
> > > group not bound to the vfio driver. Each device fd would then support a
> > > similar set of ioctls and mapping (mmio/pio/config) interface as current
> > > vfio, except for the obvious domain and dma ioctls superseded by the
> > > group fd.
> >
> > It seems a slightly strange distinction that the group device appears
> > when any device in the group is bound to vfio, but only becomes usable
> > when all devices are bound.
> >
> > > Another valid model might be that /dev/vfio/$GROUP is created for all
> > > groups when the vfio module is loaded. The group fd would allow open()
> > > and some set of iommu querying and device enumeration ioctls, but would
> > > error on dma mapping and retrieving device fds until all of the group
> > > devices are bound to the vfio driver.
> >
> > Which is why I marginally prefer this model, although it's not a big
> > deal.
>
> Right, we can also combine models. Binding a device to vfio
> creates /dev/vfio$GROUP, which only allows a subset of ioctls and no
> device access until all the group devices are also bound. I think
> the /dev/vfio/$GROUP might help provide an enumeration interface as well
> though, which could be useful.
I'm not entirely sure what you mean here. But, that's now several
weak votes in favour of the always-present group devices, and none in
favour of the created-when-first-device-bound model, so I suggest we
take the /dev/vfio/$GROUP as our tentative approach.
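A minimal sketch of the always-present group model, assuming a much-simplified
userspace view of it. All names here (sketch_dev, sketch_group and the
sketch_group_* helpers) are hypothetical, and -EBUSY is an assumed error code,
not something agreed in this thread. The group can always be enumerated, but
dma mapping is refused until every member is bound to vfio:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one device in an iommu group. */
struct sketch_dev {
	bool bound_to_vfio;	/* true once the device is bound to the vfio driver */
};

/* Hypothetical model of a group; the group node itself always exists. */
struct sketch_group {
	struct sketch_dev *devs;
	size_t ndevs;
};

/* A group is usable only when every member is bound to vfio. */
bool sketch_group_viable(const struct sketch_group *g)
{
	for (size_t i = 0; i < g->ndevs; i++)
		if (!g->devs[i].bound_to_vfio)
			return false;
	return true;
}

/* Enumeration is always allowed: report how many devices the group has. */
int sketch_group_count_devices(const struct sketch_group *g)
{
	return (int)g->ndevs;
}

/* DMA mapping is refused until the whole group is bound. */
int sketch_group_map_dma(const struct sketch_group *g)
{
	if (!sketch_group_viable(g))
		return -EBUSY;	/* assumed error code, not from the thread */
	return 0;
}
```

Retrieving device fds would be gated by the same viability test as dma mapping.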
> > > In either case, the uiommu interface is removed entirely since dma
> > > mapping is done via the group fd. As necessary in the future, we can
> > > define a more high performance dma mapping interface for streaming dma
> > > via the group fd. I expect we'll also include architecture specific
> > > group ioctls to describe features and capabilities of the iommu. The
> > > group fd will need to prevent concurrent open()s to maintain a 1:1 group
> > > to userspace process ownership model.
> >
> > A 1:1 group<->process correspondance seems wrong to me. But there are
> > many ways you could legitimately write the userspace side of the code,
> > many of them involving some sort of concurrency. Implementing that
> > concurrency as multiple processes (using explicit shared memory and/or
> > other IPC mechanisms to co-ordinate) seems a valid choice that we
> > shouldn't arbitrarily prohibit.
> >
> > Obviously, only one UID may be permitted to have the group open at a
> > time, and I think that's enough to prevent them doing any worse than
> > shooting themselves in the foot.
>
> 1:1 group<->process is probably too strong. Not allowing concurrent
> open()s on the group file enforces a single userspace entity is
> responsible for that group. Device fds can be passed to other
> processes, but only retrieved via the group fd. I suppose we could even
> branch off the dma interface into a different fd, but it seems like we
> would logically want to serialize dma mappings at each iommu group
> anyway. I'm open to alternatives, this just seemed an easy way to do
> it. Restricting on UID implies that we require isolated qemu instances
> to run as different UIDs.
Well.. yes and no. It means guests which need to be isolated from
malicious interference with each other need different UIDs, but given
that if they have the same UID one qemu can kill() or ptrace() the
other, they're not isolated in that sense anyway.
It seems to me that running as the same UIDs with different device
groups assigned, the guests are still pretty well isolated from
accidental interference with each other.
> I know that's a goal, but I don't know if we
> want to make it an assumption in the group security model.
>
> > > Also on the table is supporting non-PCI devices with vfio. To do this,
> > > we need to generalize the read/write/mmap and irq eventfd interfaces.
> > > We could keep the same model of segmenting the device fd address space,
> > > perhaps adding ioctls to define the segment offset bit position or we
> > > could split each region into it's own fd (VFIO_GET_PCI_BAR_FD(0),
> > > VFIO_GET_PCI_CONFIG_FD(), VFIO_GET_MMIO_FD(3)), though we're already
> > > suffering some degree of fd bloat (group fd, device fd(s), interrupt
> > > event fd(s), per resource fd, etc). For interrupts we can overload
> > > VFIO_SET_IRQ_EVENTFD to be either PCI INTx or non-PCI irq
> >
> > Sounds reasonable.
> >
> > > (do non-PCI
> > > devices support MSI?).
> >
> > They can. Obviously they might not have exactly the same semantics as
> > PCI MSIs, but I know we have SoC systems with (non-PCI) on-die devices
> > whose interrupts are treated by the (also on-die) root interrupt
> > controller in the same way as PCI MSIs.
>
> Ok, I suppose we can define ioctls to enable these as we go. We also
> need to figure out how non-PCI resources, interrupts, and iommu mapping
> restrictions are described via vfio.
Yeah. On device tree platforms we'd want it to be bound to the device
tree representation in some way.
For platform devices, at least, could we have the index into the array
of resources take the place of BAR number for PCI?
>
> > > For qemu, these changes imply we'd only support a model where we have a
> > > 1:1 group to iommu domain. The current vfio driver could probably
> > > become vfio-pci as we might end up with more target specific vfio
> > > drivers for non-pci. PCI should be able to maintain a simple -device
> > > vfio-pci,host=bb:dd.f to enable hotplug of individual devices. We'll
> > > need to come up with extra options when we need to expose groups to
> > > guest for pvdma.
> >
> > Are you saying that you'd no longer support the current x86 usage of
> > putting all of one guest's devices into a single domain?
>
> Yes. I'm not sure there's a good ROI to prioritize that model. We have
> to assume >1 device per guest is a typical model and that the iotlb is
> large enough that we might improve thrashing to see both a resource and
> performance benefit from it. I'm open to suggestions for how we could
> include it though.
Creating supergroups of some sort seems to be what we need, but I'm
not sure what's the best interface for doing that.
> > If that's
> > not what you're saying, how would the domains - now made up of a
> > user's selection of groups, rather than individual devices - be
> > configured?
> >
> > > Hope that captures it, feel free to jump in with corrections and
> > > suggestions. Thanks,
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
^ permalink raw reply
* Re: [PATCH v3] mtd/nand : workaround for Freescale FCM to support large-page Nand chip
From: LiuShuo @ 2011-08-23 3:09 UTC (permalink / raw)
To: Scott Wood
Cc: Li Yang-R58472, Artem Bityutskiy, Matthieu CASTET,
linuxppc-dev@ozlabs.org, linux-mtd@lists.infradead.org,
Ivan Djelic, dwmw2@infradead.org
In-Reply-To: <4E52819C.8080204@freescale.com>
On 2011-08-23 00:19, Scott Wood wrote:
> On 08/22/2011 11:13 AM, Matthieu CASTET wrote:
>> Scott Wood wrote:
>>> To eliminate it we'd need to do an extra data transfer without reissuing
>>> the command, which Shuo was unable to get to work.
>>>
>> That's weird because our controller seems quite flexible [1].
>>
>> Something like that should work ?
>>
>> out_be32(&lbc->fir,
>> (FIR_OP_CM2<< FIR_OP0_SHIFT) |
>> (FIR_OP_CA<< FIR_OP1_SHIFT) |
>> (FIR_OP_PA<< FIR_OP2_SHIFT) |
>> (FIR_OP_WB<< FIR_OP3_SHIFT));
>> refill FCM buffer with next 2k data
>>
>> out_be32(&lbc->fir,
>> (FIR_OP_WB<< FIR_OP3_SHIFT) |
>> (FIR_OP_CM3<< FIR_OP4_SHIFT) |
>> (FIR_OP_CW1<< FIR_OP5_SHIFT) |
>> (FIR_OP_RS<< FIR_OP6_SHIFT));
> Something like that is what I originally suggested, but Shuo said it
> didn't work (even in theory, it requires a CE-don't-care NAND chip,
> since bus atomicity is broken).
>
> Shuo, what specifically did you try, and what did you see happen?
>
> -Scott
First, if we want to read 4K of data while issuing the command only once, we
can't use HW_ECC.
Even if we use SW_ECC, we always get lots of weird 0xFF bytes between the 1st
2k and the 2nd 2k of data.
They overwrite the data at the head of the 2nd 2K.
-------------------------------------------------------------------------------------
| xxxxxx ... 1st 2k xxxxxxx ... | ff ff ff ... ff xxxxxx 2nd 2k xxxxxxx |
-------------------------------------------------------------------------------------
Writing 4k of data while issuing the command only once is worse: the 2nd 2k of
data can't be written correctly.
-Liu Shuo
^ permalink raw reply
* Re: linux-next: boot test failure (net tree)
From: Arnaud Lacombe @ 2011-08-23 3:50 UTC (permalink / raw)
To: David Miller
Cc: sfr, mikey, linux-kbuild, netdev, linuxppc-dev, linux-kernel,
linux-next, paulus, jeffrey.t.kirsher, akpm, torvalds
In-Reply-To: <20110822.191348.2099822249437201579.davem@davemloft.net>
Hi,
[Added linux-kbuild@ to the Cc: list.]
On Mon, Aug 22, 2011 at 10:13 PM, David Miller <davem@davemloft.net> wrote:
> From: Stephen Rothwell <sfr@canb.auug.org.au>
> Date: Tue, 23 Aug 2011 11:41:29 +1000
>
>> On Tue, 23 Aug 2011 11:40:11 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>>>
>>> On Mon, 22 Aug 2011 11:30:32 +1000 Stephen Rothwell <sfr@canb.auug.org.au> wrote:
>>> >
>>> > Here's what I am applying as a merge fixup to the net tree today so that
>>> > my ppc64_defconfig builds actually build more or less the same set of
>>> > drivers as before this rearrangement.
>>>
>>> And this today:
>>
>> And this:
>
> I'm starting to get uncomfortable with this whole situation, and I
> feel more and more that these new kconfig guards are not tenable.
>
> Changing defconfig files might fix the "automated test boot with
> defconfig" case but it won't fix the case of someone trying to
> automate a build and boot using a different, existing, config file.
> It ought to work too, and I do know people really do this.
>
> And just the fact that we would have to merge all of these defconfig changes
> through the networking tree is evidence of how it's really not reasonable
> to be doing things this way.
>
> Jeff, I think we need to revert the dependencies back to what they were
> before the drivers/net moves. Could you prepare a patch which does that?
>
Are you implying we need some kind of way to migrate config ?
- Arnaud
^ permalink raw reply
* Re: linux-next: boot test failure (net tree)
From: David Miller @ 2011-08-23 4:02 UTC (permalink / raw)
To: lacombar
Cc: sfr, mikey, linux-kbuild, netdev, linuxppc-dev, linux-kernel,
linux-next, paulus, jeffrey.t.kirsher, akpm, torvalds
In-Reply-To: <CACqU3MW4BnucRt3gxPrKPDvWEjaVuxRF1VOPWk5hTRfneyANkg@mail.gmail.com>
From: Arnaud Lacombe <lacombar@gmail.com>
Date: Mon, 22 Aug 2011 23:50:02 -0400
> Are you implying we need some kind of way to migrate config ?
The issue is that the dependencies for every single ethernet driver
have changed. Some dependencies have been dropped (f.e. NETDEV_10000)
and some have been added (f.e. ETHERNET, NET_VENDOR_****).
So right now an automated (non-prompted, default to no on all new
options) run on an existing config results in all ethernet drivers
getting disabled because the new dependencies don't get enabled.
This wouldn't be so bad if it was just one or two drivers, but in
this case it's every single ethernet driver which will have and hit
this problem.
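A minimal sketch of that failure mode, assuming a much-simplified model of a
non-prompted config update. It handles only one level of dependency and is not
how the real kconfig tools are implemented. The struct, the function names and
the symbols SOME_DRIVER and NET_VENDOR_FOO are hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, much-simplified model of a Kconfig symbol. */
struct sketch_sym {
	const char *name;
	const char *depends_on;	/* NULL when the symbol has no dependency */
};

/* Return 1 if 'name' is enabled in the old .config, else 0. */
int sketch_in_old_config(const char *const *old, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(old[i], name) == 0)
			return 1;
	return 0;
}

/*
 * Non-prompted update: a symbol stays enabled only if the old config
 * enabled it AND its dependency (if any) is enabled too.  Symbols the
 * old config never mentioned default to "no".
 */
int sketch_enabled_after_update(const struct sketch_sym *sym,
				const char *const *old, size_t n)
{
	if (!sketch_in_old_config(old, n, sym->name))
		return 0;
	if (sym->depends_on && !sketch_in_old_config(old, n, sym->depends_on))
		return 0;
	return 1;
}
```

With an old config that enables SOME_DRIVER and NETDEV_10000, the driver
survives while it depends on NETDEV_10000, and is switched off as soon as its
dependency becomes the new, never-answered NET_VENDOR_FOO.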
^ permalink raw reply
* Re: [PATCH 1/2] [hw-breakpoint] Use generic hw-breakpoint interfaces for new PPC ptrace flags
From: David Gibson @ 2011-08-23 5:08 UTC (permalink / raw)
To: K.Prasad; +Cc: linuxppc-dev, Thiago Jung Bauermann, Edjunior Barbosa Machado
In-Reply-To: <20110819075136.GB21817@in.ibm.com>
On Fri, Aug 19, 2011 at 01:21:36PM +0530, K.Prasad wrote:
> PPC_PTRACE_GETHWDBGINFO, PPC_PTRACE_SETHWDEBUG and PPC_PTRACE_DELHWDEBUG are
> PowerPC specific ptrace flags that use the watchpoint register. While they are
> targeted primarily towards BookE users, user-space applications such as GDB
> have started using them for BookS too.
>
> This patch enables the use of generic hardware breakpoint interfaces for these
> new flags. The version number of the associated data structures
> "ppc_hw_breakpoint" and "ppc_debug_info" is incremented to denote new semantics.
So, the structure itself doesn't seem to have been extended. I don't
understand what the semantic difference is - your patch comment needs
to explain this clearly.
> Apart from the usual benefits of using generic hw-breakpoint interfaces, these
> changes allow debuggers (such as GDB) to use a common set of ptrace flags for
> their watchpoint needs and allow more precise breakpoint specification (length
> of the variable can be specified).
What is the mechanism for implementing the range breakpoint on book3s?
> [Edjunior: Identified an issue in the patch with the sanity check for version
> numbers]
>
> Tested-by: Edjunior Barbosa Machado <emachado@linux.vnet.ibm.com>
> Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
> ---
> Documentation/powerpc/ptrace.txt | 16 ++++++
> arch/powerpc/kernel/ptrace.c | 104 +++++++++++++++++++++++++++++++++++---
> 2 files changed, 112 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/powerpc/ptrace.txt b/Documentation/powerpc/ptrace.txt
> index f4a5499..97301ae 100644
> --- a/Documentation/powerpc/ptrace.txt
> +++ b/Documentation/powerpc/ptrace.txt
> @@ -127,6 +127,22 @@ Some examples of using the structure to:
> p.addr2 = (uint64_t) end_range;
> p.condition_value = 0;
>
> +- set a watchpoint in server processors (BookS) using version 2
> +
> + p.version = 2;
> + p.trigger_type = PPC_BREAKPOINT_TRIGGER_RW;
> + p.addr_mode = PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE;
> + or
> + p.addr_mode = PPC_BREAKPOINT_MODE_RANGE_EXACT;
> +
> + p.condition_mode = PPC_BREAKPOINT_CONDITION_NONE;
> + p.addr = (uint64_t) begin_range;
> + /* For PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE addr2 needs to be specified, where
> + * addr2 - addr <= 8 Bytes.
> + */
> + p.addr2 = (uint64_t) end_range;
> + p.condition_value = 0;
> +
> 3. PTRACE_DELHWDEBUG
>
> Takes an integer which identifies an existing breakpoint or watchpoint
> diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
> index 05b7dd2..18d28b6 100644
> --- a/arch/powerpc/kernel/ptrace.c
> +++ b/arch/powerpc/kernel/ptrace.c
> @@ -1339,11 +1339,17 @@ static int set_dac_range(struct task_struct *child,
> static long ppc_set_hwdebug(struct task_struct *child,
> struct ppc_hw_breakpoint *bp_info)
> {
> +#ifdef CONFIG_HAVE_HW_BREAKPOINT
> + int ret, len = 0;
> + struct thread_struct *thread = &(child->thread);
> + struct perf_event *bp;
> + struct perf_event_attr attr;
> +#endif /* CONFIG_HAVE_HW_BREAKPOINT */
I'm confused. This compiled before on book3s, and I don't see any
changes to Makefile or Kconfig in the patch that will result in this
code compiling when it previously didn't. Why are these new guards
added?
> #ifndef CONFIG_PPC_ADV_DEBUG_REGS
> unsigned long dabr;
> #endif
>
> - if (bp_info->version != 1)
> + if ((bp_info->version != 1) && (bp_info->version != 2))
> return -ENOTSUPP;
> #ifdef CONFIG_PPC_ADV_DEBUG_REGS
> /*
> @@ -1382,13 +1388,9 @@ static long ppc_set_hwdebug(struct task_struct *child,
> */
> if ((bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_RW) == 0 ||
> (bp_info->trigger_type & ~PPC_BREAKPOINT_TRIGGER_RW) != 0 ||
> - bp_info->addr_mode != PPC_BREAKPOINT_MODE_EXACT ||
> bp_info->condition_mode != PPC_BREAKPOINT_CONDITION_NONE)
> return -EINVAL;
>
> - if (child->thread.dabr)
> - return -ENOSPC;
> -
You remove this test to see if the single watchpoint slot is already
in use, but I don't see another test replacing it.
> if ((unsigned long)bp_info->addr >= TASK_SIZE)
> return -EIO;
>
> @@ -1398,15 +1400,86 @@ static long ppc_set_hwdebug(struct task_struct *child,
> dabr |= DABR_DATA_READ;
> if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_WRITE)
> dabr |= DABR_DATA_WRITE;
> +#ifdef CONFIG_HAVE_HW_BREAKPOINT
> + if (bp_info->version == 1)
> + goto version_one;
There are several legitimate uses of goto in the kernel, but this is
definitely not one of them. You're essentially using it to put the
old and new versions of the same function in one block. Nasty.
> + if (ptrace_get_breakpoints(child) < 0)
> + return -ESRCH;
>
> - child->thread.dabr = dabr;
> + bp = thread->ptrace_bps[0];
> + if (!bp_info->addr) {
> + if (bp) {
> + unregister_hw_breakpoint(bp);
> + thread->ptrace_bps[0] = NULL;
> + }
> + ptrace_put_breakpoints(child);
> + return 0;
Why are you making setting a 0 watchpoint remove the existing one (I
think that's what this does)? I thought there was an explicit del
breakpoint operation instead.
> + }
> + /*
> + * Check if the request is for 'range' breakpoints. We can
> + * support it if range < 8 bytes.
> + */
> + if (bp_info->addr_mode == PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE)
> + len = bp_info->addr2 - bp_info->addr;
So you compute the length here, but I don't see you ever test if it is
< 8 and return an error.
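A minimal sketch of the kind of check that appears to be missing, written as a
standalone userspace function. It assumes the limit given in the patch's
documentation text (addr2 - addr <= 8 bytes); the code comment in the patch
says "< 8", so the exact bound is for the patch author to settle.
sketch_range_len() and SKETCH_MAX_RANGE_LEN are hypothetical names, not part
of the patch:

```c
#include <errno.h>
#include <stdint.h>

/* Assumed limit: the patch's documentation says addr2 - addr <= 8 bytes. */
#define SKETCH_MAX_RANGE_LEN 8

/*
 * Validate an inclusive range request and return its length in bytes,
 * or -EINVAL if the range is reversed or too long to be supported.
 */
int sketch_range_len(uint64_t addr, uint64_t addr2)
{
	uint64_t len;

	if (addr2 < addr)
		return -EINVAL;		/* would underflow the subtraction */

	len = addr2 - addr;
	if (len > SKETCH_MAX_RANGE_LEN)
		return -EINVAL;

	return (int)len;
}
```

Checking addr2 < addr first matters because the subtraction is unsigned: a
reversed range would otherwise wrap to a huge length rather than a negative one.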
> + else if (bp_info->addr_mode != PPC_BREAKPOINT_MODE_EXACT) {
> + ptrace_put_breakpoints(child);
> + return -EINVAL;
> + }
> + if (bp) {
> + attr = bp->attr;
> + attr.bp_addr = (unsigned long)bp_info->addr & ~HW_BREAKPOINT_ALIGN;
> + arch_bp_generic_fields(dabr &
> + (DABR_DATA_WRITE | DABR_DATA_READ),
> + &attr.bp_type);
> + attr.bp_len = len;
> + ret = modify_user_hw_breakpoint(bp, &attr);
> + if (ret) {
> + ptrace_put_breakpoints(child);
> + return ret;
> + }
> + thread->ptrace_bps[0] = bp;
> + ptrace_put_breakpoints(child);
> + thread->dabr = dabr;
> + return 0;
> + }
>
> + /* Create a new breakpoint request if one doesn't exist already */
> + hw_breakpoint_init(&attr);
> + attr.bp_addr = (unsigned long)bp_info->addr & ~HW_BREAKPOINT_ALIGN;
You seem to be silently masking the given address, which seems
completely wrong.
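One possible alternative to silent masking, sketched as a standalone userspace
function, is to reject an address the hardware cannot watch as given. Whether
rejecting or handling the offset explicitly is the right behaviour is for the
patch author to decide. SKETCH_BP_ALIGN_MASK is a hypothetical stand-in that
assumes 8-byte alignment, and sketch_check_bp_addr() is not part of the patch:

```c
#include <errno.h>
#include <stdint.h>

/* Assumed low-bit mask for an 8-byte aligned watchpoint address. */
#define SKETCH_BP_ALIGN_MASK 0x7UL

/*
 * Accept the address only if it is already aligned; report -EINVAL
 * otherwise so the caller learns the request was not honoured as given.
 */
int sketch_check_bp_addr(uint64_t addr)
{
	if (addr & SKETCH_BP_ALIGN_MASK)
		return -EINVAL;
	return 0;
}
```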
> + attr.bp_len = len;
> + arch_bp_generic_fields(dabr & (DABR_DATA_WRITE | DABR_DATA_READ),
> + &attr.bp_type);
> +
> + thread->ptrace_bps[0] = bp = register_user_hw_breakpoint(&attr,
> + ptrace_triggered, NULL, child);
> + if (IS_ERR(bp)) {
> + thread->ptrace_bps[0] = NULL;
> + ptrace_put_breakpoints(child);
> + return PTR_ERR(bp);
> + }
> +
> + ptrace_put_breakpoints(child);
> + return 1;
> +#endif /* CONFIG_HAVE_HW_BREAKPOINT */
> +
> +version_one:
> + if (bp_info->addr_mode != PPC_BREAKPOINT_MODE_EXACT)
> + return -EINVAL;
> +
> + if (child->thread.dabr)
> + return -ENOSPC;
> +
> + child->thread.dabr = dabr;
> return 1;
> #endif /* !CONFIG_PPC_ADV_DEBUG_DVCS */
> }
>
> static long ppc_del_hwdebug(struct task_struct *child, long addr, long data)
> {
> +#ifdef CONFIG_HAVE_HW_BREAKPOINT
> + struct thread_struct *thread = &(child->thread);
> + struct perf_event *bp;
> +#endif /* CONFIG_HAVE_HW_BREAKPOINT */
> #ifdef CONFIG_PPC_ADV_DEBUG_REGS
> int rc;
>
> @@ -1426,10 +1499,24 @@ static long ppc_del_hwdebug(struct task_struct *child, long addr, long data)
> #else
> if (data != 1)
> return -EINVAL;
> +
> +#ifdef CONFIG_HAVE_HW_BREAKPOINT
> + if (ptrace_get_breakpoints(child) < 0)
> + return -ESRCH;
> +
> + bp = thread->ptrace_bps[0];
> + if (bp) {
> + unregister_hw_breakpoint(bp);
> + thread->ptrace_bps[0] = NULL;
> + }
> + ptrace_put_breakpoints(child);
> + return 0;
> +#else /* CONFIG_HAVE_HW_BREAKPOINT */
> if (child->thread.dabr == 0)
> return -ENOENT;
>
> child->thread.dabr = 0;
> +#endif /* CONFIG_HAVE_HW_BREAKPOINT */
>
> return 0;
> #endif
> @@ -1536,7 +1623,8 @@ long arch_ptrace(struct task_struct *child, long request,
> case PPC_PTRACE_GETHWDBGINFO: {
> struct ppc_debug_info dbginfo;
>
> - dbginfo.version = 1;
> + /* We return the highest version number supported */
> + dbginfo.version = 2;
> #ifdef CONFIG_PPC_ADV_DEBUG_REGS
> dbginfo.num_instruction_bps = CONFIG_PPC_ADV_DEBUG_IACS;
> dbginfo.num_data_bps = CONFIG_PPC_ADV_DEBUG_DACS;
> @@ -1560,7 +1648,7 @@ long arch_ptrace(struct task_struct *child, long request,
> dbginfo.data_bp_alignment = 4;
> #endif
> dbginfo.sizeof_condition = 0;
> - dbginfo.features = 0;
> + dbginfo.features = PPC_DEBUG_FEATURE_DATA_BP_RANGE;
> #endif /* CONFIG_PPC_ADV_DEBUG_REGS */
>
> if (!access_ok(VERIFY_WRITE, datavp,
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
* Re: [PATCH 2/2] [PowerPC Book3E] Introduce new ptrace debug feature flag
From: David Gibson @ 2011-08-23 5:09 UTC (permalink / raw)
To: K.Prasad; +Cc: linuxppc-dev, Thiago Jung Bauermann, Edjunior Barbosa Machado
In-Reply-To: <20110819075338.GC21817@in.ibm.com>
On Fri, Aug 19, 2011 at 01:23:38PM +0530, K.Prasad wrote:
>
> While PPC_PTRACE_SETHWDEBUG ptrace flag in PowerPC accepts
> PPC_BREAKPOINT_MODE_EXACT mode of breakpoint, the same is not intimated to the
> user-space debuggers (like GDB) who may want to use it. Hence we introduce a
> new PPC_DEBUG_FEATURE_DATA_BP_EXACT flag which will be populated on the
> "features" member of "struct ppc_debug_info" to advertise support for the
> same on Book3E PowerPC processors.
I thought the idea was that the BP_EXACT mode was the default - if the
new interface was supported at all, then BP_EXACT was always
supported. So, why do you need a new flag?
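To illustrate what I mean, here's roughly how I'd expect a debugger to
decide what it can use, assuming exact mode is implied whenever the new
interface exists at all. struct sketch_debug_info is a cut-down stand-in
for struct ppc_debug_info and the helper names are made up; only the 0x4
value for the range feature bit is taken from the header quoted below.

```c
#include <stdbool.h>
#include <stdint.h>

/* Cut-down, hypothetical mirror of struct ppc_debug_info. */
struct sketch_debug_info {
	uint32_t version;
	uint64_t features;
};

/* Same value as PPC_DEBUG_FEATURE_DATA_BP_RANGE in the quoted header. */
#define SKETCH_FEATURE_DATA_BP_RANGE 0x0000000000000004ULL

/* Assumption: exact mode needs no feature bit, only the interface. */
static bool sketch_supports_exact(const struct sketch_debug_info *info)
{
	return info->version >= 1;
}

/* Optional modes are the ones that need an explicit feature bit. */
static bool sketch_supports_range(const struct sketch_debug_info *info)
{
	return info->version >= 1 &&
	       (info->features & SKETCH_FEATURE_DATA_BP_RANGE) != 0;
}
```

If that reading of the interface is right, a DATA_BP_EXACT bit tells
userspace nothing it couldn't already infer from the version field.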
>
> Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
> ---
> arch/powerpc/include/asm/ptrace.h | 1 +
> arch/powerpc/kernel/ptrace.c | 1 +
> 2 files changed, 2 insertions(+), 0 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
> index 48223f9..cf014f9 100644
> --- a/arch/powerpc/include/asm/ptrace.h
> +++ b/arch/powerpc/include/asm/ptrace.h
> @@ -380,6 +380,7 @@ struct ppc_debug_info {
> #define PPC_DEBUG_FEATURE_INSN_BP_MASK 0x0000000000000002
> #define PPC_DEBUG_FEATURE_DATA_BP_RANGE 0x0000000000000004
> #define PPC_DEBUG_FEATURE_DATA_BP_MASK 0x0000000000000008
> +#define PPC_DEBUG_FEATURE_DATA_BP_EXACT 0x0000000000000010
>
> #ifndef __ASSEMBLY__
>
> diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
> index 18d28b6..71db5a6 100644
> --- a/arch/powerpc/kernel/ptrace.c
> +++ b/arch/powerpc/kernel/ptrace.c
> @@ -1636,6 +1636,7 @@ long arch_ptrace(struct task_struct *child, long request,
> #ifdef CONFIG_PPC_ADV_DEBUG_DAC_RANGE
> dbginfo.features |=
> PPC_DEBUG_FEATURE_DATA_BP_RANGE |
> + PPC_DEBUG_FEATURE_DATA_BP_EXACT |
> PPC_DEBUG_FEATURE_DATA_BP_MASK;
> #endif
> #else /* !CONFIG_PPC_ADV_DEBUG_REGS */
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
* Re: kvm PCI assignment & VFIO ramblings
From: Benjamin Herrenschmidt @ 2011-08-23 6:54 UTC (permalink / raw)
To: aafabbri
Cc: Alexey Kardashevskiy, kvm, Paul Mackerras,
linux-pci@vger.kernel.org, qemu-devel, iommu, chrisw,
Alex Williamson, Avi Kivity, Anthony Liguori, linuxppc-dev, benve
In-Reply-To: <CA7847D2.FB3A%aafabbri@cisco.com>
On Mon, 2011-08-22 at 17:52 -0700, aafabbri wrote:
> I'm not following you.
>
> You have to enforce group/iommu domain assignment whether you have the
> existing uiommu API, or if you change it to your proposed
> ioctl(inherit_iommu) API.
>
> The only change needed to VFIO here should be to make uiommu fd assignment
> happen on the groups instead of on device fds. That operation fails or
> succeeds according to the group semantics (all-or-none assignment/same
> uiommu).
Ok, so I missed that part where you change uiommu to operate on group
fd's rather than device fd's, my apologies if you actually wrote that
down :-) It might be obvious ... bear with me, I just flew back from the
US and I am badly jet lagged ...
So I see what you mean, however...
> I think the question is: do we force 1:1 iommu/group mapping, or do we allow
> arbitrary mapping (satisfying group constraints) as we do today.
>
> I'm saying I'm an existing user who wants the arbitrary iommu/group mapping
> ability and definitely think the uiommu approach is cleaner than the
> ioctl(inherit_iommu) approach. We considered that approach before but it
> seemed less clean so we went with the explicit uiommu context.
Possibly. The question that interests me the most is what interface KVM
will end up using. I'm also not a big fan of the (perceived)
discrepancy between using uiommu to create groups but using the group fd
to actually do the mappings, at least if that is still the plan.
If the separate uiommu interface is kept, then anything that wants to be
able to benefit from the ability to put multiple devices (or existing
groups) into such a "meta group" would need to be explicitly modified to
deal with the uiommu APIs.
I tend to prefer such "meta groups" as being something you create
statically using a configuration interface, either via sysfs, netlink or
ioctl's to a "control" vfio device driven by a simple command line tool
(which can have the configuration stored in /etc and re-apply it at
boot).
That way, any program capable of exploiting VFIO "groups" will
automatically be able to exploit those "meta groups" (or groups of
groups) as well, as long as they are supported on the system.
If we ever have system specific constraints as to how such groups can be
created, then it can all be handled at the level of that configuration
tool without impact on whatever programs know how to exploit them via
the VFIO interfaces.
> > .../...
> >
> >> If we in singleton-group land were building our own "groups" which were sets
> >> of devices sharing the IOMMU domains we wanted, I suppose we could do away
> >> with uiommu fds, but it sounds like the current proposal would create 20
> >> singleton groups (x86 iommu w/o PCI bridges => all devices are partitionable
> >> endpoints). Asking me to ioctl(inherit) them together into a blob sounds
> >> worse than the current explicit uiommu API.
> >
> > I'd rather have an API to create super-groups (groups of groups)
> > statically and then you can use such groups as normal groups using the
> > same interface. That create/management process could be done via a
> > simple command line utility or via sysfs banging, whatever...
Cheers,
Ben.