LinuxPPC-Dev Archive on lore.kernel.org
* Re: kvm PCI assignment & VFIO ramblings
From: Benjamin Herrenschmidt @ 2011-08-22 21:49 UTC (permalink / raw)
  To: aafabbri
  Cc: Alexey Kardashevskiy, kvm, Paul Mackerras,
	linux-pci@vger.kernel.org, qemu-devel, iommu, chrisw,
	Alex Williamson, Avi Kivity, Anthony Liguori, linuxppc-dev, benve
In-Reply-To: <CA781A61.FB15%aafabbri@cisco.com>


> > I wouldn't use uiommu for that.
> 
> Any particular reason besides saving a file descriptor?
> 
> We use it today, and it seems like a cleaner API than what you propose
> changing it to.

Well for one, we are back to square one vs. grouping constraints.

 .../...

> If we in singleton-group land were building our own "groups" which were sets
> of devices sharing the IOMMU domains we wanted, I suppose we could do away
> with uiommu fds, but it sounds like the current proposal would create 20
> singleton groups (x86 iommu w/o PCI bridges => all devices are partitionable
> endpoints).  Asking me to ioctl(inherit) them together into a blob sounds
> worse than the current explicit uiommu API.

I'd rather have an API to create super-groups (groups of groups)
statically and then you can use such groups as normal groups using the
same interface. That create/management process could be done via a
simple command line utility or via sysfs banging, whatever...

Cheers,
Ben.

> Thanks,
> Aaron
> 
> > 
> > Another option is to make that a static configuration API via special
> > ioctls (or even netlink if you really like it), to change the grouping
> > on architectures that allow it.
> > 
> > Cheers.
> > Ben.
> > 
> >> 
> >> -Aaron
> >> 
> >>> As necessary in the future, we can
> >>> define a more high performance dma mapping interface for streaming dma
> >>> via the group fd.  I expect we'll also include architecture specific
> >>> group ioctls to describe features and capabilities of the iommu.  The
> >>> group fd will need to prevent concurrent open()s to maintain a 1:1 group
> >>> to userspace process ownership model.
> >>> 
> >>> Also on the table is supporting non-PCI devices with vfio.  To do this,
> >>> we need to generalize the read/write/mmap and irq eventfd interfaces.
> >>> We could keep the same model of segmenting the device fd address space,
> >>> perhaps adding ioctls to define the segment offset bit position or we
> >>> could split each region into its own fd (VFIO_GET_PCI_BAR_FD(0),
> >>> VFIO_GET_PCI_CONFIG_FD(), VFIO_GET_MMIO_FD(3)), though we're already
> >>> suffering some degree of fd bloat (group fd, device fd(s), interrupt
> >>> event fd(s), per resource fd, etc).  For interrupts we can overload
> >>> VFIO_SET_IRQ_EVENTFD to be either PCI INTx or non-PCI irq (do non-PCI
> >>> devices support MSI?).
> >>> 
> >>> For qemu, these changes imply we'd only support a model where we have a
> >>> 1:1 group to iommu domain.  The current vfio driver could probably
> >>> become vfio-pci as we might end up with more target specific vfio
> >>> drivers for non-pci.  PCI should be able to maintain a simple -device
> >>> vfio-pci,host=bb:dd.f to enable hotplug of individual devices.  We'll
> >>> need to come up with extra options when we need to expose groups to
> >>> guest for pvdma.
> >>> 
> >>> Hope that captures it, feel free to jump in with corrections and
> >>> suggestions.  Thanks,
> >>> 
> >>> Alex
> >>> 
> > 
> > 


* Re: kvm PCI assignment & VFIO ramblings
From: aafabbri @ 2011-08-22 21:38 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alexey Kardashevskiy, kvm, Paul Mackerras,
	linux-pci@vger.kernel.org, qemu-devel, iommu, chrisw,
	Alex Williamson, Avi Kivity, Anthony Liguori, linuxppc-dev, benve
In-Reply-To: <1314046171.7662.27.camel@pasglop>




On 8/22/11 1:49 PM, "Benjamin Herrenschmidt" <benh@kernel.crashing.org>
wrote:

> On Mon, 2011-08-22 at 13:29 -0700, aafabbri wrote:
> 
>>> Each device fd would then support a
>>> similar set of ioctls and mapping (mmio/pio/config) interface as current
>>> vfio, except for the obvious domain and dma ioctls superseded by the
>>> group fd.
>>> 
>>> Another valid model might be that /dev/vfio/$GROUP is created for all
>>> groups when the vfio module is loaded.  The group fd would allow open()
>>> and some set of iommu querying and device enumeration ioctls, but would
>>> error on dma mapping and retrieving device fds until all of the group
>>> devices are bound to the vfio driver.
>>> 
>>> In either case, the uiommu interface is removed entirely since dma
>>> mapping is done via the group fd.
>> 
>> The loss in generality is unfortunate. I'd like to be able to support
>> arbitrary iommu domain <-> device assignment.  One way to do this would be
>> to keep uiommu, but to return an error if someone tries to assign more than
>> one uiommu context to devices in the same group.
> 
> I wouldn't use uiommu for that.

Any particular reason besides saving a file descriptor?

We use it today, and it seems like a cleaner API than what you propose
changing it to.

> If the HW or underlying kernel drivers
> support it, what I'd suggest is that you have an (optional) ioctl to
> bind two groups (you have to have both opened already) or for one group
> to "capture" another one.

You'll need other rules there too.. "both opened already, but zero mappings
performed yet as they would have instantiated a default IOMMU domain".

Keep in mind the only case I'm using is singleton groups, a.k.a. devices.

Since what I want is to specify which devices can do things like share
network buffers (in a way that conserves IOMMU hw resources), it seems
cleanest to expose this explicitly, versus some "inherit iommu domain from
another device" ioctl.  What happens if I do something like this:

dev1_fd = open ("/dev/vfio0")
dev2_fd = open ("/dev/vfio1")
dev2_fd.inherit_iommu(dev1_fd)

error = close(dev1_fd)

There are other gross cases as well.

> 
> The binding means under the hood the iommus get shared, with the
> lifetime being that of the "owning" group.

So what happens in the close() above?  EBUSY?  Reset all children?  Still
seems less clean than having an explicit iommu fd.  Without some benefit I'm
not sure why we'd want to change this API.
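
One of the options raised above, refusing to release the owner while other
groups still inherit from it, can be modelled as a toy in user space.  This is
only a sketch: struct group, group_inherit() and group_close() are made-up
names for illustration, not a vfio or uiommu API, and whether a real close()
could behave this way is exactly the open question.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model of a group fd; none of these names exist in vfio. */
struct group {
    struct group *owner;  /* group whose IOMMU domain we inherited, or NULL */
    int children;         /* number of groups inheriting our domain */
    int open;             /* non-zero while the group fd is open */
};

/* Make `child` share the IOMMU domain of `parent` (no chains allowed). */
int group_inherit(struct group *child, struct group *parent)
{
    if (!child->open || !parent->open)
        return -EBADF;
    if (child == parent || child->owner || child->children || parent->owner)
        return -EINVAL;
    child->owner = parent;
    parent->children++;
    return 0;
}

/* Close a group; the owner of a shared domain cannot go away first. */
int group_close(struct group *g)
{
    if (!g->open)
        return -EBADF;
    if (g->children > 0)
        return -EBUSY;
    if (g->owner) {
        g->owner->children--;
        g->owner = NULL;
    }
    g->open = 0;
    return 0;
}
```

With two open groups, inheriting dev2 from dev1 and then closing dev1 first
yields -EBUSY in this model; closing dev2 and then dev1 succeeds.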

If we in singleton-group land were building our own "groups" which were sets
of devices sharing the IOMMU domains we wanted, I suppose we could do away
with uiommu fds, but it sounds like the current proposal would create 20
singleton groups (x86 iommu w/o PCI bridges => all devices are partitionable
endpoints).  Asking me to ioctl(inherit) them together into a blob sounds
worse than the current explicit uiommu API.

Thanks,
Aaron

> 
> Another option is to make that a static configuration API via special
> ioctls (or even netlink if you really like it), to change the grouping
> on architectures that allow it.
> 
> Cheers.
> Ben.
> 
>> 
>> -Aaron
>> 
>>> As necessary in the future, we can
>>> define a more high performance dma mapping interface for streaming dma
>>> via the group fd.  I expect we'll also include architecture specific
>>> group ioctls to describe features and capabilities of the iommu.  The
>>> group fd will need to prevent concurrent open()s to maintain a 1:1 group
>>> to userspace process ownership model.
>>> 
>>> Also on the table is supporting non-PCI devices with vfio.  To do this,
>>> we need to generalize the read/write/mmap and irq eventfd interfaces.
>>> We could keep the same model of segmenting the device fd address space,
>>> perhaps adding ioctls to define the segment offset bit position or we
>>> could split each region into its own fd (VFIO_GET_PCI_BAR_FD(0),
>>> VFIO_GET_PCI_CONFIG_FD(), VFIO_GET_MMIO_FD(3)), though we're already
>>> suffering some degree of fd bloat (group fd, device fd(s), interrupt
>>> event fd(s), per resource fd, etc).  For interrupts we can overload
>>> VFIO_SET_IRQ_EVENTFD to be either PCI INTx or non-PCI irq (do non-PCI
>>> devices support MSI?).
>>> 
>>> For qemu, these changes imply we'd only support a model where we have a
>>> 1:1 group to iommu domain.  The current vfio driver could probably
>>> become vfio-pci as we might end up with more target specific vfio
>>> drivers for non-pci.  PCI should be able to maintain a simple -device
>>> vfio-pci,host=bb:dd.f to enable hotplug of individual devices.  We'll
>>> need to come up with extra options when we need to expose groups to
>>> guest for pvdma.
>>> 
>>> Hope that captures it, feel free to jump in with corrections and
>>> suggestions.  Thanks,
>>> 
>>> Alex
>>> 
> 
> 


* Re: kvm PCI assignment & VFIO ramblings
From: Benjamin Herrenschmidt @ 2011-08-22 21:03 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Alexey Kardashevskiy, kvm@vger.kernel.org, Paul Mackerras,
	linux-pci@vger.kernel.org, qemu-devel, iommu, chrisw,
	Alex Williamson, Avi Kivity, Anthony Liguori, linuxppc-dev,
	benve@cisco.com
In-Reply-To: <20110822172508.GJ2079@amd.com>


> I am in favour of /dev/vfio/$GROUP. If multiple devices should be
> assigned to a guest, there can also be an ioctl to bind a group to an
> address-space of another group (certainly needs some care to not allow
> that both groups belong to different processes).
> 
> Btw, a problem we haven't fully talked about yet is
> driver-deassignment. User space can decide to de-assign the device from
> vfio while a fd is open on it. With PCI there is no way to let this fail
> (the .release function returns void last time I checked). Is this a
> problem, and if yes, how do we handle that?

We can treat it as a hard unplug (like a cardbus gone away).

I.e. dispose of the direct mappings (switch to MMIO emulation), return
all ff's from reads and ignore writes.

Then send an unplug event via whatever mechanism the platform provides
(ACPI hotplug controller on x86 for example, we haven't quite sorted out
what to do on power for hotplug yet).
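
A minimal sketch of that hard-unplug behaviour for the emulated path, assuming
a hypothetical struct assigned_dev with a small register window.  The names
dev_read() and dev_write() are illustrative only and not part of vfio or qemu.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-device state used only for this illustration. */
struct assigned_dev {
    bool unplugged;     /* set once the host driver has been unbound */
    uint8_t regs[256];  /* stand-in for the emulated MMIO window */
};

/* Read `len` bytes at `off`; after an unplug every byte reads as 0xff. */
int dev_read(const struct assigned_dev *d, uint32_t off, void *buf, size_t len)
{
    if (off > sizeof(d->regs) || len > sizeof(d->regs) - off)
        return -1;
    if (d->unplugged)
        memset(buf, 0xff, len);
    else
        memcpy(buf, d->regs + off, len);
    return 0;
}

/* Write `len` bytes at `off`; after an unplug the write is silently dropped. */
int dev_write(struct assigned_dev *d, uint32_t off, const void *buf, size_t len)
{
    if (off > sizeof(d->regs) || len > sizeof(d->regs) - off)
        return -1;
    if (!d->unplugged)
        memcpy(d->regs + off, buf, len);
    return 0;
}
```

The guest driver then sees the same thing it would see for a card that was
physically pulled, which is why the unplug event can follow afterwards.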

Cheers,
Ben.


* Re: kvm PCI assignment & VFIO ramblings
From: Benjamin Herrenschmidt @ 2011-08-22 21:01 UTC (permalink / raw)
  To: Alex Williamson
  Cc: aafabbri, Alexey Kardashevskiy, kvm, Paul Mackerras,
	linux-pci@vger.kernel.org, qemu-devel, David Gibson, chrisw,
	iommu, Avi Kivity, Anthony Liguori, linuxppc-dev, benve
In-Reply-To: <1314027950.6866.242.camel@x201.home>

On Mon, 2011-08-22 at 09:45 -0600, Alex Williamson wrote:

> Yes, that's the idea.  An open question I have towards the configuration
> side is whether we might add iommu driver specific options to the
> groups.  For instance on x86 where we typically have B:D.F granularity,
> should we have an option not to trust multi-function devices and use a
> B:D granularity for grouping?

Or even B or range of busses... if you want to enforce strict isolation
you really can't trust anything below a bus level :-)
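
To make the granularity options concrete, here is a small sketch that derives
a grouping key from a PCI bus/device/function triple.  It assumes the usual
16-bit encoding (8-bit bus, 5-bit device, 3-bit function); enum group_gran and
group_key() are hypothetical names, not an existing kernel interface.

```c
#include <stdint.h>

/* Hypothetical policy knob: how coarse should the isolation unit be? */
enum group_gran { GRAN_FUNCTION, GRAN_DEVICE, GRAN_BUS };

/* Pack bus/device/function into the conventional 16-bit B:D.F value. */
static inline uint16_t pci_bdf(uint8_t bus, uint8_t dev, uint8_t fn)
{
    return (uint16_t)((bus << 8) | ((dev & 0x1f) << 3) | (fn & 0x7));
}

/* Devices that yield the same key would land in the same group. */
uint16_t group_key(uint16_t bdf, enum group_gran gran)
{
    switch (gran) {
    case GRAN_FUNCTION:
        return bdf;             /* B:D.F, every function on its own */
    case GRAN_DEVICE:
        return bdf & 0xfff8;    /* B:D, functions of one device together */
    case GRAN_BUS:
        return bdf & 0xff00;    /* B, everything on the bus together */
    }
    return bdf;
}
```

With B:D granularity, 00:19.0 and 00:19.1 share a key; with bus granularity,
00:19.0 and 00:1a.0 do as well.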

> Right, we can also combine models.  Binding a device to vfio
> creates /dev/vfio$GROUP, which only allows a subset of ioctls and no
> device access until all the group devices are also bound.  I think
> the /dev/vfio/$GROUP might help provide an enumeration interface as well
> though, which could be useful.

Could be, though in what form? Returning sysfs paths?

> 1:1 group<->process is probably too strong.  Not allowing concurrent
> open()s on the group file enforces a single userspace entity is
> responsible for that group.  Device fds can be passed to other
> processes, but only retrieved via the group fd.  I suppose we could even
> branch off the dma interface into a different fd, but it seems like we
> would logically want to serialize dma mappings at each iommu group
> anyway.  I'm open to alternatives, this just seemed an easy way to do
> it.  Restricting on UID implies that we require isolated qemu instances
> to run as different UIDs.  I know that's a goal, but I don't know if we
> want to make it an assumption in the group security model.

A 1:1 process model has the advantage of tying the group to an mm, which
makes the whole mmu notifier business doable. How do you want to track down
mappings and do the second-level translation in the case of explicit
map/unmap (like on power) if you are not tied to an mm_struct?

> Yes.  I'm not sure there's a good ROI to prioritize that model.  We have
> to assume >1 device per guest is a typical model and that the iotlb is
> large enough that we might improve thrashing to see both a resource and
> performance benefit from it.  I'm open to suggestions for how we could
> include it though.

Sharing may or may not be possible depending on setups so yes, it's a
bit tricky.

My preference is to have a static interface (and that's actually where
your pet netlink might make some sense :-) to create "synthetic" groups
made of other groups if the arch allows it. But that might not be the
best approach. In another email I also proposed an option for a group to
"capture" another one...

> > If that's
> > not what you're saying, how would the domains - now made up of a
> > user's selection of groups, rather than individual devices - be
> > configured?
> > 
> > > Hope that captures it, feel free to jump in with corrections and
> > > suggestions.  Thanks,
> > 

Another aspect I don't see discussed is how we represent these things to
the guest.

On Power for example, I have a requirement that a given iommu domain is
represented by a single dma window property in the device-tree. What
that means is that that property needs to be either in the node of the
device itself if there's only one device in the group or in a parent
node (ie a bridge or host bridge) if there are multiple devices.

Now I do -not- want to go down the path of simulating P2P bridges,
besides we'll quickly run out of bus numbers if we go there.

For us the most simple and logical approach (which is also what pHyp
uses and what Linux handles well) is really to expose a given PCI host
bridge per group to the guest. Believe it or not, it makes things
easier :-)

Cheers,
Ben.


* Re: kvm PCI assignment & VFIO ramblings
From: Benjamin Herrenschmidt @ 2011-08-22 20:53 UTC (permalink / raw)
  To: Avi Kivity
  Cc: chrisw, Alexey Kardashevskiy, kvm, Paul Mackerras,
	linux-pci@vger.kernel.org, qemu-devel, iommu, aafabbri,
	Alex Williamson, Anthony Liguori, linuxppc-dev, benve
In-Reply-To: <4E51F782.7060005@redhat.com>

On Mon, 2011-08-22 at 09:30 +0300, Avi Kivity wrote:
> On 08/20/2011 07:51 PM, Alex Williamson wrote:
> > We need to address both the description and enforcement of device
> > groups.  Groups are formed any time the iommu does not have resolution
> > between a set of devices.  On x86, this typically happens when a
> > PCI-to-PCI bridge exists between the set of devices and the iommu.  For
> > Power, partitionable endpoints define a group.  Grouping information
> > needs to be exposed for both userspace and kernel internal usage.  This
> > will be a sysfs attribute setup by the iommu drivers.  Perhaps:
> >
> > # cat /sys/devices/pci0000:00/0000:00:19.0/iommu_group
> > 42
> >
> 
> $ readlink /sys/devices/pci0000:00/0000:00:19.0/iommu_group
> ../../../path/to/device/which/represents/the/resource/constraint
> 
> (the pci-to-pci bridge on x86, or whatever node represents partitionable 
> endpoints on power)

The constraint might not necessarily be a device.

The PCI bridge is just an example. There are other possible constraints.
On POWER for example, it could be a limit in how far I can segment the
DMA address space, forcing me to arbitrarily put devices together, or it
could be a similar constraint related to how the MMIO space is broken
up.

So either that remains a path, in which case we need a separate set of
sysfs nodes representing the groups themselves, each of which may or may
not contain a pointer to the "constraining" device, or we just make it an
arbitrary number (in my case the PE#).
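
For the "arbitrary number" variant, userspace would only need to read and
parse the attribute.  A minimal sketch, assuming the attribute holds a decimal
number as in the proposed example; the attribute does not exist yet, and
parse_group_id() and read_group_id() are hypothetical helper names.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse the text of a numeric group attribute such as "42\n". */
int parse_group_id(const char *text, unsigned long *group)
{
    char *end;

    errno = 0;
    *group = strtoul(text, &end, 10);
    if (errno || end == text)
        return -EINVAL;
    if (*end != '\0' && *end != '\n')
        return -EINVAL;
    return 0;
}

/* Read the attribute at `path` (location is an assumption) and parse it. */
int read_group_id(const char *path, unsigned long *group)
{
    char buf[32];
    FILE *f = fopen(path, "r");

    if (!f)
        return -errno;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -EIO;
    }
    fclose(f);
    return parse_group_id(buf, group);
}
```

Two devices belong to the same group exactly when the numbers match; the path
variant would instead require comparing the link targets.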

Cheers,
Ben


* Re: kvm PCI assignment & VFIO ramblings
From: Benjamin Herrenschmidt @ 2011-08-22 20:49 UTC (permalink / raw)
  To: aafabbri
  Cc: Alexey Kardashevskiy, kvm, Paul Mackerras,
	linux-pci@vger.kernel.org, qemu-devel, iommu, chrisw,
	Alex Williamson, Avi Kivity, Anthony Liguori, linuxppc-dev, benve
In-Reply-To: <CA780A2A.FB0B%aafabbri@cisco.com>

On Mon, 2011-08-22 at 13:29 -0700, aafabbri wrote:

> > Each device fd would then support a
> > similar set of ioctls and mapping (mmio/pio/config) interface as current
> > vfio, except for the obvious domain and dma ioctls superseded by the
> > group fd.
> > 
> > Another valid model might be that /dev/vfio/$GROUP is created for all
> > groups when the vfio module is loaded.  The group fd would allow open()
> > and some set of iommu querying and device enumeration ioctls, but would
> > error on dma mapping and retrieving device fds until all of the group
> > devices are bound to the vfio driver.
> > 
> > In either case, the uiommu interface is removed entirely since dma
> > mapping is done via the group fd.
> 
> The loss in generality is unfortunate. I'd like to be able to support
> arbitrary iommu domain <-> device assignment.  One way to do this would be
> to keep uiommu, but to return an error if someone tries to assign more than
> one uiommu context to devices in the same group.

I wouldn't use uiommu for that. If the HW or underlying kernel drivers
support it, what I'd suggest is that you have an (optional) ioctl to
bind two groups (you have to have both opened already) or for one group
to "capture" another one.

The binding means under the hood the iommus get shared, with the
lifetime being that of the "owning" group.

Another option is to make that a static configuration API via special
ioctls (or even netlink if you really like it), to change the grouping
on architectures that allow it.

Cheers.
Ben.

> 
> -Aaron
> 
> > As necessary in the future, we can
> > define a more high performance dma mapping interface for streaming dma
> > via the group fd.  I expect we'll also include architecture specific
> > group ioctls to describe features and capabilities of the iommu.  The
> > group fd will need to prevent concurrent open()s to maintain a 1:1 group
> > to userspace process ownership model.
> > 
> > Also on the table is supporting non-PCI devices with vfio.  To do this,
> > we need to generalize the read/write/mmap and irq eventfd interfaces.
> > We could keep the same model of segmenting the device fd address space,
> > perhaps adding ioctls to define the segment offset bit position or we
> > could split each region into its own fd (VFIO_GET_PCI_BAR_FD(0),
> > VFIO_GET_PCI_CONFIG_FD(), VFIO_GET_MMIO_FD(3)), though we're already
> > suffering some degree of fd bloat (group fd, device fd(s), interrupt
> > event fd(s), per resource fd, etc).  For interrupts we can overload
> > VFIO_SET_IRQ_EVENTFD to be either PCI INTx or non-PCI irq (do non-PCI
> > devices support MSI?).
> > 
> > For qemu, these changes imply we'd only support a model where we have a
> > 1:1 group to iommu domain.  The current vfio driver could probably
> > become vfio-pci as we might end up with more target specific vfio
> > drivers for non-pci.  PCI should be able to maintain a simple -device
> > vfio-pci,host=bb:dd.f to enable hotplug of individual devices.  We'll
> > need to come up with extra options when we need to expose groups to
> > guest for pvdma.
> > 
> > Hope that captures it, feel free to jump in with corrections and
> > suggestions.  Thanks,
> > 
> > Alex
> > 


* Re: kvm PCI assignment & VFIO ramblings
From: aafabbri @ 2011-08-22 20:29 UTC (permalink / raw)
  To: Alex Williamson, Benjamin Herrenschmidt
  Cc: Alexey Kardashevskiy, kvm, Paul Mackerras,
	linux-pci@vger.kernel.org, qemu-devel, chrisw, iommu, Avi Kivity,
	Anthony Liguori, linuxppc-dev, benve
In-Reply-To: <1313859105.6866.192.camel@x201.home>




On 8/20/11 9:51 AM, "Alex Williamson" <alex.williamson@redhat.com> wrote:

> We had an extremely productive VFIO BoF on Monday.  Here's my attempt to
> capture the plan that I think we agreed to:
> 
> We need to address both the description and enforcement of device
> groups.  Groups are formed any time the iommu does not have resolution
> between a set of devices.  On x86, this typically happens when a
> PCI-to-PCI bridge exists between the set of devices and the iommu.  For
> Power, partitionable endpoints define a group.  Grouping information
> needs to be exposed for both userspace and kernel internal usage.  This
> will be a sysfs attribute setup by the iommu drivers.  Perhaps:
> 
> # cat /sys/devices/pci0000:00/0000:00:19.0/iommu_group
> 42
> 
> (I use a PCI example here, but attribute should not be PCI specific)
> 
> From there we have a few options.  In the BoF we discussed a model where
> binding a device to vfio creates a /dev/vfio$GROUP character device
> file.  This "group" fd provides dma mapping ioctls as well as
> ioctls to enumerate and return a "device" fd for each attached member of
> the group (similar to KVM_CREATE_VCPU).  We enforce grouping by
> returning an error on open() of the group fd if there are members of the
> group not bound to the vfio driver.

Sounds reasonable.

> Each device fd would then support a
> similar set of ioctls and mapping (mmio/pio/config) interface as current
> vfio, except for the obvious domain and dma ioctls superseded by the
> group fd.
> 
> Another valid model might be that /dev/vfio/$GROUP is created for all
> groups when the vfio module is loaded.  The group fd would allow open()
> and some set of iommu querying and device enumeration ioctls, but would
> error on dma mapping and retrieving device fds until all of the group
> devices are bound to the vfio driver.
> 
> In either case, the uiommu interface is removed entirely since dma
> mapping is done via the group fd.

The loss in generality is unfortunate. I'd like to be able to support
arbitrary iommu domain <-> device assignment.  One way to do this would be
to keep uiommu, but to return an error if someone tries to assign more than
one uiommu context to devices in the same group.


-Aaron

> As necessary in the future, we can
> define a more high performance dma mapping interface for streaming dma
> via the group fd.  I expect we'll also include architecture specific
> group ioctls to describe features and capabilities of the iommu.  The
> group fd will need to prevent concurrent open()s to maintain a 1:1 group
> to userspace process ownership model.
> 
> Also on the table is supporting non-PCI devices with vfio.  To do this,
> we need to generalize the read/write/mmap and irq eventfd interfaces.
> We could keep the same model of segmenting the device fd address space,
> perhaps adding ioctls to define the segment offset bit position or we
> could split each region into its own fd (VFIO_GET_PCI_BAR_FD(0),
> VFIO_GET_PCI_CONFIG_FD(), VFIO_GET_MMIO_FD(3)), though we're already
> suffering some degree of fd bloat (group fd, device fd(s), interrupt
> event fd(s), per resource fd, etc).  For interrupts we can overload
> VFIO_SET_IRQ_EVENTFD to be either PCI INTx or non-PCI irq (do non-PCI
> devices support MSI?).
> 
> For qemu, these changes imply we'd only support a model where we have a
> 1:1 group to iommu domain.  The current vfio driver could probably
> become vfio-pci as we might end up with more target specific vfio
> drivers for non-pci.  PCI should be able to maintain a simple -device
> vfio-pci,host=bb:dd.f to enable hotplug of individual devices.  We'll
> need to come up with extra options when we need to expose groups to
> guest for pvdma.
> 
> Hope that captures it, feel free to jump in with corrections and
> suggestions.  Thanks,
> 
> Alex
> 


* Re: [PATCH] [v2] sound/soc/fsl/fsl_dma.c: add missing of_node_put
From: Liam Girdwood @ 2011-08-22 20:05 UTC (permalink / raw)
  To: Timur Tabi
  Cc: alsa-devel@alsa-project.org, tiwai@suse.de,
	devicetree-discuss@lists.ozlabs.org,
	broonie@opensource.wolfsonmicro.com,
	kernel-janitors@vger.kernel.org, linux-kernel@vger.kernel.org,
	perex@perex.cz, julia@diku.dk, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1314022961-27513-1-git-send-email-timur@freescale.com>

On 22/08/11 15:22, Timur Tabi wrote:
> of_parse_phandle increments the reference count of np, so this should be
> decremented before trying the next possibility.
> 
> Since we don't actually use np, we can decrement the reference count
> immediately.
> 
> Reported-by: Julia Lawall <julia@diku.dk>
> Signed-off-by: Timur Tabi <timur@freescale.com>

Acked-by: Liam Girdwood <lrg@ti.com>

> ---
>  sound/soc/fsl/fsl_dma.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/sound/soc/fsl/fsl_dma.c b/sound/soc/fsl/fsl_dma.c
> index 6680c0b..b300f4b 100644
> --- a/sound/soc/fsl/fsl_dma.c
> +++ b/sound/soc/fsl/fsl_dma.c
> @@ -877,10 +877,12 @@ static struct device_node *find_ssi_node(struct device_node *dma_channel_np)
>  		 * assume that device_node pointers are a valid comparison.
>  		 */
>  		np = of_parse_phandle(ssi_np, "fsl,playback-dma", 0);
> +		of_node_put(np);
>  		if (np == dma_channel_np)
>  			return ssi_np;
>  
>  		np = of_parse_phandle(ssi_np, "fsl,capture-dma", 0);
> +		of_node_put(np);
>  		if (np == dma_channel_np)
>  			return ssi_np;
>  	}


* Re: [PATCH] RapidIO: Fix use of non-compatible registers
From: Andrew Morton @ 2011-08-22 19:28 UTC (permalink / raw)
  To: Alexandre Bounine
  Cc: Chul Kim, linux-kernel, Thomas Moll, linuxppc-dev, stable
In-Reply-To: <1311703646-3453-1-git-send-email-alexandre.bounine@idt.com>

On Tue, 26 Jul 2011 14:07:26 -0400
Alexandre Bounine <alexandre.bounine@idt.com> wrote:

> Replace/remove use of RIO v.1.2 registers/bits that are not forward-compatible
> with newer versions of RapidIO specification.
> 
> RapidIO specification v. 1.3 removed Write Port CSR, Doorbell CSR,
> Mailbox CSR and Mailbox and Doorbell bits of the PEF CAR.
> 
> Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
> Cc: Kumar Gala <galak@kernel.crashing.org>
> Cc: Matt Porter <mporter@kernel.crashing.org>
> Cc: Li Yang <leoli@freescale.com>
> Cc: Thomas Moll <thomas.moll@sysgo.com>
> Cc: Chul Kim <chul.kim@idt.com>
> Cc: <stable@kernel.org>

You did a cc:stable but provided no reason (that I can understand) for
backporting the patch.  Please explain why the problem is sufficiently
serious to warrant this action.


* Re: kvm PCI assignment & VFIO ramblings
From: Alex Williamson @ 2011-08-22 19:17 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Alexey Kardashevskiy, kvm@vger.kernel.org, Paul Mackerras,
	qemu-devel, chrisw, iommu, Avi Kivity, Anthony Liguori,
	linux-pci@vger.kernel.org, linuxppc-dev, benve@cisco.com
In-Reply-To: <20110822172508.GJ2079@amd.com>

On Mon, 2011-08-22 at 19:25 +0200, Joerg Roedel wrote:
> On Sat, Aug 20, 2011 at 12:51:39PM -0400, Alex Williamson wrote:
> > We had an extremely productive VFIO BoF on Monday.  Here's my attempt to
> > capture the plan that I think we agreed to:
> > 
> > We need to address both the description and enforcement of device
> > groups.  Groups are formed any time the iommu does not have resolution
> > between a set of devices.  On x86, this typically happens when a
> > PCI-to-PCI bridge exists between the set of devices and the iommu.  For
> > Power, partitionable endpoints define a group.  Grouping information
> > needs to be exposed for both userspace and kernel internal usage.  This
> > will be a sysfs attribute setup by the iommu drivers.  Perhaps:
> > 
> > # cat /sys/devices/pci0000:00/0000:00:19.0/iommu_group
> > 42
> 
> Right, that is mainly for libvirt to provide that information to the
> user in a meaningful way. So userspace is aware that other devices might
> not work anymore when it assigns one to a guest.
> 
> > 
> > (I use a PCI example here, but attribute should not be PCI specific)
> > 
> > From there we have a few options.  In the BoF we discussed a model where
> > binding a device to vfio creates a /dev/vfio$GROUP character device
> > file.  This "group" fd provides dma mapping ioctls as well as
> > ioctls to enumerate and return a "device" fd for each attached member of
> > the group (similar to KVM_CREATE_VCPU).  We enforce grouping by
> > returning an error on open() of the group fd if there are members of the
> > group not bound to the vfio driver.  Each device fd would then support a
> > similar set of ioctls and mapping (mmio/pio/config) interface as current
> > vfio, except for the obvious domain and dma ioctls superseded by the
> > group fd.
> > 
> > Another valid model might be that /dev/vfio/$GROUP is created for all
> > groups when the vfio module is loaded.  The group fd would allow open()
> > and some set of iommu querying and device enumeration ioctls, but would
> > error on dma mapping and retrieving device fds until all of the group
> > devices are bound to the vfio driver.
> 
> I am in favour of /dev/vfio/$GROUP. If multiple devices should be
> assigned to a guest, there can also be an ioctl to bind a group to an
> address-space of another group (certainly needs some care to not allow
> that both groups belong to different processes).

That's an interesting idea.  Maybe an interface similar to the current
uiommu interface, where you open() the 2nd group fd and pass the fd via
ioctl to the primary group.  IOMMUs that don't support this would fail
the attach device callback, which would fail the ioctl to bind them.  It
will need to be designed so any group can be removed from the super-set
and the remaining group(s) still works.  This feels like something that
can be added after we get an initial implementation.

> Btw, a problem we haven't fully talked about yet is
> driver-deassignment. User space can decide to de-assign the device from
> vfio while a fd is open on it. With PCI there is no way to let this fail
> (the .release function returns void last time I checked). Is this a
> problem, and if yes, how do we handle that?

The current vfio has the same problem, we can't unbind a device from
vfio while it's attached to a guest.  I think we'd use the same solution
too; send out a netlink packet for a device removal and have the .remove
call sleep on a wait_event(, refcnt == 0).  We could also set a timeout
and SIGBUS the PIDs holding the device if they don't return it
willingly.  Thanks,

Alex


* Re: kvm PCI assignment & VFIO ramblings
From: Joerg Roedel @ 2011-08-22 17:25 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Alexey Kardashevskiy, kvm@vger.kernel.org, Paul Mackerras,
	qemu-devel, chrisw, iommu, Avi Kivity, Anthony Liguori,
	linux-pci@vger.kernel.org, linuxppc-dev, benve@cisco.com
In-Reply-To: <1313859105.6866.192.camel@x201.home>

On Sat, Aug 20, 2011 at 12:51:39PM -0400, Alex Williamson wrote:
> We had an extremely productive VFIO BoF on Monday.  Here's my attempt to
> capture the plan that I think we agreed to:
> 
> We need to address both the description and enforcement of device
> groups.  Groups are formed any time the iommu does not have resolution
> between a set of devices.  On x86, this typically happens when a
> PCI-to-PCI bridge exists between the set of devices and the iommu.  For
> Power, partitionable endpoints define a group.  Grouping information
> needs to be exposed for both userspace and kernel internal usage.  This
> will be a sysfs attribute setup by the iommu drivers.  Perhaps:
> 
> # cat /sys/devices/pci0000:00/0000:00:19.0/iommu_group
> 42

Right, that is mainly for libvirt to provide that information to the
user in a meaningful way. So userspace is aware that other devices might
not work anymore when it assigns one to a guest.

> 
> (I use a PCI example here, but attribute should not be PCI specific)
> 
> From there we have a few options.  In the BoF we discussed a model where
> binding a device to vfio creates a /dev/vfio$GROUP character device
> file.  This "group" fd provides dma mapping ioctls as well as
> ioctls to enumerate and return a "device" fd for each attached member of
> the group (similar to KVM_CREATE_VCPU).  We enforce grouping by
> returning an error on open() of the group fd if there are members of the
> group not bound to the vfio driver.  Each device fd would then support a
> similar set of ioctls and mapping (mmio/pio/config) interface as current
> vfio, except for the obvious domain and dma ioctls superseded by the
> group fd.
> 
> Another valid model might be that /dev/vfio/$GROUP is created for all
> groups when the vfio module is loaded.  The group fd would allow open()
> and some set of iommu querying and device enumeration ioctls, but would
> error on dma mapping and retrieving device fds until all of the group
> devices are bound to the vfio driver.

I am in favour of /dev/vfio/$GROUP. If multiple devices should be
assigned to a guest, there can also be an ioctl to bind a group to an
address-space of another group (certainly needs some care to not allow
that both groups belong to different processes).
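
A rough sketch of the check such an ioctl would need, modelled in plain C. The struct, the use of a pid as the "owner", and the name group_share_domain() are assumptions for illustration only; nothing like this exists yet.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>

/* Toy model of a group: who has it open and which address space it uses. */
struct group_model {
	pid_t owner;      /* process holding the group fd (assumed notion of ownership) */
	int domain_id;    /* stand-in for the iommu address space */
};

/*
 * Attach 'g' to the address space of 'target'.
 * Refuse when the two groups are held by different processes.
 */
int group_share_domain(struct group_model *g, const struct group_model *target)
{
	assert(g && target);
	if (g->owner != target->owner)
		return -EPERM;
	g->domain_id = target->domain_id;
	return 0;
}
```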

Btw, a problem we haven't fully talked about yet is
driver-deassignment. User space can decide to de-assign the device from
vfio while an fd is open on it. With PCI there is no way to let this fail
(the .release function returns void, last time I checked). Is this a
problem, and if yes, how do we handle that?


	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

^ permalink raw reply

* Re: [PATCH v3] mtd/nand : workaround for Freescale FCM to support large-page Nand chip
From: Matthieu CASTET @ 2011-08-22 17:05 UTC (permalink / raw)
  To: Scott Wood
  Cc: Artem Bityutskiy, LiuShuo, linuxppc-dev@ozlabs.org,
	linux-mtd@lists.infradead.org, Ivan Djelic, dwmw2@infradead.org
In-Reply-To: <4E52819C.8080204@freescale.com>

Scott Wood wrote:
> On 08/22/2011 11:13 AM, Matthieu CASTET wrote:
>> Scott Wood wrote:
>>> To eliminate it we'd need to do an extra data transfer without reissuing
>>> the command, which Shuo was unable to get to work.
>>>
>> That's weird because our controller seems quite flexible [1].
>>
>> Something like that should work ?
>>
>>             out_be32(&lbc->fir,
>>                      (FIR_OP_CM2 << FIR_OP0_SHIFT) |
>>                      (FIR_OP_CA  << FIR_OP1_SHIFT) |
>>                      (FIR_OP_PA  << FIR_OP2_SHIFT) |
>>                      (FIR_OP_WB  << FIR_OP3_SHIFT));
>> refill FCM buffer with next 2k data
>>
>>             out_be32(&lbc->fir,
>>                      (FIR_OP_WB  << FIR_OP3_SHIFT) |
>>                      (FIR_OP_CM3 << FIR_OP4_SHIFT) |
>>                      (FIR_OP_CW1 << FIR_OP5_SHIFT) |
>>                      (FIR_OP_RS  << FIR_OP6_SHIFT));
> 
> Something like that is what I originally suggested, but Shuo said it
> didn't work (even in theory, it requires a CE-don't-care NAND chip,
> since bus atomicity is broken).
Are there 4K chips that are not CE-don't-care?

Also I think it depends on how the bus is connected (shared with other devices)
and on the controller.

Matthieu

^ permalink raw reply

* Re: [PATCH v3] mtd/nand : workaround for Freescale FCM to support large-page Nand chip
From: Scott Wood @ 2011-08-22 16:19 UTC (permalink / raw)
  To: Matthieu CASTET
  Cc: Artem Bityutskiy, LiuShuo, linuxppc-dev@ozlabs.org,
	linux-mtd@lists.infradead.org, Ivan Djelic, dwmw2@infradead.org
In-Reply-To: <4E528036.5070801@parrot.com>

On 08/22/2011 11:13 AM, Matthieu CASTET wrote:
> Scott Wood wrote:
>> To eliminate it we'd need to do an extra data transfer without reissuing
>> the command, which Shuo was unable to get to work.
>>
> That's weird because our controller seems quite flexible [1].
>
> Something like that should work ?
>
>             out_be32(&lbc->fir,
>                      (FIR_OP_CM2 << FIR_OP0_SHIFT) |
>                      (FIR_OP_CA  << FIR_OP1_SHIFT) |
>                      (FIR_OP_PA  << FIR_OP2_SHIFT) |
>                      (FIR_OP_WB  << FIR_OP3_SHIFT));
> refill FCM buffer with next 2k data
>
>             out_be32(&lbc->fir,
>                      (FIR_OP_WB  << FIR_OP3_SHIFT) |
>                      (FIR_OP_CM3 << FIR_OP4_SHIFT) |
>                      (FIR_OP_CW1 << FIR_OP5_SHIFT) |
>                      (FIR_OP_RS  << FIR_OP6_SHIFT));

Something like that is what I originally suggested, but Shuo said it
didn't work (even in theory, it requires a CE-don't-care NAND chip,
since bus atomicity is broken).

Shuo, what specifically did you try, and what did you see happen?

-Scott

^ permalink raw reply

* Re: [PATCH v3] mtd/nand : workaround for Freescale FCM to support large-page Nand chip
From: Matthieu CASTET @ 2011-08-22 16:13 UTC (permalink / raw)
  To: Scott Wood
  Cc: Artem Bityutskiy, LiuShuo, linuxppc-dev@ozlabs.org,
	linux-mtd@lists.infradead.org, Ivan Djelic, dwmw2@infradead.org
In-Reply-To: <4E527E0F.1010500@freescale.com>

Scott Wood wrote:
> On 08/22/2011 10:25 AM, Ivan Djelic wrote:
>> Did you take into account the fact that, because MTD thinks this is a 2K chip,
>> you will have to wait twice for the NAND busy read time (typically 25 us) for
>> each 4K read? In other words, to read 4 kBytes you will do:
>>
>> 1. send read0 (00), send address, send read1 (30)
>> 2. wait tRB
>> 3. transfer 2 kBytes
>> 4. send read0 (00), send address, send read1 (30)
>> 5. wait tRB
>> 6. transfer 2 kBytes
>>
>> Same problem for writes (but rather 100 us instead of 25 us).
>>
>> How does that compare with hw ecc gain in terms of performance ?
> 
> We'd have the double-delay with the sw ecc plus buffering approach as well.
> 
> To eliminate it we'd need to do an extra data transfer without reissuing
> the command, which Shuo was unable to get to work.
> 
That's weird because our controller seems quite flexible [1].

Something like that should work ?

            out_be32(&lbc->fir,
                     (FIR_OP_CM2 << FIR_OP0_SHIFT) |
                     (FIR_OP_CA  << FIR_OP1_SHIFT) |
                     (FIR_OP_PA  << FIR_OP2_SHIFT) |
                     (FIR_OP_WB  << FIR_OP3_SHIFT));
refill FCM buffer with next 2k data

            out_be32(&lbc->fir,
                     (FIR_OP_WB  << FIR_OP3_SHIFT) |
                     (FIR_OP_CM3 << FIR_OP4_SHIFT) |
                     (FIR_OP_CW1 << FIR_OP5_SHIFT) |
                     (FIR_OP_RS  << FIR_OP6_SHIFT));



[1]
    __be32 fir;             /**< Flash Instruction Register */
#define FIR_OP0      0xF0000000
#define FIR_OP0_SHIFT        28
#define FIR_OP1      0x0F000000
#define FIR_OP1_SHIFT        24
#define FIR_OP2      0x00F00000
#define FIR_OP2_SHIFT        20
#define FIR_OP3      0x000F0000
#define FIR_OP3_SHIFT        16
#define FIR_OP4      0x0000F000
#define FIR_OP4_SHIFT        12
#define FIR_OP5      0x00000F00
#define FIR_OP5_SHIFT         8
#define FIR_OP6      0x000000F0
#define FIR_OP6_SHIFT         4
#define FIR_OP7      0x0000000F
#define FIR_OP7_SHIFT         0
#define FIR_OP_NOP   0x0    /* No operation and end of sequence */
#define FIR_OP_CA    0x1        /* Issue current column address */
#define FIR_OP_PA    0x2        /* Issue current block+page address */
#define FIR_OP_UA    0x3        /* Issue user defined address */
#define FIR_OP_CM0   0x4        /* Issue command from FCR[CMD0] */
#define FIR_OP_CM1   0x5        /* Issue command from FCR[CMD1] */
#define FIR_OP_CM2   0x6        /* Issue command from FCR[CMD2] */
#define FIR_OP_CM3   0x7        /* Issue command from FCR[CMD3] */
#define FIR_OP_WB    0x8        /* Write FBCR bytes from FCM buffer */
#define FIR_OP_WS    0x9        /* Write 1 or 2 bytes from MDR[AS] */
#define FIR_OP_RB    0xA        /* Read FBCR bytes to FCM buffer */
#define FIR_OP_RS    0xB        /* Read 1 or 2 bytes to MDR[AS] */
#define FIR_OP_CW0   0xC        /* Wait then issue FCR[CMD0] */
#define FIR_OP_CW1   0xD        /* Wait then issue FCR[CMD1] */
#define FIR_OP_RBW   0xE        /* Wait then read FBCR bytes */
#define FIR_OP_RSW   0xE        /* Wait then read 1 or 2 bytes */
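
To make the packing concrete, here is a small standalone sketch (not driver code) that builds the two FIR words above from the opcode values in [1] and counts how many slots come before the first NOP. The helper names fir_pack(), fir_slot() and fir_active_ops() are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Opcode values copied from the header excerpt [1]. */
#define FIR_OP_NOP   0x0
#define FIR_OP_CA    0x1
#define FIR_OP_PA    0x2
#define FIR_OP_CM2   0x6
#define FIR_OP_CM3   0x7
#define FIR_OP_WB    0x8
#define FIR_OP_RS    0xB
#define FIR_OP_CW1   0xD

/* Pack eight 4-bit opcodes into one FIR word; slot 0 (OP0) is the top nibble. */
uint32_t fir_pack(const uint8_t ops[8])
{
	uint32_t fir = 0;

	for (int slot = 0; slot < 8; slot++)
		fir |= (uint32_t)(ops[slot] & 0xF) << (28 - 4 * slot);
	return fir;
}

/* Extract the opcode stored in a given slot (0..7). */
unsigned fir_slot(uint32_t fir, int slot)
{
	assert(slot >= 0 && slot < 8);
	return (fir >> (28 - 4 * slot)) & 0xF;
}

/* Number of leading slots before the first NOP. */
int fir_active_ops(uint32_t fir)
{
	int slot = 0;

	while (slot < 8 && fir_slot(fir, slot) != FIR_OP_NOP)
		slot++;
	return slot;
}
```

One thing the packing makes visible: the second word as written leaves OP0..OP2 at FIR_OP_NOP, and the header comment describes NOP as "end of sequence". If the controller really stops at the first NOP, the second sequence would presumably have to start at OP0 instead. That is only my reading of the comment, not something tested on hardware.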

^ permalink raw reply

* Re: [PATCH v3] mtd/nand : workaround for Freescale FCM to support large-page Nand chip
From: Scott Wood @ 2011-08-22 16:04 UTC (permalink / raw)
  To: Ivan Djelic
  Cc: Artem Bityutskiy, LiuShuo, Matthieu Castet,
	linuxppc-dev@ozlabs.org, linux-mtd@lists.infradead.org,
	dwmw2@infradead.org
In-Reply-To: <20110822152530.GA16794@parrot.com>

On 08/22/2011 10:25 AM, Ivan Djelic wrote:
> Did you take into account the fact that, because MTD thinks this is a 2K chip,
> you will have to wait twice for the NAND busy read time (typically 25 us) for
> each 4K read? In other words, to read 4 kBytes you will do:
> 
> 1. send read0 (00), send address, send read1 (30)
> 2. wait tRB
> 3. transfer 2 kBytes
> 4. send read0 (00), send address, send read1 (30)
> 5. wait tRB
> 6. transfer 2 kBytes
> 
> Same problem for writes (but rather 100 us instead of 25 us).
> 
> How does that compare with hw ecc gain in terms of performance ?

We'd have the double-delay with the sw ecc plus buffering approach as well.

To eliminate it we'd need to do an extra data transfer without reissuing
the command, which Shuo was unable to get to work.

And it's not worse than having an actual 2K chip. :-)

-Scott

^ permalink raw reply

* Re: [PATCH v3] mtd/nand : workaround for Freescale FCM to support large-page Nand chip
From: Scott Wood @ 2011-08-22 15:58 UTC (permalink / raw)
  To: dedekind1
  Cc: linuxppc-dev@ozlabs.org, linux-mtd@lists.infradead.org, LiuShuo,
	dwmw2@infradead.org, Matthieu CASTET
In-Reply-To: <1314010719.2644.114.camel@sauron>

On 08/22/2011 05:58 AM, Artem Bityutskiy wrote:
> On Fri, 2011-08-19 at 13:10 -0500, Scott Wood wrote:
>> On 08/19/2011 03:57 AM, Matthieu CASTET wrote:
>>> How are the bad block markers handled with this remapping?
>>
>> It has to be migrated prior to first use (this needs to be documented,
>> and ideally a U-Boot command provided to do this), or else special
>> handling would be needed when building the BBT.  The only way around
>> this would be to do ECC in software, and do the buffering needed to let
>> MTD treat it as a 4K chip.
> 
> It really feels like a special hack which would better not go to
> mainline - am I the only one with such feeling? If yes, probably I am
> wrong...

While the implementation is (of necessity) a hack, the feature is
something that multiple people have been asking for (it's not a special
case for a specific user).  They say 2K chips are getting more difficult
to obtain.  It doesn't change anything for people using 512/2K chips,
and (in its current form) doesn't introduce significant complexity to
the driver.  I'm not sure how maintaining it out of tree would be a
better situation for anyone.

-Scott

^ permalink raw reply

* Re: kvm PCI assignment & VFIO ramblings
From: Alex Williamson @ 2011-08-22 15:45 UTC (permalink / raw)
  To: David Gibson
  Cc: aafabbri, Alexey Kardashevskiy, kvm, Paul Mackerras, qemu-devel,
	chrisw, iommu, Avi Kivity, Anthony Liguori,
	linux-pci@vger.kernel.org, linuxppc-dev, benve
In-Reply-To: <20110822055509.GI30097@yookeroo.fritz.box>

On Mon, 2011-08-22 at 15:55 +1000, David Gibson wrote:
> On Sat, Aug 20, 2011 at 09:51:39AM -0700, Alex Williamson wrote:
> > We had an extremely productive VFIO BoF on Monday.  Here's my attempt to
> > capture the plan that I think we agreed to:
> > 
> > We need to address both the description and enforcement of device
> > groups.  Groups are formed any time the iommu does not have resolution
> > between a set of devices.  On x86, this typically happens when a
> > PCI-to-PCI bridge exists between the set of devices and the iommu.  For
> > Power, partitionable endpoints define a group.  Grouping information
> > needs to be exposed for both userspace and kernel internal usage.  This
> > will be a sysfs attribute setup by the iommu drivers.  Perhaps:
> > 
> > # cat /sys/devices/pci0000:00/0000:00:19.0/iommu_group
> > 42
> > 
> > (I use a PCI example here, but attribute should not be PCI specific)
> 
> Ok.  Am I correct in thinking these group IDs are representing the
> minimum granularity, and are therefore always static, defined only by
> the connected hardware, not by configuration?

Yes, that's the idea.  An open question I have towards the configuration
side is whether we might add iommu driver specific options to the
groups.  For instance on x86 where we typically have B:D.F granularity,
should we have an option not to trust multi-function devices and use a
B:D granularity for grouping?
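
As a sketch of what such an option could mean in practice, the function below derives a grouping key from a PCI address and drops the function number when multi-function devices are not trusted. The name group_key() and the trust_multifunction flag are hypothetical; the 8/5/3-bit B:D.F layout is the standard PCI one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Return a key that is equal for devices the iommu driver would place in
 * the same group under this (hypothetical) policy:
 *   trust_multifunction == true  -> one group per B:D.F
 *   trust_multifunction == false -> one group per B:D (function ignored)
 */
uint16_t group_key(uint8_t bus, uint8_t dev, uint8_t fn, bool trust_multifunction)
{
	uint16_t bdf;

	assert(dev < 32 && fn < 8);
	bdf = (uint16_t)((bus << 8) | (dev << 3) | fn);
	return trust_multifunction ? bdf : (uint16_t)(bdf & ~0x7u);
}
```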

> > From there we have a few options.  In the BoF we discussed a model where
> > binding a device to vfio creates a /dev/vfio$GROUP character device
> > file.  This "group" fd provides dma mapping ioctls as well as
> > ioctls to enumerate and return a "device" fd for each attached member of
> > the group (similar to KVM_CREATE_VCPU).  We enforce grouping by
> > returning an error on open() of the group fd if there are members of the
> > group not bound to the vfio driver.  Each device fd would then support a
> > similar set of ioctls and mapping (mmio/pio/config) interface as current
> > vfio, except for the obvious domain and dma ioctls superseded by the
> > group fd.
> 
> It seems a slightly strange distinction that the group device appears
> when any device in the group is bound to vfio, but only becomes usable
> when all devices are bound.
> 
> > Another valid model might be that /dev/vfio/$GROUP is created for all
> > groups when the vfio module is loaded.  The group fd would allow open()
> > and some set of iommu querying and device enumeration ioctls, but would
> > error on dma mapping and retrieving device fds until all of the group
> > devices are bound to the vfio driver.
> 
> Which is why I marginally prefer this model, although it's not a big
> deal.

Right, we can also combine models.  Binding a device to vfio
creates /dev/vfio$GROUP, which only allows a subset of ioctls and no
device access until all the group devices are also bound.  I think
the /dev/vfio/$GROUP might help provide an enumeration interface as well
though, which could be useful.
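
Roughly, the combined model amounts to a gate like the one below. This is only a userspace sketch: the command names, the struct and the -EBUSY return value are placeholders I picked for the example, not a proposed ABI.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Placeholder command set for the group fd. */
enum group_cmd {
	GROUP_QUERY_IOMMU,    /* always allowed */
	GROUP_ENUM_DEVICES,   /* always allowed */
	GROUP_MAP_DMA,        /* needs a fully bound group */
	GROUP_GET_DEVICE_FD,  /* needs a fully bound group */
};

struct group_state {
	unsigned members;     /* devices in the group */
	unsigned bound;       /* of those, how many are bound to vfio */
};

/* A group is usable only once every member is bound to vfio. */
bool group_viable(const struct group_state *g)
{
	return g->members > 0 && g->bound == g->members;
}

/* Return 0 if 'cmd' may proceed, or a negative errno otherwise. */
int group_check_cmd(const struct group_state *g, enum group_cmd cmd)
{
	assert(g);
	switch (cmd) {
	case GROUP_QUERY_IOMMU:
	case GROUP_ENUM_DEVICES:
		return 0;
	case GROUP_MAP_DMA:
	case GROUP_GET_DEVICE_FD:
		return group_viable(g) ? 0 : -EBUSY;
	}
	return -ENOTTY;
}
```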

> > In either case, the uiommu interface is removed entirely since dma
> > mapping is done via the group fd.  As necessary in the future, we can
> > define a more high performance dma mapping interface for streaming dma
> > via the group fd.  I expect we'll also include architecture specific
> > group ioctls to describe features and capabilities of the iommu.  The
> > group fd will need to prevent concurrent open()s to maintain a 1:1 group
> > to userspace process ownership model.
> 
> A 1:1 group<->process correspondence seems wrong to me. But there are
> many ways you could legitimately write the userspace side of the code,
> many of them involving some sort of concurrency.  Implementing that
> concurrency as multiple processes (using explicit shared memory and/or
> other IPC mechanisms to co-ordinate) seems a valid choice that we
> shouldn't arbitrarily prohibit.
> 
> Obviously, only one UID may be permitted to have the group open at a
> time, and I think that's enough to prevent them doing any worse than
> shooting themselves in the foot.

1:1 group<->process is probably too strong.  Not allowing concurrent
open()s on the group file enforces that a single userspace entity is
responsible for that group.  Device fds can be passed to other
processes, but only retrieved via the group fd.  I suppose we could even
branch off the dma interface into a different fd, but it seems like we
would logically want to serialize dma mappings at each iommu group
anyway.  I'm open to alternatives, this just seemed an easy way to do
it.  Restricting on UID implies that we require isolated qemu instances
to run as different UIDs.  I know that's a goal, but I don't know if we
want to make it an assumption in the group security model.
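
To show what "no concurrent open()s" means mechanically, here is a minimal userspace model using a C11 atomic flag. The struct and function names are invented for the example; a real driver would do the equivalent in its open and release file operations.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Toy model of a group: a single flag records whether it is currently open. */
struct vfio_group_model {
	atomic_flag opened;
};

/* First opener wins; anyone else gets -EBUSY until the group is released. */
int group_open(struct vfio_group_model *g)
{
	assert(g);
	if (atomic_flag_test_and_set(&g->opened))
		return -EBUSY;
	return 0;
}

/* Called when the holder closes the group fd. */
void group_release(struct vfio_group_model *g)
{
	assert(g);
	atomic_flag_clear(&g->opened);
}
```

Note this says nothing about UIDs: whoever opens first owns the group, and device fds obtained from it can still be handed to other processes.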

> > Also on the table is supporting non-PCI devices with vfio.  To do this,
> > we need to generalize the read/write/mmap and irq eventfd interfaces.
> > We could keep the same model of segmenting the device fd address space,
> > perhaps adding ioctls to define the segment offset bit position or we
> > could split each region into its own fd (VFIO_GET_PCI_BAR_FD(0),
> > VFIO_GET_PCI_CONFIG_FD(), VFIO_GET_MMIO_FD(3)), though we're already
> > suffering some degree of fd bloat (group fd, device fd(s), interrupt
> > event fd(s), per resource fd, etc).  For interrupts we can overload
> > VFIO_SET_IRQ_EVENTFD to be either PCI INTx or non-PCI irq 
> 
> Sounds reasonable.
> 
> > (do non-PCI
> > devices support MSI?).
> 
> They can.  Obviously they might not have exactly the same semantics as
> PCI MSIs, but I know we have SoC systems with (non-PCI) on-die devices
> whose interrupts are treated by the (also on-die) root interrupt
> controller in the same way as PCI MSIs.

Ok, I suppose we can define ioctls to enable these as we go.  We also
need to figure out how non-PCI resources, interrupts, and iommu mapping
restrictions are described via vfio.

> > For qemu, these changes imply we'd only support a model where we have a
> > 1:1 group to iommu domain.  The current vfio driver could probably
> > become vfio-pci as we might end up with more target specific vfio
> > drivers for non-pci.  PCI should be able to maintain a simple -device
> > vfio-pci,host=bb:dd.f to enable hotplug of individual devices.  We'll
> > need to come up with extra options when we need to expose groups to
> > guest for pvdma.
> 
> Are you saying that you'd no longer support the current x86 usage of
> putting all of one guest's devices into a single domain?

Yes.  I'm not sure there's a good ROI to prioritize that model.  We have
to assume >1 device per guest is a typical model and that the iotlb is
large enough that we might improve thrashing to see both a resource and
performance benefit from it.  I'm open to suggestions for how we could
include it though.

> If that's
> not what you're saying, how would the domains - now made up of a
> user's selection of groups, rather than individual devices - be
> configured?
> 
> > Hope that captures it, feel free to jump in with corrections and
> > suggestions.  Thanks,
> 

^ permalink raw reply

* Re: [PATCH v3] mtd/nand : workaround for Freescale FCM to support large-page Nand chip
From: Ivan Djelic @ 2011-08-22 15:25 UTC (permalink / raw)
  To: Scott Wood, LiuShuo
  Cc: Artem Bityutskiy, LiuShuo, Matthieu Castet,
	linuxppc-dev@ozlabs.org, linux-mtd@lists.infradead.org,
	dwmw2@infradead.org
In-Reply-To: <1314010719.2644.114.camel@sauron>

On Mon, Aug 22, 2011 at 11:58:33AM +0100, Artem Bityutskiy wrote:
> On Fri, 2011-08-19 at 13:10 -0500, Scott Wood wrote:
> > On 08/19/2011 03:57 AM, Matthieu CASTET wrote:
> > > LiuShuo wrote:
> > >> On 2011-08-19 01:00, Matthieu CASTET wrote:
> > >>> b35362@freescale.com wrote:
> > >>>> From: Liu Shuo<b35362@freescale.com>
> > >>>>
> > >>>> Freescale FCM controller has a 2K size limitation of buffer RAM. In order
> > >>>> to support the Nand flash chip whose page size is larger than 2K bytes,
> > >>>> we divide a page into multi-2K pages for MTD layer driver. In that case,
> > >>>> we force to set the page size to 2K bytes. We convert the page address of
> > >>>> MTD layer driver to a real page address in flash chips and a column index
> > >>>> in fsl_elbc driver. We can issue any column address by UA instruction of
> > >>>> elbc controller.
> > >>>>
> > >>> Why do you need to do that ?
> > >>>
> > >>> When mtd sends you a 4k page, why can't you write it as 2 * 2k page writes?
> > >> 1. It's easy to implement.
> > >> 2. We don't need to move the data in buffer more times, because we
> > >> want to use the HW_ECC.
> > >>
> > >> In flash chip per Page:
> > >> ----------------------------------------------------------------
> > >> | first data | first oob | second data | second oob |
> > >> ----------------------------------------------------------------
> > > How are the bad block markers handled with this remapping?
> > 
> > It has to be migrated prior to first use (this needs to be documented,
> > and ideally a U-Boot command provided to do this), or else special
> > handling would be needed when building the BBT.  The only way around
> > this would be to do ECC in software, and do the buffering needed to let
> > MTD treat it as a 4K chip.
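
For reference, the remapping described in the patch can be sketched as below, going by the "first data | first oob | second data | second oob" layout quoted above. The per-chunk OOB size is passed in as a parameter because it is not stated in the thread (64 bytes in the usage is an assumption), and the struct and function names are made up.

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK_DATA 2048u   /* FCM buffer limit from the patch description */

struct nand_addr {
	uint32_t page;     /* real page address in the flash chip */
	uint32_t column;   /* byte offset of the chunk inside that page */
};

/*
 * Map the 2K "virtual" page number seen by MTD to a real page and column.
 * chunks_per_page is 2 for a 4K chip; chunk_oob is the OOB bytes stored
 * after each 2K data chunk (assumed, not given in the thread).
 */
struct nand_addr map_virtual_page(uint32_t mtd_page, uint32_t chunks_per_page,
				  uint32_t chunk_oob)
{
	struct nand_addr a;

	assert(chunks_per_page > 0);
	a.page = mtd_page / chunks_per_page;
	a.column = (mtd_page % chunks_per_page) * (CHUNK_DATA + chunk_oob);
	return a;
}
```

With two chunks per page and 64 OOB bytes per chunk, virtual page 5 lands in real page 2 at column 2112.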

Did you take into account the fact that, because MTD thinks this is a 2K chip,
you will have to wait twice for the NAND busy read time (typically 25 us) for
each 4K read? In other words, to read 4 kBytes you will do:

1. send read0 (00), send address, send read1 (30)
2. wait tRB
3. transfer 2 kBytes
4. send read0 (00), send address, send read1 (30)
5. wait tRB
6. transfer 2 kBytes

Same problem for writes (but rather 100 us instead of 25 us).

How does that compare with hw ecc gain in terms of performance ?
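
The arithmetic behind the question, as a tiny sketch. Only the busy-wait part is modelled; transfer time is the same either way and is left out. The function names are made up, and the 25 us / 100 us figures are the typical values mentioned above, not measurements.

```c
#include <assert.h>

/*
 * Total busy-wait time (us) when one physical page is accessed as 'chunks'
 * separately commanded transfers; each command pays the busy time once.
 */
unsigned busy_wait_us(unsigned t_busy_us, unsigned chunks)
{
	assert(chunks > 0);
	return t_busy_us * chunks;
}

/* Extra busy-wait time compared with a single command for the whole page. */
unsigned extra_busy_us(unsigned t_busy_us, unsigned chunks)
{
	return busy_wait_us(t_busy_us, chunks) - busy_wait_us(t_busy_us, 1);
}
```

So for a 4K page handled as two 2K chunks, the cost is one extra busy period per page: about 25 us on reads and about 100 us on writes with the numbers above.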

--
Best Regards,

Ivan

^ permalink raw reply

* Re: kvm PCI assignment & VFIO ramblings
From: Roedel, Joerg @ 2011-08-22 14:37 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Alexey Kardashevskiy, kvm@vger.kernel.org, Paul Mackerras,
	qemu-devel, iommu, chrisw, Alex Williamson, Anthony Liguori,
	linux-pci@vger.kernel.org, linuxppc-dev, benve@cisco.com
In-Reply-To: <4E5256F5.3090909@redhat.com>

On Mon, Aug 22, 2011 at 09:17:41AM -0400, Avi Kivity wrote:
> On 08/22/2011 04:15 PM, Roedel, Joerg wrote:
> > On Mon, Aug 22, 2011 at 09:06:07AM -0400, Avi Kivity wrote:
> > >  On 08/22/2011 03:55 PM, Roedel, Joerg wrote:
> >
> > >  >  Well, I don't think it's really meaningless, but we need some way to
> > >  >  communicate the information about device groups to userspace.
> > >
> > >  I mean the contents of the group descriptor.  There are enough 42s in
> > >  the kernel, it's better if we can replace a synthetic number with
> > >  something meaningful.
> >
> > If we only look at PCI then a Segment:Bus:Dev.Fn Number would be
> > sufficient, of course. But the idea was to make it generic enough so
> > that it works with !PCI too.
> >
> 
> We could make it an arch defined string instead of a symlink.  So it 
> doesn't return 42, rather something that can be used by the admin to 
> figure out what the problem was.

Well, ok, it would certainly differ from the in-kernel representation
then and introduce new architecture dependencies into libvirt. But if
the 'group-string' is more meaningful to users then it's certainly good.
Suggestions?
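
For PCI, one candidate would simply be the Segment:Bus:Dev.Fn of a representative device. A minimal sketch of producing such a string follows; format_pci_group() is a made-up name, and what the string should look like for !PCI is exactly the open question.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Write a PCI-style group string ("ssss:bb:dd.f", hex) into buf.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int format_pci_group(char *buf, size_t len, unsigned seg, unsigned bus,
		     unsigned dev, unsigned fn)
{
	int n;

	assert(buf && len > 0);
	n = snprintf(buf, len, "%04x:%02x:%02x.%x", seg, bus, dev, fn);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```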

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

^ permalink raw reply

* [PATCH] [v2] sound/soc/fsl/fsl_dma.c: add missing of_node_put
From: Timur Tabi @ 2011-08-22 14:22 UTC (permalink / raw)
  To: kernel-janitors, lrg, broonie, perex, tiwai, grant.likely,
	alsa-devel, linuxppc-dev, linux-kernel, devicetree-discuss, julia

of_parse_phandle increments the reference count of np, so this should be
decremented before trying the next possibility.

Since we don't actually use np, we can decrement the reference count
immediately.

Reported-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Timur Tabi <timur@freescale.com>
---
 sound/soc/fsl/fsl_dma.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/sound/soc/fsl/fsl_dma.c b/sound/soc/fsl/fsl_dma.c
index 6680c0b..b300f4b 100644
--- a/sound/soc/fsl/fsl_dma.c
+++ b/sound/soc/fsl/fsl_dma.c
@@ -877,10 +877,12 @@ static struct device_node *find_ssi_node(struct device_node *dma_channel_np)
 		 * assume that device_node pointers are a valid comparison.
 		 */
 		np = of_parse_phandle(ssi_np, "fsl,playback-dma", 0);
+		of_node_put(np);
 		if (np == dma_channel_np)
 			return ssi_np;
 
 		np = of_parse_phandle(ssi_np, "fsl,capture-dma", 0);
+		of_node_put(np);
 		if (np == dma_channel_np)
 			return ssi_np;
 	}
-- 
1.7.3.4

^ permalink raw reply related

* Re: [PATCH] sound/soc/fsl/fsl_dma.c: add missing of_node_put
From: Julia Lawall @ 2011-08-22 14:06 UTC (permalink / raw)
  To: Timur Tabi
  Cc: alsa-devel, Takashi Iwai, devicetree-discuss, Mark Brown,
	kernel-janitors, linux-kernel, Jaroslav Kysela, linuxppc-dev,
	Liam Girdwood
In-Reply-To: <4E5261A5.5050608@freescale.com>

On Mon, 22 Aug 2011, Timur Tabi wrote:

> Julia Lawall wrote:
> > diff --git a/sound/soc/fsl/fsl_dma.c b/sound/soc/fsl/fsl_dma.c
> > index 0efc04a..b33271b 100644
> > --- a/sound/soc/fsl/fsl_dma.c
> > +++ b/sound/soc/fsl/fsl_dma.c
> > @@ -880,10 +880,12 @@ static struct device_node *find_ssi_node(struct device_node *dma_channel_np)
> >  		np = of_parse_phandle(ssi_np, "fsl,playback-dma", 0);
> >  		if (np == dma_channel_np)
> >  			return ssi_np;
> > +		of_node_put(np);
> >  
> >  		np = of_parse_phandle(ssi_np, "fsl,capture-dma", 0);
> >  		if (np == dma_channel_np)
> >  			return ssi_np;
> > +		of_node_put(np);
> >  	}
> 
> Thanks for catching the problem, Julia, but the fix is not quite correct.  My
> code assumes that of_parse_phandle() doesn't claim the node, but it doesn't
> actually use the node pointer, either.  All I care about is whether 'np' is
> equal to dma_channel_np.  I'm not going to use 'np'.  So I think the real fix is
> this:
> 
> @@ -880,10 +880,12 @@ static struct device_node *find_ssi_node(struct device_node *dma_channel_np)
>  		np = of_parse_phandle(ssi_np, "fsl,playback-dma", 0);
> +		of_node_put(np);
>  		if (np == dma_channel_np)
>  			return ssi_np;
> 
>  		np = of_parse_phandle(ssi_np, "fsl,capture-dma", 0);
> +		of_node_put(np);
>  		if (np == dma_channel_np)
>  			return ssi_np;
>  	}
> 
>  	return NULL;

OK, that looks reasonable.

julia

^ permalink raw reply

* Re: [PATCH] sound/soc/fsl/fsl_dma.c: add missing of_node_put
From: Timur Tabi @ 2011-08-22 14:03 UTC (permalink / raw)
  To: Julia Lawall
  Cc: alsa-devel, Takashi Iwai, devicetree-discuss, Mark Brown,
	kernel-janitors, linux-kernel, Jaroslav Kysela, linuxppc-dev,
	Liam Girdwood
In-Reply-To: <1313825025-17590-1-git-send-email-julia@diku.dk>

Julia Lawall wrote:
> diff --git a/sound/soc/fsl/fsl_dma.c b/sound/soc/fsl/fsl_dma.c
> index 0efc04a..b33271b 100644
> --- a/sound/soc/fsl/fsl_dma.c
> +++ b/sound/soc/fsl/fsl_dma.c
> @@ -880,10 +880,12 @@ static struct device_node *find_ssi_node(struct device_node *dma_channel_np)
>  		np = of_parse_phandle(ssi_np, "fsl,playback-dma", 0);
>  		if (np == dma_channel_np)
>  			return ssi_np;
> +		of_node_put(np);
>  
>  		np = of_parse_phandle(ssi_np, "fsl,capture-dma", 0);
>  		if (np == dma_channel_np)
>  			return ssi_np;
> +		of_node_put(np);
>  	}

Thanks for catching the problem, Julia, but the fix is not quite correct.  My
code assumes that of_parse_phandle() doesn't claim the node, but it doesn't
actually use the node pointer, either.  All I care about is whether 'np' is
equal to dma_channel_np.  I'm not going to use 'np'.  So I think the real fix is
this:

@@ -880,10 +880,12 @@ static struct device_node *find_ssi_node(struct device_node *dma_channel_np)
 		np = of_parse_phandle(ssi_np, "fsl,playback-dma", 0);
+		of_node_put(np);
 		if (np == dma_channel_np)
 			return ssi_np;

 		np = of_parse_phandle(ssi_np, "fsl,capture-dma", 0);
+		of_node_put(np);
 		if (np == dma_channel_np)
 			return ssi_np;
 	}

 	return NULL;
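
To illustrate why dropping the reference before the comparison is fine, here is a userspace model with a toy refcount. model_get() and model_put() merely stand in for of_parse_phandle() and of_node_put(); the struct and all names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Toy node: a refcount plus the two phandle targets an SSI node may have. */
struct node {
	int refcount;
	struct node *playback;
	struct node *capture;
};

/* Stand-in for of_parse_phandle(): returns the target with its count raised. */
struct node *model_get(struct node *n)
{
	if (n)
		n->refcount++;
	return n;
}

/* Stand-in for of_node_put(): NULL is accepted. */
void model_put(struct node *n)
{
	if (n) {
		assert(n->refcount > 0);
		n->refcount--;
	}
}

/* Find the SSI node whose playback or capture channel is 'dma'. */
struct node *find_ssi(struct node *ssi, size_t count, const struct node *dma)
{
	for (size_t i = 0; i < count; i++) {
		struct node *np = model_get(ssi[i].playback);

		model_put(np);  /* np is only compared below, never dereferenced */
		if (np == dma)
			return &ssi[i];

		np = model_get(ssi[i].capture);
		model_put(np);
		if (np == dma)
			return &ssi[i];
	}
	return NULL;
}
```

Whether a match is found or not, every count ends up where it started, including on the early-return paths.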

-- 
Timur Tabi
Linux kernel developer at Freescale

^ permalink raw reply

* Re: kvm PCI assignment & VFIO ramblings
From: Avi Kivity @ 2011-08-22 13:17 UTC (permalink / raw)
  To: Roedel, Joerg
  Cc: Alexey Kardashevskiy, kvm@vger.kernel.org, Paul Mackerras,
	qemu-devel, iommu, chrisw, Alex Williamson, Anthony Liguori,
	linux-pci@vger.kernel.org, linuxppc-dev, benve@cisco.com
In-Reply-To: <20110822131508.GG2079@amd.com>

On 08/22/2011 04:15 PM, Roedel, Joerg wrote:
> On Mon, Aug 22, 2011 at 09:06:07AM -0400, Avi Kivity wrote:
> >  On 08/22/2011 03:55 PM, Roedel, Joerg wrote:
>
> >  >  Well, I don't think it's really meaningless, but we need some way to
> >  >  communicate the information about device groups to userspace.
> >
> >  I mean the contents of the group descriptor.  There are enough 42s in
> >  the kernel, it's better if we can replace a synthetic number with
> >  something meaningful.
>
> If we only look at PCI then a Segment:Bus:Dev.Fn Number would be
> sufficient, of course. But the idea was to make it generic enough so
> that it works with !PCI too.
>

We could make it an arch defined string instead of a symlink.  So it 
doesn't return 42, rather something that can be used by the admin to 
figure out what the problem was.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

* Re: kvm PCI assignment & VFIO ramblings
From: Roedel, Joerg @ 2011-08-22 13:15 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Alexey Kardashevskiy, kvm@vger.kernel.org, Paul Mackerras,
	qemu-devel, iommu, chrisw, Alex Williamson, Anthony Liguori,
	linux-pci@vger.kernel.org, linuxppc-dev, benve@cisco.com
In-Reply-To: <4E52543F.70104@redhat.com>

On Mon, Aug 22, 2011 at 09:06:07AM -0400, Avi Kivity wrote:
> On 08/22/2011 03:55 PM, Roedel, Joerg wrote:

> > Well, I don't think it's really meaningless, but we need some way to
> > communicate the information about device groups to userspace.
> 
> I mean the contents of the group descriptor.  There are enough 42s in 
> the kernel, it's better if we can replace a synthetic number with 
> something meaningful.

If we only look at PCI then a Segment:Bus:Dev.Fn Number would be
sufficient, of course. But the idea was to make it generic enough so
that it works with !PCI too.

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

^ permalink raw reply

* Re: kvm PCI assignment & VFIO ramblings
From: Avi Kivity @ 2011-08-22 13:06 UTC (permalink / raw)
  To: Roedel, Joerg
  Cc: Alexey Kardashevskiy, kvm@vger.kernel.org, Paul Mackerras,
	qemu-devel, iommu, chrisw, Alex Williamson, Anthony Liguori,
	linux-pci@vger.kernel.org, linuxppc-dev, benve@cisco.com
In-Reply-To: <20110822125502.GF2079@amd.com>

On 08/22/2011 03:55 PM, Roedel, Joerg wrote:
> On Mon, Aug 22, 2011 at 08:42:35AM -0400, Avi Kivity wrote:
> >  On 08/22/2011 03:36 PM, Roedel, Joerg wrote:
> >  >  On the AMD IOMMU side this information is stored in the IVRS ACPI table.
> >  >  Not sure about the VT-d side, though.
> >
> >  I see.  There is no sysfs node representing it?
>
> No. It also doesn't exist as a 'struct pci_dev'. This caused problems in
> the AMD IOMMU driver in the past and I needed to fix that. That's where I
> know that from :)

Well, too bad.

>
> >  I'd rather not add another meaningless identifier.
>
> Well, I don't think it's really meaningless, but we need some way to
> communicate the information about device groups to userspace.
>

I mean the contents of the group descriptor.  There are enough 42s in 
the kernel, it's better if we can replace a synthetic number with 
something meaningful.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply

