LinuxPPC-Dev Archive on lore.kernel.org


* Re: kvm PCI assignment & VFIO ramblings
From: Benjamin Herrenschmidt @ 2011-08-23  6:54 UTC (permalink / raw)
  To: aafabbri
  Cc: Alexey Kardashevskiy, kvm, Paul Mackerras,
	linux-pci@vger.kernel.org, qemu-devel, iommu, chrisw,
	Alex Williamson, Avi Kivity, Anthony Liguori, linuxppc-dev, benve
In-Reply-To: <CA7847D2.FB3A%aafabbri@cisco.com>

On Mon, 2011-08-22 at 17:52 -0700, aafabbri wrote:

> I'm not following you.
> 
> You have to enforce group/iommu domain assignment whether you have the
> existing uiommu API, or if you change it to your proposed
> ioctl(inherit_iommu) API.
> 
> The only change needed to VFIO here should be to make uiommu fd assignment
> happen on the groups instead of on device fds.  That operation fails or
> succeeds according to the group semantics (all-or-none assignment/same
> uiommu).

OK, so I missed the part where you change uiommu to operate on group
fds rather than device fds; my apologies if you actually wrote that
down :-) It might be obvious... bear with me, I just flew back from the
US and I am badly jet-lagged...

So I see what you mean, however...

> I think the question is: do we force 1:1 iommu/group mapping, or do we allow
> arbitrary mapping (satisfying group constraints) as we do today.
> 
> I'm saying I'm an existing user who wants the arbitrary iommu/group mapping
> ability and definitely think the uiommu approach is cleaner than the
> ioctl(inherit_iommu) approach.  We considered that approach before but it
> seemed less clean so we went with the explicit uiommu context.

Possibly. The question that interests me the most is what interface
KVM will end up using. I'm also not a big fan of the (perceived)
discrepancy between using uiommu to create groups and using the group fd
to actually do the mappings, at least if that is still the plan.

If the separate uiommu interface is kept, then anything that wants to be
able to benefit from the ability to put multiple devices (or existing
groups) into such a "meta group" would need to be explicitly modified to
deal with the uiommu APIs.

I tend to prefer such "meta groups" as being something you create
statically using a configuration interface, either via sysfs, netlink or
ioctl's to a "control" vfio device driven by a simple command line tool
(which can have the configuration stored in /etc and re-apply it at
boot).

That way, any program capable of exploiting VFIO "groups" will
automatically be able to exploit those "meta groups" (or groups of
groups) as well as long as they are supported on the system.

If we ever have system specific constraints as to how such groups can be
created, then it can all be handled at the level of that configuration
tool without impact on whatever programs know how to exploit them via
the VFIO interfaces.

> >  .../...
> > 
> >> If we in singleton-group land were building our own "groups" which were sets
> >> of devices sharing the IOMMU domains we wanted, I suppose we could do away
> >> with uiommu fds, but it sounds like the current proposal would create 20
> >> singleton groups (x86 iommu w/o PCI bridges => all devices are partitionable
> >> endpoints).  Asking me to ioctl(inherit) them together into a blob sounds
> >> worse than the current explicit uiommu API.
> > 
> > I'd rather have an API to create super-groups (groups of groups)
> > statically and then you can use such groups as normal groups using the
> > same interface. That create/management process could be done via a
> > simple command line utility or via sysfs banging, whatever...

Cheers,
Ben.


* [PATCH] SPI: fix build with CONFIG_SPI_FSL_ESPI=m
From: Jiri Slaby @ 2011-08-23  7:59 UTC (permalink / raw)
  To: grant.likely
  Cc: spi-devel-general, Jiri Slaby, linuxppc-dev, linux-kernel,
	jirislaby

When spi_fsl_espi is chosen to be built as a module, there is a build
error because we test only CONFIG_SPI_FSL_ESPI in declaration of
struct mpc8xxx_spi in drivers/spi/spi_fsl_lib.h.

We need to add a test for CONFIG_SPI_FSL_ESPI_MODULE too.

The error looks like:
drivers/spi/spi_fsl_espi.c: In function 'fsl_espi_bufs':
drivers/spi/spi_fsl_espi.c:232: error: 'struct mpc8xxx_spi' has no member named 'len'
...

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
---
 drivers/spi/spi-fsl-lib.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/spi/spi-fsl-lib.h b/drivers/spi/spi-fsl-lib.h
index cbe881b..97968de 100644
--- a/drivers/spi/spi-fsl-lib.h
+++ b/drivers/spi/spi-fsl-lib.h
@@ -28,7 +28,7 @@ struct mpc8xxx_spi {
 	/* rx & tx bufs from the spi_transfer */
 	const void *tx;
 	void *rx;
-#ifdef CONFIG_SPI_FSL_ESPI
+#if defined(CONFIG_SPI_FSL_ESPI) || defined(CONFIG_SPI_FSL_ESPI_MODULE)
 	int len;
 #endif

-- 
1.7.6

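
A minimal sketch of why the plain #ifdef misses the modular case. It
assumes the usual kbuild behaviour that a tristate option set to =m
defines only CONFIG_<NAME>_MODULE and not CONFIG_<NAME>. All demo_* and
DEMO_* names are hypothetical and are not part of the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Simulate CONFIG_SPI_FSL_ESPI=m: only the _MODULE macro is defined. */
#define CONFIG_SPI_FSL_ESPI_MODULE 1

/* The old test: true only when the driver is built in (=y). */
#ifdef CONFIG_SPI_FSL_ESPI
#define DEMO_OLD_TEST 1
#else
#define DEMO_OLD_TEST 0
#endif

/* The patched test: true for both =y and =m. */
#if defined(CONFIG_SPI_FSL_ESPI) || defined(CONFIG_SPI_FSL_ESPI_MODULE)
#define DEMO_NEW_TEST 1
#else
#define DEMO_NEW_TEST 0
#endif

/* Hypothetical stand-in for struct mpc8xxx_spi. */
struct demo_spi {
	const void *tx;
	void *rx;
#if DEMO_NEW_TEST
	int len;	/* present only when the eSPI driver is enabled */
#endif
};

/* Compiles only because 'len' exists under the patched test. */
void demo_set_len(struct demo_spi *spi, int len)
{
	assert(spi != NULL && len >= 0);
	spi->len = len;
}

int demo_old_test(void) { return DEMO_OLD_TEST; }
int demo_new_test(void) { return DEMO_NEW_TEST; }
```

With the old test the member is compiled out for =m, which matches the
"has no member named 'len'" error quoted in the patch description.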

* Re: [PATCH v3] mtd/nand : workaround for Freescale FCM to support large-page Nand chip
From: Matthieu CASTET @ 2011-08-23  8:14 UTC (permalink / raw)
  To: LiuShuo
  Cc: Li Yang-R58472, Artem Bityutskiy, linuxppc-dev@ozlabs.org,
	linux-mtd@lists.infradead.org, Scott Wood, Ivan Djelic,
	dwmw2@infradead.org
In-Reply-To: <4E5319E8.50903@freescale.com>

LiuShuo wrote:
> On 2011-08-23 00:19, Scott Wood wrote:
>> On 08/22/2011 11:13 AM, Matthieu CASTET wrote:
>>> Scott Wood wrote:
>>>> To eliminate it we'd need to do an extra data transfer without reissuing
>>>> the command, which Shuo was unable to get to work.
>>>>
>>> That's weird because our controller seems quite flexible [1].
>>>
>>> Something like that should work ?
>>>
>>>              out_be32(&lbc->fir,
>>>                       (FIR_OP_CM2<<  FIR_OP0_SHIFT) |
>>>                       (FIR_OP_CA<<  FIR_OP1_SHIFT) |
>>>                       (FIR_OP_PA<<  FIR_OP2_SHIFT) |
>>>                       (FIR_OP_WB<<  FIR_OP3_SHIFT));
>>> refill FCM buffer with next 2k data
>>>
>>>              out_be32(&lbc->fir,
>>>                       (FIR_OP_WB<<  FIR_OP3_SHIFT) |
>>>                       (FIR_OP_CM3<<  FIR_OP4_SHIFT) |
>>>                       (FIR_OP_CW1<<  FIR_OP5_SHIFT) |
>>>                       (FIR_OP_RS<<  FIR_OP6_SHIFT));
>> Something like that is what I originally suggested, but Shuo said it
>> didn't work (even in theory, it requires a CE-don't-care NAND chip,
>> since bus atomicity is broken).
>>
>> Shuo, what specifically did you try, and what did you see happen?
>>
>> -Scott
> First, if we want to read 4K data with once command issuing, we can't 
> use HW_ECC.
Yes, but as Ivan said, isn't the cost of 2 reads bigger than that of software ECC?

> Even if we use SW_ECC, we always get lots of weird '0xFF's between 1st 
> 2k and 2nd 2k data.
Did you work out where those 0xff bytes come from? (How many of them are
there? Isn't the controller trying to insert the spare area?)

Could you detail the sequence you used?

Matthieu


* Re: [PATCH v3] mtd/nand : workaround for Freescale FCM to support large-page Nand chip
From: LiuShuo @ 2011-08-23  8:37 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev, dwmw2, Li Yang-R58472, linux-mtd
In-Reply-To: <4E4D3CE0.7020602@freescale.com>

On 2011-08-19 00:25, Scott Wood wrote:
> On 08/17/2011 09:33 PM, b35362@freescale.com wrote:
>> From: Liu Shuo<b35362@freescale.com>
>>
>> Freescale FCM controller has a 2K size limitation of buffer RAM. In order
>> to support the Nand flash chip whose page size is larger than 2K bytes,
>> we divide a page into multi-2K pages for MTD layer driver. In that case,
>> we force to set the page size to 2K bytes. We convert the page address of
>> MTD layer driver to a real page address in flash chips and a column index
>> in fsl_elbc driver. We can issue any column address by UA instruction of
>> elbc controller.
>>
>> NOTE: Due to there is a limitation of 'Number of Partial Program Cycles in
>> the Same Page (NOP)', the flash chip which is supported by this workaround
>> have to meet below conditions.
>> 	1. page size is not greater than 4KB
>> 	2.	1) if main area and spare area have independent NOPs:
>> 			  main  area NOP    :>=3
>> 			  spare area NOP    :>=2?
> How often are the NOPs split like this?
>
>> 		2) if main area and spare area have a common NOP:
>> 			  NOP               :>=4
> This depends on how the flash is used.  If you treat it as a NOP1 flash
> (e.g. run ubifs rather than jffs2), then you need NOP2 for a 4K chip and
> NOP4 for an 8K chip.  OTOH, if you would be making full use of NOP4 on a
> real 2K chip, you'll need NOP8 for a 4K chip.
>
> The NOP restrictions should be documented in the code itself, not just
> in the git changelog.  Maybe print it to the console when this hack is
> used, along with the NOP value read from the ID.

We can't read the NOP from the ID on every chip. Some chips don't
provide this information (e.g. Micron MT29F4G08BAC).

So it is hard for the code to determine whether probe() should fail.
Maybe we should always print the NOP restrictions when this hack is used
and let customers decide how to use the flash on their board.

-LiuShuo
> If it's less than 4
> for 4K or 8 for 8K, also print a message saying not to use jffs2 (does
> yaffs2 do similar things?).  If it's less than 2 for 4K or 4 for 8K, the
> probe should fail.
>
> -Scott

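
A minimal sketch of the NOP policy Scott describes, for the case where
the NOP value is known. The thresholds for 4K and 8K pages are taken
from his message (fail below 2 for 4K or 4 for 8K; warn about jffs2
below 4 for 4K or 8 for 8K). Generalising them to "number of 2K chunks"
and "twice the number of 2K chunks" is an assumption made here, as are
all demo_* names. Since the NOP cannot be read from every chip, the
caller has to supply it.

```c
#include <assert.h>

#define DEMO_FCM_BUF_SIZE 2048u	/* FCM buffer RAM size, per the patch description */

enum demo_nop_verdict {
	DEMO_NOP_FAIL,		/* too few partial programs: probe should fail */
	DEMO_NOP_NO_JFFS2,	/* usable, but print a "do not use jffs2" message */
	DEMO_NOP_OK,
};

/* Classify a large-page chip by its page size and its NOP value. */
enum demo_nop_verdict demo_check_nop(unsigned int page_size, unsigned int nop)
{
	unsigned int chunks;

	/* The workaround only applies to pages larger than the FCM buffer. */
	assert(page_size > DEMO_FCM_BUF_SIZE);
	assert(page_size % DEMO_FCM_BUF_SIZE == 0);
	chunks = page_size / DEMO_FCM_BUF_SIZE;

	if (nop < chunks)
		return DEMO_NOP_FAIL;
	if (nop < 2 * chunks)
		return DEMO_NOP_NO_JFFS2;
	return DEMO_NOP_OK;
}
```

When the NOP is unknown, the fallback discussed above (always print the
restrictions and let the board owner decide) would bypass this check.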

* Re: linux-next: boot test failure (net tree)
From: Jeff Kirsher @ 2011-08-23  8:29 UTC (permalink / raw)
  To: David Miller
  Cc: sfr@canb.auug.org.au, mikey@neuling.org,
	linux-kbuild@vger.kernel.org, netdev@vger.kernel.org,
	linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org,
	linux-next@vger.kernel.org, paulus@samba.org, lacombar@gmail.com,
	akpm@linux-foundation.org, torvalds@linux-foundation.org
In-Reply-To: <20110822.210255.1902105215409964106.davem@davemloft.net>

On Mon, 2011-08-22 at 21:02 -0700, David Miller wrote:
> From: Arnaud Lacombe <lacombar@gmail.com>
> Date: Mon, 22 Aug 2011 23:50:02 -0400
> 
> > Are you implying we need some kind of way to migrate config ?
> 
> The issue is that the dependencies for every single ethernet driver
> have changed.  Some dependencies have been dropped (f.e. NETDEV_10000
> and some have been added (f.e. ETHERNET, NET_VENDOR_****)
> 
> So right now an automated (non-prompted, default to no on all new
> options) run on an existing config results in all ethernet drivers
> getting disabled because the new dependencies don't get enabled.
> 
> This wouldn't be so bad if it was just one or two drivers, but in
> this case it's every single ethernet driver which will have and hit
> this problem.
> 

OK, I have a patch which will resolve the issue.  It is the last patch in
the series I am about to send out.  The patch sets the "new" Kconfig
options to Y, so that current defconfigs still build the drivers that
they are currently set to build.

This fixes the issue; I have confirmed it with the x86_64 defconfig.
Eventually it would be nice to update all the configs so that not every
NET_VENDOR_* option has to be enabled, but for now this is the best way
to ensure that current defconfigs compile all the expected drivers.


* Re: [PATCH] SPI: fix build with CONFIG_SPI_FSL_ESPI=m
From: Jiri Slaby @ 2011-08-23  8:49 UTC (permalink / raw)
  To: Jiri Slaby; +Cc: spi-devel-general, linuxppc-dev, linux-kernel
In-Reply-To: <1314086345-2818-1-git-send-email-jslaby@suse.cz>

On 08/23/2011 09:59 AM, Jiri Slaby wrote:
> When spi_fsl_espi is chosen to be built as a module, there is a build
> error because we test only CONFIG_SPI_FSL_ESPI in declaration of
> struct mpc8xxx_spi in drivers/spi/spi_fsl_lib.h.
> 
> We need to add a test for CONFIG_SPI_FSL_ESPI_MODULE too.
> 
> The error looks like:
> drivers/spi/spi_fsl_espi.c: In function 'fsl_espi_bufs':
> drivers/spi/spi_fsl_espi.c:232: error: 'struct mpc8xxx_spi' has no member named 'len'
> ...
> 
> Signed-off-by: Jiri Slaby <jslaby@suse.cz>
> ---
>  drivers/spi/spi-fsl-lib.h |    2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/drivers/spi/spi-fsl-lib.h b/drivers/spi/spi-fsl-lib.h
> index cbe881b..97968de 100644
> --- a/drivers/spi/spi-fsl-lib.h
> +++ b/drivers/spi/spi-fsl-lib.h
> @@ -28,7 +28,7 @@ struct mpc8xxx_spi {
>  	/* rx & tx bufs from the spi_transfer */
>  	const void *tx;
>  	void *rx;
> -#ifdef CONFIG_SPI_FSL_ESPI
> +#if defined(CONFIG_SPI_FSL_ESPI) || defined(CONFIG_SPI_FSL_ESPI_MODULE)
>  	int len;
>  #endif

Oh, and there are still link errors:
ERROR: "mpc8xxx_spi_tx_buf_u32" [drivers/spi/spi_fsl_spi.ko] undefined!
ERROR: "mpc8xxx_spi_rx_buf_u32" [drivers/spi/spi_fsl_spi.ko] undefined!
ERROR: "mpc8xxx_spi_tx_buf_u16" [drivers/spi/spi_fsl_spi.ko] undefined!
ERROR: "mpc8xxx_spi_rx_buf_u16" [drivers/spi/spi_fsl_spi.ko] undefined!
ERROR: "mpc8xxx_spi_tx_buf_u8" [drivers/spi/spi_fsl_spi.ko] undefined!
ERROR: "mpc8xxx_spi_rx_buf_u8" [drivers/spi/spi_fsl_spi.ko] undefined!
ERROR: "of_mpc8xxx_spi_probe" [drivers/spi/spi_fsl_spi.ko] undefined!
ERROR: "mpc8xxx_spi_strmode" [drivers/spi/spi_fsl_spi.ko] undefined!
ERROR: "mpc8xxx_spi_probe" [drivers/spi/spi_fsl_spi.ko] undefined!
ERROR: "mpc8xxx_spi_remove" [drivers/spi/spi_fsl_spi.ko] undefined!
ERROR: "to_of_pinfo" [drivers/spi/spi_fsl_spi.ko] undefined!
ERROR: "mpc8xxx_spi_tx_buf_u32" [drivers/spi/spi_fsl_espi.ko] undefined!
ERROR: "mpc8xxx_spi_rx_buf_u32" [drivers/spi/spi_fsl_espi.ko] undefined!
ERROR: "of_mpc8xxx_spi_probe" [drivers/spi/spi_fsl_espi.ko] undefined!
ERROR: "mpc8xxx_spi_probe" [drivers/spi/spi_fsl_espi.ko] undefined!
ERROR: "mpc8xxx_spi_remove" [drivers/spi/spi_fsl_espi.ko] undefined!

The functions are not exported...

Should I export all those or deny CONFIG_SPI_FSL_ESPI=m?

thanks,
-- 
js


* Re: [PATCH 1/2] [hw-breakpoint] Use generic hw-breakpoint interfaces for new PPC ptrace flags
From: K.Prasad @ 2011-08-23  9:25 UTC (permalink / raw)
  To: linuxppc-dev, Thiago Jung Bauermann, Edjunior Barbosa Machado
In-Reply-To: <20110823050850.GS30097@yookeroo.fritz.box>

On Tue, Aug 23, 2011 at 03:08:50PM +1000, David Gibson wrote:
> On Fri, Aug 19, 2011 at 01:21:36PM +0530, K.Prasad wrote:
> > PPC_PTRACE_GETHWDBGINFO, PPC_PTRACE_SETHWDEBUG and PPC_PTRACE_DELHWDEBUG are
> > PowerPC specific ptrace flags that use the watchpoint register. While they are
> > targeted primarily towards BookE users, user-space applications such as GDB
> > have started using them for BookS too.
> > 
> > This patch enables the use of generic hardware breakpoint interfaces for these
> > new flags. The version number of the associated data structures
> > "ppc_hw_breakpoint" and "ppc_debug_info" is incremented to denote new semantics.
> 
> So, the structure itself doesn't seem to have been extended.  I don't
> understand what the semantic difference is - your patch comment needs
> to explain this clearly.
>

We had a request to extend the structure but thought it was dangerous to
do so. For instance if the user-space used version1 of the structure,
while kernel did a copy_to_user() pertaining to version2, then we'd run
into problems. Unfortunately the ptrace flags weren't designed to accept
a version number as input from the user through the
PPC_PTRACE_GETHWDBGINFO flag (which would have solved this issue).

I'll add a comment w.r.t change in semantics - such as the ability to
accept 'range' breakpoints in BookS.
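
A minimal sketch of the size problem described above, and of how a
version number supplied by user space would avoid it. This is not what
the patch does: the source says the interface has no such input. The
struct layouts and all demo_* names are hypothetical, and memcpy()
stands in for copy_to_user().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layouts; the real struct ppc_debug_info is not reproduced. */
struct demo_info_v1 {
	uint32_t version;
	uint32_t num_data_bps;
};

struct demo_info_v2 {
	uint32_t version;
	uint32_t num_data_bps;
	uint64_t features;	/* field added in version 2 */
};

/*
 * Fill the caller's buffer according to the version the caller asked for.
 * Returns the number of bytes written, or 0 for an unknown version.
 */
size_t demo_get_info(void *user_buf, uint32_t user_version)
{
	struct demo_info_v2 info = {
		.version = user_version,
		.num_data_bps = 1,
		.features = 0x1,
	};
	size_t n;

	assert(user_buf != NULL);
	if (user_version == 1)
		n = sizeof(struct demo_info_v1);
	else if (user_version == 2)
		n = sizeof(struct demo_info_v2);
	else
		return 0;

	/* Version 1 is a prefix of version 2, so a truncated copy is valid. */
	memcpy(user_buf, &info, n);
	return n;
}
```

Without the version input, the kernel would always copy the larger
layout and could write past the end of a version 1 buffer.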
 
> > Apart from the usual benefits of using generic hw-breakpoint interfaces, these
> > changes allow debuggers (such as GDB) to use a common set of ptrace flags for
> > their watchpoint needs and allow more precise breakpoint specification (length
> > of the variable can be specified).
> 
> What is the mechanism for implementing the range breakpoint on book3s?
> 

The hw-breakpoint interface accepts a length as an argument on BookS (any
value <= 8 bytes). Inside the hw_breakpoint_handler function it filters
out extraneous interrupts caused by accesses outside the range
<addr, addr + len>.

We put that ability to use here.
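
A minimal sketch of that filtering step: the hardware reports any access
within the watched 8-byte doubleword, and the handler keeps only those
that fall inside the requested range. Treating the range as half-open,
[addr, addr + len), is an assumption; the demo_* names are hypothetical
and this is not the kernel's handler.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_MAX_LEN 8UL	/* longest range a single watchpoint can cover */

/*
 * Return true if the faulting access address lies inside the range the
 * debugger asked for, false if it is an extraneous hit to be ignored.
 */
bool demo_hit_is_wanted(unsigned long access_addr, unsigned long addr,
			unsigned long len)
{
	assert(len >= 1 && len <= DEMO_MAX_LEN);
	return access_addr >= addr && access_addr - addr < len;
}
```

For a 2-byte watch at 0x1002, an access to 0x1000 hits the same
doubleword but is rejected by this check.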

> > [Edjunior: Identified an issue in the patch with the sanity check for version
> > numbers]
> > 
> > Tested-by: Edjunior Barbosa Machado <emachado@linux.vnet.ibm.com>
> > Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com>
> > ---
> >  Documentation/powerpc/ptrace.txt |   16 ++++++
> >  arch/powerpc/kernel/ptrace.c     |  104 +++++++++++++++++++++++++++++++++++---
> >  2 files changed, 112 insertions(+), 8 deletions(-)
> > 
> > diff --git a/Documentation/powerpc/ptrace.txt b/Documentation/powerpc/ptrace.txt
> > index f4a5499..97301ae 100644
> > --- a/Documentation/powerpc/ptrace.txt
> > +++ b/Documentation/powerpc/ptrace.txt
> > @@ -127,6 +127,22 @@ Some examples of using the structure to:
> >    p.addr2           = (uint64_t) end_range;
> >    p.condition_value = 0;
> >  
> > +- set a watchpoint in server processors (BookS) using version 2
> > +
> > +  p.version         = 2;
> > +  p.trigger_type    = PPC_BREAKPOINT_TRIGGER_RW;
> > +  p.addr_mode       = PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE;
> > +  or
> > +  p.addr_mode       = PPC_BREAKPOINT_MODE_RANGE_EXACT;
> > +
> > +  p.condition_mode  = PPC_BREAKPOINT_CONDITION_NONE;
> > +  p.addr            = (uint64_t) begin_range;
> > +  /* For PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE addr2 needs to be specified, where
> > +   * addr2 - addr <= 8 Bytes.
> > +   */
> > +  p.addr2           = (uint64_t) end_range;
> > +  p.condition_value = 0;
> > +
> >  3. PTRACE_DELHWDEBUG
> >  
> >  Takes an integer which identifies an existing breakpoint or watchpoint
> > diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
> > index 05b7dd2..18d28b6 100644
> > --- a/arch/powerpc/kernel/ptrace.c
> > +++ b/arch/powerpc/kernel/ptrace.c
> > @@ -1339,11 +1339,17 @@ static int set_dac_range(struct task_struct *child,
> >  static long ppc_set_hwdebug(struct task_struct *child,
> >  		     struct ppc_hw_breakpoint *bp_info)
> >  {
> > +#ifdef CONFIG_HAVE_HW_BREAKPOINT
> > +	int ret, len = 0;
> > +	struct thread_struct *thread = &(child->thread);
> > +	struct perf_event *bp;
> > +	struct perf_event_attr attr;
> > +#endif /* CONFIG_HAVE_HW_BREAKPOINT */
> 
> I'm confused.  This compiled before on book3s, and I don't see any
> changes to Makefile or Kconfig in the patch that will result in this
> code compiling  when it previously didn't   Why are these new guards
> added?
> 

The code is guarded using the CONFIG_ flags for two reasons.
a) We don't want the code to be included for BookE and other
architectures.
b) In BookS, we're now adding a new ability based on whether
CONFIG_HAVE_HW_BREAKPOINT is defined. Presently this config option is
kept on by default, however there are plans to make this a config-time
option.

> >  #ifndef CONFIG_PPC_ADV_DEBUG_REGS
> >  	unsigned long dabr;
> >  #endif
> >  
> > -	if (bp_info->version != 1)
> > +	if ((bp_info->version != 1) && (bp_info->version != 2))
> >  		return -ENOTSUPP;
> >  #ifdef CONFIG_PPC_ADV_DEBUG_REGS
> >  	/*
> > @@ -1382,13 +1388,9 @@ static long ppc_set_hwdebug(struct task_struct *child,
> >  	 */
> >  	if ((bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_RW) == 0 ||
> >  	    (bp_info->trigger_type & ~PPC_BREAKPOINT_TRIGGER_RW) != 0 ||
> > -	    bp_info->addr_mode != PPC_BREAKPOINT_MODE_EXACT ||
> >  	    bp_info->condition_mode != PPC_BREAKPOINT_CONDITION_NONE)
> >  		return -EINVAL;
> >  
> > -	if (child->thread.dabr)
> > -		return -ENOSPC;
> > -
> 
> You remove this test to see if the single watchpoint slot is already
> in use, but I don't see another test replacing it.
> 

This test is retained for !CONFIG_HAVE_HW_BREAKPOINT case. In case of
using hw-breakpoint interfaces, we have a double check through
thread->ptrace_bps[0] and using register_user_hw_breakpoint function
(which would error out if not enough free slots are available).

> >  	if ((unsigned long)bp_info->addr >= TASK_SIZE)
> >  		return -EIO;
> >  
> > @@ -1398,15 +1400,86 @@ static long ppc_set_hwdebug(struct task_struct *child,
> >  		dabr |= DABR_DATA_READ;
> >  	if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_WRITE)
> >  		dabr |= DABR_DATA_WRITE;
> > +#ifdef CONFIG_HAVE_HW_BREAKPOINT
> > +	if (bp_info->version == 1)
> > +		goto version_one;
> 
> There are several legitimate uses of goto in the kernel, but this is
> definitely not one of them.  You're essentially using it to put the
> old and new versions of the same function in one block.  Nasty.
> 

Maybe it's the label that's causing bother here. It might look more elegant
if it was called something like exit_* or error_* :-)

The goto here helps reduce code and is similar to the error exits we use
everywhere.

> > +	if (ptrace_get_breakpoints(child) < 0)
> > +		return -ESRCH;
> >  
> > -	child->thread.dabr = dabr;
> > +	bp = thread->ptrace_bps[0];
> > +	if (!bp_info->addr) {
> > +		if (bp) {
> > +			unregister_hw_breakpoint(bp);
> > +			thread->ptrace_bps[0] = NULL;
> > +		}
> > +		ptrace_put_breakpoints(child);
> > +		return 0;
> 
> Why are you making setting a 0 watchpoint remove the existing one (I
> think that's what this does).  I thought there was an explicit del
> breakpoint operation instead.
> 

We had to define the semantics for what writing a 0 to DABR could mean,
and I think it is intuitive to consider it as deletion
request...couldn't think of a case where DABR with addr=0 and RW=1 would
be required.

> > +	}
> > +	/*
> > +	 * Check if the request is for 'range' breakpoints. We can
> > +	 * support it if range < 8 bytes.
> > +	 */
> > +	if (bp_info->addr_mode == PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE)
> > +		len = bp_info->addr2 - bp_info->addr;
> 
> So you compute the length here, but I don't see you ever test if it is
> < 8 and return an error.
> 

The hw-breakpoint interfaces would fail if the length was > 8.

> > +	else if (bp_info->addr_mode != PPC_BREAKPOINT_MODE_EXACT) {
> > +			ptrace_put_breakpoints(child);
> > +			return -EINVAL;
> > +		}
> > +	if (bp) {
> > +		attr = bp->attr;
> > +		attr.bp_addr = (unsigned long)bp_info->addr & ~HW_BREAKPOINT_ALIGN;
> > +		arch_bp_generic_fields(dabr &
> > +					(DABR_DATA_WRITE | DABR_DATA_READ),
> > +							&attr.bp_type);
> > +		attr.bp_len = len;
> > +		ret =  modify_user_hw_breakpoint(bp, &attr);
> > +		if (ret) {
> > +			ptrace_put_breakpoints(child);
> > +			return ret;
> > +		}
> > +		thread->ptrace_bps[0] = bp;
> > +		ptrace_put_breakpoints(child);
> > +		thread->dabr = dabr;
> > +		return 0;
> > +	}
> >  
> > +	/* Create a new breakpoint request if one doesn't exist already */
> > +	hw_breakpoint_init(&attr);
> > +	attr.bp_addr = (unsigned long)bp_info->addr & ~HW_BREAKPOINT_ALIGN;
> 
> You seem to be silently masking the given address, which seems
> completely wrong.
> 

We have two ways of looking at the input address.
a) Assume that the input address is not multiplexed with the read/write
bits and return -EINVAL (for not conforming to the 8-byte alignment
requirement).
b) Consider the input address to be encoded with the read/write
watchpoint type request and align the address by default. This is how
the code behaves presently for the !CONFIG_HAVE_HW_BREAKPOINT case.

I chose to go with b) and discard the last 3-bits from the address.
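
A minimal sketch contrasting the two options. The 3-bit mask reflects
the 8-byte alignment mentioned above; DEMO_ALIGN_MASK and the demo_*
functions are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define DEMO_ALIGN_MASK 0x7UL	/* low 3 bits, i.e. 8-byte alignment */

/* Option a): reject an address that is not 8-byte aligned. */
int demo_addr_strict(unsigned long addr, unsigned long *out)
{
	assert(out != NULL);
	if (addr & DEMO_ALIGN_MASK)
		return -EINVAL;
	*out = addr;
	return 0;
}

/* Option b): silently discard the low three bits. */
unsigned long demo_addr_masked(unsigned long addr)
{
	return addr & ~DEMO_ALIGN_MASK;
}
```

Option b) never fails, which is what makes the masking "silent" from the
caller's point of view.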

Thanks for the detailed review. Looking forward to your comments.

Thanks,
K.Prasad


* Re: [PATCH 2/2] [PowerPC Book3E] Introduce new ptrace debug feature flag
From: K.Prasad @ 2011-08-23  9:27 UTC (permalink / raw)
  To: linuxppc-dev, Thiago Jung Bauermann, Edjunior Barbosa Machado
In-Reply-To: <20110823050931.GT30097@yookeroo.fritz.box>

On Tue, Aug 23, 2011 at 03:09:31PM +1000, David Gibson wrote:
> On Fri, Aug 19, 2011 at 01:23:38PM +0530, K.Prasad wrote:
> > 
> > While PPC_PTRACE_SETHWDEBUG ptrace flag in PowerPC accepts
> > PPC_BREAKPOINT_MODE_EXACT mode of breakpoint, the same is not intimated to the
> > user-space debuggers (like GDB) who may want to use it. Hence we introduce a
> > new PPC_DEBUG_FEATURE_DATA_BP_EXACT flag which will be populated on the
> > "features" member of "struct ppc_debug_info" to advertise support for the
> > same on Book3E PowerPC processors.
> 
> I thought the idea was that the BP_EXACT mode was the default - if the
> new interface was supported at all, then BP_EXACT was always
> supported.  So, why do you need a new flag?
> 

Yes, BP_EXACT was always supported but not advertised through
PPC_PTRACE_GETHWDBGINFO. We're now doing that.

Thanks,
K.Prasad


* Re: [PATCH v3] mtd/nand : workaround for Freescale FCM to support large-page Nand chip
From: LiuShuo @ 2011-08-23  9:57 UTC (permalink / raw)
  To: Matthieu CASTET
  Cc: Li Yang-R58472, Artem Bityutskiy, linuxppc-dev@ozlabs.org,
	linux-mtd@lists.infradead.org, Scott Wood, Ivan Djelic,
	dwmw2@infradead.org
In-Reply-To: <4E53614E.2070103@parrot.com>

On 2011-08-23 16:14, Matthieu CASTET wrote:
> LiuShuo wrote:
>> On 2011-08-23 00:19, Scott Wood wrote:
>>> On 08/22/2011 11:13 AM, Matthieu CASTET wrote:
>>>> Scott Wood wrote:
>>>>> To eliminate it we'd need to do an extra data transfer without reissuing
>>>>> the command, which Shuo was unable to get to work.
>>>>>
>>>> That's weird because our controller seems quite flexible [1].
>>>>
>>>> Something like that should work ?
>>>>
>>>>               out_be32(&lbc->fir,
>>>>                        (FIR_OP_CM2<<   FIR_OP0_SHIFT) |
>>>>                        (FIR_OP_CA<<   FIR_OP1_SHIFT) |
>>>>                        (FIR_OP_PA<<   FIR_OP2_SHIFT) |
>>>>                        (FIR_OP_WB<<   FIR_OP3_SHIFT));
>>>> refill FCM buffer with next 2k data
>>>>
>>>>               out_be32(&lbc->fir,
>>>>                        (FIR_OP_WB<<   FIR_OP3_SHIFT) |
>>>>                        (FIR_OP_CM3<<   FIR_OP4_SHIFT) |
>>>>                        (FIR_OP_CW1<<   FIR_OP5_SHIFT) |
>>>>                        (FIR_OP_RS<<   FIR_OP6_SHIFT));
>>> Something like that is what I originally suggested, but Shuo said it
>>> didn't work (even in theory, it requires a CE-don't-care NAND chip,
>>> since bus atomicity is broken).
>>>
>>> Shuo, what specifically did you try, and what did you see happen?
>>>
>>> -Scott
>> First, if we want to read 4K data with once command issuing, we can't
>> use HW_ECC.
> Yes, but as ivan said doesn't the cost of 2 read isn't bigger than software ecc ?
>
>> Even if we use SW_ECC, we always get lots of weird '0xFF's between 1st
>> 2k and 2nd 2k data.
> Did you understand where those 0xff comes (what's the size of them. Doesn't the
> controller try to insert spare aera ?)
I don't understand. I set FBCR to 2048, so the controller should read the
main area without the spare area.
But the run of 0xFF bytes is nearly the size of the spare area (give or
take a few bytes).
I couldn't work out what the controller was doing, so I chose another way.

Could you try it yourself and explain where those 0xFF bytes come from?
> Could you detail the sequence you used ?
>
First half :
                   out_be32(&lbc->fbcr, 2048);
                   out_be32(&lbc->fir,
                            (FIR_OP_CM0 << FIR_OP0_SHIFT) |
                            (FIR_OP_CA << FIR_OP1_SHIFT) |
                            (FIR_OP_PA << FIR_OP2_SHIFT) |
                            (FIR_OP_CM1 << FIR_OP3_SHIFT) |
                            (FIR_OP_RBW << FIR_OP4_SHIFT));


Second half :
                 out_be32(&lbc->fbcr, 2048);
                 out_be32(&lbc->fir,
                            (FIR_OP_RB << FIR_OP0_SHIFT) |
                            (FIR_OP_RBW << FIR_OP1_SHIFT));


-Liu Shuo

> Matthieu
>
>

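
A minimal sketch of how such an instruction sequence packs into the
32-bit FIR value, for readers following the shifts above. It assumes
eight 4-bit slots with slot 0 in the top nibble, and the opcode values
below are assumptions for illustration only; check fsl_lbc.h for the
real FIR_OP_* and FIR_OPn_SHIFT definitions. All demo_* and DEMO_* names
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_FIR_SLOTS 8u	/* assumed: eight 4-bit instruction slots */

/* Assumed opcode values, used only to make the example concrete. */
enum demo_fir_op {
	DEMO_OP_CA  = 0x1,	/* column address */
	DEMO_OP_PA  = 0x2,	/* page address */
	DEMO_OP_CM0 = 0x4,	/* command 0 */
	DEMO_OP_CM1 = 0x5,	/* command 1 */
	DEMO_OP_RB  = 0xA,	/* read data into the buffer */
	DEMO_OP_RBW = 0xE,	/* wait, then read data into the buffer */
};

/* Pack up to eight opcodes into one word, first opcode in the top nibble. */
uint32_t demo_fir_pack(const uint8_t *ops, size_t n)
{
	uint32_t fir = 0;
	size_t i;

	assert(ops != NULL && n <= DEMO_FIR_SLOTS);
	for (i = 0; i < n; i++) {
		assert(ops[i] <= 0xF);
		fir |= (uint32_t)ops[i] << (28 - 4 * i);
	}
	return fir;
}
```

Unused trailing slots stay zero in this sketch.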

* Re: [PATCH v3] mtd/nand : workaround for Freescale FCM to support large-page Nand chip
From: Matthieu CASTET @ 2011-08-23 10:02 UTC (permalink / raw)
  To: LiuShuo
  Cc: Scott Wood, linuxppc-dev@ozlabs.org, dwmw2@infradead.org,
	Li Yang-R58472, linux-mtd@lists.infradead.org
In-Reply-To: <4E5366AF.7040108@freescale.com>

LiuShuo wrote:
> On 2011-08-19 00:25, Scott Wood wrote:
>> On 08/17/2011 09:33 PM, b35362@freescale.com wrote:
>>> From: Liu Shuo<b35362@freescale.com>
>>>
>>> Freescale FCM controller has a 2K size limitation of buffer RAM. In order
>>> to support the Nand flash chip whose page size is larger than 2K bytes,
>>> we divide a page into multi-2K pages for MTD layer driver. In that case,
>>> we force to set the page size to 2K bytes. We convert the page address of
>>> MTD layer driver to a real page address in flash chips and a column index
>>> in fsl_elbc driver. We can issue any column address by UA instruction of
>>> elbc controller.
>>>
>>> NOTE: Due to there is a limitation of 'Number of Partial Program Cycles in
>>> the Same Page (NOP)', the flash chip which is supported by this workaround
>>> have to meet below conditions.
>>> 	1. page size is not greater than 4KB
>>> 	2.	1) if main area and spare area have independent NOPs:
>>> 			  main  area NOP    :>=3
>>> 			  spare area NOP    :>=2?
>> How often are the NOPs split like this?
>>
>>> 		2) if main area and spare area have a common NOP:
>>> 			  NOP               :>=4
>> This depends on how the flash is used.  If you treat it as a NOP1 flash
>> (e.g. run ubifs rather than jffs2), then you need NOP2 for a 4K chip and
>> NOP4 for an 8K chip.  OTOH, if you would be making full use of NOP4 on a
>> real 2K chip, you'll need NOP8 for a 4K chip.
>>
>> The NOP restrictions should be documented in the code itself, not just
>> in the git changelog.  Maybe print it to the console when this hack is
>> used, along with the NOP value read from the ID.
> 
> We can't read the NOP from the ID on any chip. Some chips don't
> give this infomation.(e.g. Micron MT29F4G08BAC)
Doesn't the Micron chip provide it in its ONFI info?

Matthieu

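
A minimal sketch of the address conversion mentioned in the quoted patch
description, where each 2K page seen by the MTD layer maps to a real
page address plus a column index. It covers the main area only and
ignores the spare-area layout, so the actual patch may compute the
column differently. All demo_* names are hypothetical.

```c
#include <assert.h>

#define DEMO_CHUNK_SIZE 2048u	/* page size presented to the MTD layer */

struct demo_flash_addr {
	unsigned int real_page;	/* page address sent to the chip */
	unsigned int column;	/* byte offset of the 2K chunk in that page */
};

/* Map an MTD-visible 2K page index onto a chip with larger real pages. */
struct demo_flash_addr demo_map_page(unsigned int mtd_page,
				     unsigned int real_page_size)
{
	struct demo_flash_addr a;
	unsigned int chunks;

	assert(real_page_size >= DEMO_CHUNK_SIZE);
	assert(real_page_size % DEMO_CHUNK_SIZE == 0);
	chunks = real_page_size / DEMO_CHUNK_SIZE;

	a.real_page = mtd_page / chunks;
	a.column = (mtd_page % chunks) * DEMO_CHUNK_SIZE;
	return a;
}
```

On a 4K chip, MTD pages 4 and 5 both land in real page 2, at columns 0
and 2048, which is why the NOP of the real page matters.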

* Re: [PATCH v3] mtd/nand : workaround for Freescale FCM to support large-page Nand chip
From: Matthieu CASTET @ 2011-08-23 10:13 UTC (permalink / raw)
  To: LiuShuo
  Cc: Li Yang-R58472, Artem Bityutskiy, linuxppc-dev@ozlabs.org,
	linux-mtd@lists.infradead.org, Scott Wood, Ivan Djelic,
	dwmw2@infradead.org
In-Reply-To: <4E53798D.7050307@freescale.com>

LiuShuo wrote:
> On 2011-08-23 16:14, Matthieu CASTET wrote:
>> LiuShuo wrote:
>>> On 2011-08-23 00:19, Scott Wood wrote:
>>>> On 08/22/2011 11:13 AM, Matthieu CASTET wrote:
>>>>> Scott Wood wrote:
>>>>>> To eliminate it we'd need to do an extra data transfer without reissuing
>>>>>> the command, which Shuo was unable to get to work.
>>>>>>
>>>>> That's weird because our controller seems quite flexible [1].
>>>>>
>>>>> Something like that should work ?
>>>>>
>>>>>               out_be32(&lbc->fir,
>>>>>                        (FIR_OP_CM2<<   FIR_OP0_SHIFT) |
>>>>>                        (FIR_OP_CA<<   FIR_OP1_SHIFT) |
>>>>>                        (FIR_OP_PA<<   FIR_OP2_SHIFT) |
>>>>>                        (FIR_OP_WB<<   FIR_OP3_SHIFT));
>>>>> refill FCM buffer with next 2k data
>>>>>
>>>>>               out_be32(&lbc->fir,
>>>>>                        (FIR_OP_WB<<   FIR_OP3_SHIFT) |
>>>>>                        (FIR_OP_CM3<<   FIR_OP4_SHIFT) |
>>>>>                        (FIR_OP_CW1<<   FIR_OP5_SHIFT) |
>>>>>                        (FIR_OP_RS<<   FIR_OP6_SHIFT));
>>>> Something like that is what I originally suggested, but Shuo said it
>>>> didn't work (even in theory, it requires a CE-don't-care NAND chip,
>>>> since bus atomicity is broken).
>>>>
>>>> Shuo, what specifically did you try, and what did you see happen?
>>>>
>>>> -Scott
>>> First, if we want to read 4K of data while issuing the command only once,
>>> we can't use HW_ECC.
>> Yes, but as Ivan said, isn't the cost of 2 reads bigger than that of software ECC?
>>
>>> Even if we use SW_ECC, we always get lots of weird '0xFF's between the 1st
>>> 2k and the 2nd 2k of data.
>> Did you work out where those 0xff bytes come from? (How many of them are there?
>> Isn't the controller trying to insert the spare area?)
> I don't understand it. I set FBCR to 2048, so the controller should read the
> main area without the spare area.
> But their size is nearly the spare area size (give or take a few bytes).
> I couldn't predict the controller's behavior, so I chose another way.
>
> Could you try it and explain where those 0xff bytes come from?
>> Could you detail the sequence you used ?
>>
> First half :
>                    out_be32(&lbc->fbcr, 2048);
Shouldn't you read 2k+64 here? In the end you want 4k plus the spare area (128)?

>                    out_be32(&lbc->fir,
>                             (FIR_OP_CM0 << FIR_OP0_SHIFT) |
>                             (FIR_OP_CA << FIR_OP1_SHIFT) |
>                             (FIR_OP_PA << FIR_OP2_SHIFT) |
>                             (FIR_OP_CM1 << FIR_OP3_SHIFT) |
>                             (FIR_OP_RBW << FIR_OP4_SHIFT));
> 
> 
> Second half :
>                  out_be32(&lbc->fbcr, 2048);
>                  out_be32(&lbc->fir,
>                             (FIR_OP_RB << FIR_OP0_SHIFT) |
>                             (FIR_OP_RBW << FIR_OP1_SHIFT));
Why do you issue FIR_OP_RBW?
FIR_OP_RB already fetches the data.

Matthieu
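
The arithmetic behind the "2k+64" question above can be sketched as follows.
This is only an illustration under stated assumptions: it assumes a chip with
a 4096-byte main area and 128 spare bytes that is accessed as two equal chunks
of 2048 + 64 bytes. All names are hypothetical, and the thread does not confirm
that the controller actually behaves this way.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed geometry (hypothetical names): a 4K-page chip with 128 spare
 * bytes, accessed as two chunks of 2048 + 64 bytes each. */
#define CHIP_MAIN_SIZE    4096u
#define CHIP_SPARE_SIZE    128u
#define CHUNK_MAIN_SIZE   2048u
#define CHUNK_SPARE_SIZE    64u
#define CHUNK_SIZE        (CHUNK_MAIN_SIZE + CHUNK_SPARE_SIZE)

/* Number of chunks needed to cover the whole physical page. */
unsigned int chunks_per_page(void)
{
	return (CHIP_MAIN_SIZE + CHIP_SPARE_SIZE) / CHUNK_SIZE;
}

/* Column address at which chunk 'idx' starts within the physical page. */
unsigned int chunk_column(unsigned int idx)
{
	assert(idx < chunks_per_page());
	return idx * CHUNK_SIZE;
}

/* Print the byte count and start column that each transfer would use. */
void print_chunk_plan(void)
{
	unsigned int i;

	for (i = 0; i < chunks_per_page(); i++)
		printf("chunk %u: column %u, byte count %u\n",
		       i, chunk_column(i), CHUNK_SIZE);
}
```

With this layout the two transfers together cover 2 * 2112 = 4224 bytes, which
is exactly 4096 + 128.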

^ permalink raw reply

* Re: kvm PCI assignment & VFIO ramblings
From: Joerg Roedel @ 2011-08-23 11:04 UTC (permalink / raw)
  To: aafabbri
  Cc: Alexey Kardashevskiy, kvm@vger.kernel.org, Paul Mackerras,
	linux-pci@vger.kernel.org, qemu-devel, chrisw, iommu, Avi Kivity,
	Anthony Liguori, linuxppc-dev, benve@cisco.com
In-Reply-To: <CA7847D2.FB3A%aafabbri@cisco.com>

On Mon, Aug 22, 2011 at 08:52:18PM -0400, aafabbri wrote:
> You have to enforce group/iommu domain assignment whether you have the
> existing uiommu API, or if you change it to your proposed
> ioctl(inherit_iommu) API.
> 
> The only change needed to VFIO here should be to make uiommu fd assignment
> happen on the groups instead of on device fds.  That operation fails or
> succeeds according to the group semantics (all-or-none assignment/same
> uiommu).

That makes uiommu basically the same as the meta-groups, right?

	Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632

^ permalink raw reply

* Re: kvm PCI assignment & VFIO ramblings
From: Joerg Roedel @ 2011-08-23 11:09 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: chrisw, Alexey Kardashevskiy, kvm@vger.kernel.org, Paul Mackerras,
	linux-pci@vger.kernel.org, qemu-devel, aafabbri, iommu,
	Avi Kivity, Anthony Liguori, linuxppc-dev, benve@cisco.com
In-Reply-To: <1314082483.30478.43.camel@pasglop>

On Tue, Aug 23, 2011 at 02:54:43AM -0400, Benjamin Herrenschmidt wrote:
> Possibly, the question that interests me the most is what interface
> KVM will end up using. I'm also not a big fan of the (perceived)
> discrepancy between using uiommu to create groups but using the group fd
> to actually do the mappings, at least if that is still the plan.
> 
> If the separate uiommu interface is kept, then anything that wants to be
> able to benefit from the ability to put multiple devices (or existing
> groups) into such a "meta group" would need to be explicitly modified to
> deal with the uiommu APIs.
> 
> I tend to prefer such "meta groups" as being something you create
> statically using a configuration interface, either via sysfs, netlink or
> ioctl's to a "control" vfio device driven by a simple command line tool
> (which can have the configuration stored in /etc and re-apply it at
> boot).

Hmm, I don't think that these groups are static for the system's
run-time. By default they only exist for the lifetime of a guest, at
least on x86. That's why I prefer to do this grouping using VFIO and not
some sysfs interface (which would be the third interface besides the
ioctls and netlink a VFIO user needs to be aware of). Doing this in the
ioctl interface just makes things easier.

	Joerg


^ permalink raw reply

* [PATCH] perf: fix build for PowerPC with uclibc toolchains
From: Florian Fainelli @ 2011-08-23 12:20 UTC (permalink / raw)
  To: Ian Munsie, linuxppc-dev

libio.h is not provided by uClibc, in order to be able to test the
definition of __UCLIBC__ we need to include stdlib.h, which also
includes stddef.h, providing the definition of 'NULL'.

Signed-off-by: Florian Fainelli <florian@openwrt.org>
---
FYI, I submitted the exact same patch for ARM:
https://patchwork.kernel.org/patch/1049152/

diff --git a/tools/perf/arch/powerpc/util/dwarf-regs.c b/tools/perf/arch/powerpc/util/dwarf-regs.c
index 48ae0c5..7cdd61d 100644
--- a/tools/perf/arch/powerpc/util/dwarf-regs.c
+++ b/tools/perf/arch/powerpc/util/dwarf-regs.c
@@ -9,7 +9,10 @@
  * 2 of the License, or (at your option) any later version.
  */
 
+#include <stdlib.h>
+#ifndef __UCLIBC__
 #include <libio.h>
+#endif
 #include <dwarf-regs.h>
 
 
-- 
1.7.4.1

^ permalink raw reply related

* [PATCH] powerpc: fixup QE_General4 errata
From: Joakim Tjernlund @ 2011-08-23 12:30 UTC (permalink / raw)
  To: Timur Tabi, linuxppc-dev

QE_General4 should only round up the divisor iff divisor is > 3.
Rounding up lower divisors makes the error too big, causing USB
on MPC832x to fail.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---
 arch/powerpc/sysdev/qe_lib/qe.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/sysdev/qe_lib/qe.c b/arch/powerpc/sysdev/qe_lib/qe.c
index 093e0ae..5399316 100644
--- a/arch/powerpc/sysdev/qe_lib/qe.c
+++ b/arch/powerpc/sysdev/qe_lib/qe.c
@@ -216,7 +216,7 @@ int qe_setbrg(enum qe_clock brg, unsigned int rate, unsigned int multiplier)
 	/* Errata QE_General4, which affects some MPC832x and MPC836x SOCs, says
 	   that the BRG divisor must be even if you're not using divide-by-16
 	   mode. */
-	if (!div16 && (divisor & 1))
+	if (!div16 && (divisor & 1) && (divisor > 3))
 		divisor++;
 
 	tempval = ((divisor - 1) << QE_BRGC_DIVISOR_SHIFT) |
-- 
1.7.3.4
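
The effect of the changed condition can be sketched in a small standalone
model. This is not the kernel function itself: qe_general4_fixup() and
divisor_change_permille() are hypothetical helper names used only for
illustration, and the second helper measures the relative change of the
divisor, not the resulting baud rate error.

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the patched check: odd divisors are rounded up to the next
 * even value only when divide-by-16 is off and the divisor is above 3. */
unsigned int qe_general4_fixup(unsigned int divisor, bool div16)
{
	assert(divisor >= 1);
	if (!div16 && (divisor & 1) && (divisor > 3))
		divisor++;
	return divisor;
}

/* Relative change of the divisor, in parts per thousand, when 'used'
 * is programmed instead of 'wanted'. */
unsigned int divisor_change_permille(unsigned int wanted, unsigned int used)
{
	unsigned int diff = used > wanted ? used - wanted : wanted - used;

	assert(wanted >= 1);
	return diff * 1000u / wanted;
}
```

For example, rounding a divisor of 3 up to 4 would change it by about a third
(333 permille), which is the kind of error the patch avoids by leaving small
divisors alone.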

^ permalink raw reply related

* [PATCH] usb: Allocate pram dynamically.
From: Joakim Tjernlund @ 2011-08-23 12:38 UTC (permalink / raw)
  To: Anton Vorontsov, linuxppc-dev

MPC832x does not have enough MURAM to do fixed MURAM allocation.
Change to dynamic allocation.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---
 drivers/usb/host/fhci-hcd.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/usb/host/fhci-hcd.c b/drivers/usb/host/fhci-hcd.c
index c7c8392..98adbe8 100644
--- a/drivers/usb/host/fhci-hcd.c
+++ b/drivers/usb/host/fhci-hcd.c
@@ -622,12 +622,15 @@ static int __devinit of_fhci_probe(struct of_device *ofdev,
 		goto err_pram;
 	}
 
-	pram_addr = cpm_muram_alloc_fixed(iprop[2], FHCI_PRAM_SIZE);
+	pram_addr = cpm_muram_alloc(FHCI_PRAM_SIZE, 64);
 	if (IS_ERR_VALUE(pram_addr)) {
 		dev_err(dev, "failed to allocate usb pram\n");
 		ret = -ENOMEM;
 		goto err_pram;
 	}
+
+	qe_issue_cmd(QE_ASSIGN_PAGE_TO_DEVICE, QE_CR_SUBBLOCK_USB,
+		     QE_CR_PROTOCOL_UNSPECIFIED, pram_addr);
 	fhci->pram = cpm_muram_addr(pram_addr);
 
 	/* GPIOs and pins */
-- 
1.7.3.4

^ permalink raw reply related

* RE: [PATCH] RapidIO: Fix use of non-compatible registers
From: Bounine, Alexandre @ 2011-08-23 12:55 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Kim, Chul, linux-kernel, Thomas Moll, linuxppc-dev, stable
In-Reply-To: <20110822122807.5b558d65.akpm@linux-foundation.org>

Andrew Morton <akpm@linux-foundation.org> wrote:

> On Tue, 26 Jul 2011 14:07:26 -0400
> Alexandre Bounine <alexandre.bounine@idt.com> wrote:
>
> > Replace/remove use of RIO v.1.2 registers/bits that are not
> > forward-compatible with newer versions of RapidIO specification.
> >
> > RapidIO specification v. 1.3 removed Write Port CSR, Doorbell CSR,
> > Mailbox CSR and Mailbox and Doorbell bits of the PEF CAR.
> >
.....
> You did a cc:stable but provided no reason (that I can understand) for
> backporting the patch.  Please explain why the problem is sufficiently
> serious to warrant this action.

My reason for this is that use of removed (since RIO v.1.3) register bits
affects users of currently available 1.3 and 2.x compliant devices who may
use not so recent kernel versions.

Removing checks for unsupported bits makes corresponding routines compatible
with all versions of RapidIO specification. Therefore, backporting makes
stable kernel versions compliant with RIO v.1.3 and later as well.

^ permalink raw reply

* Re: [PATCH] usb: Allocate pram dynamically.
From: Anton Vorontsov @ 2011-08-23 13:02 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: linuxppc-dev
In-Reply-To: <1314103121-9857-1-git-send-email-Joakim.Tjernlund@transmode.se>

On Tue, Aug 23, 2011 at 02:38:41PM +0200, Joakim Tjernlund wrote:
> MPC832x does not have enough MURAM to do fixed MURAM allocation.
> Change to dynamic allocation.
> 
> Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>

Acked-by: Anton Vorontsov <cbouatmailru@gmail.com>

Thanks!

p.s. You probably want to send this to Greg KH, + Cc linux-usb
mailing list.

-- 
Anton Vorontsov
Email: cbouatmailru@gmail.com

^ permalink raw reply

* Re: kvm PCI assignment & VFIO ramblings
From: Roedel, Joerg @ 2011-08-23 13:18 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alexey Kardashevskiy, kvm@vger.kernel.org, Paul Mackerras,
	linux-pci@vger.kernel.org, qemu-devel, iommu, chrisw,
	Alex Williamson, Avi Kivity, Anthony Liguori, linuxppc-dev,
	benve@cisco.com
In-Reply-To: <1314047033.7662.39.camel@pasglop>

On Mon, Aug 22, 2011 at 05:03:53PM -0400, Benjamin Herrenschmidt wrote:
> 
> > I am in favour of /dev/vfio/$GROUP. If multiple devices should be
> > assigned to a guest, there can also be an ioctl to bind a group to an
> > address-space of another group (certainly needs some care to not allow
> > that both groups belong to different processes).
> > 
> > Btw, a problem we haven't fully discussed yet is
> > driver-deassignment. User space can decide to de-assign the device from
> > vfio while a fd is open on it. With PCI there is no way to let this fail
> > (the .release function returns void last time I checked). Is this a
> > problem, and if yes, how do we handle that?
> 
> We can treat it as a hard unplug (like a cardbus gone away).
> 
> IE. Dispose of the direct mappings (switch to MMIO emulation) and return
> all ff's from reads (& ignore writes).
> 
> Then send an unplug event via whatever mechanism the platform provides
> (ACPI hotplug controller on x86 for example, we haven't quite sorted out
> what to do on power for hotplug yet).

Hmm, good idea. But as far as I know the hotplug-event needs to be in
the guest _before_ the device is actually unplugged (so that the guest
can unbind its driver first). That somehow brings back the sleep-idea
and the timeout in the .release function.
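
The "hard unplug" behaviour from the quoted suggestion (reads return all ff's,
writes are ignored) can be sketched as a tiny emulated register file. This is
a minimal userspace model, not VFIO or QEMU code; struct emu_dev and the
emu_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EMU_NUM_REGS 16u

/* Hypothetical emulated device: a few registers plus an unplugged flag. */
struct emu_dev {
	bool unplugged;
	uint32_t regs[EMU_NUM_REGS];
};

/* Reads from a hard-unplugged device return all ones. */
uint32_t emu_read32(const struct emu_dev *dev, unsigned int idx)
{
	assert(idx < EMU_NUM_REGS);
	if (dev->unplugged)
		return 0xffffffffu;
	return dev->regs[idx];
}

/* Writes to a hard-unplugged device are silently dropped. */
void emu_write32(struct emu_dev *dev, unsigned int idx, uint32_t val)
{
	assert(idx < EMU_NUM_REGS);
	if (dev->unplugged)
		return;
	dev->regs[idx] = val;
}

/* Switch the device into the "gone away" state. */
void emu_hard_unplug(struct emu_dev *dev)
{
	dev->unplugged = true;
}
```

The model deliberately leaves out the ordering problem raised above, namely
that the guest should see the hotplug event before the device disappears.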

	Joerg


^ permalink raw reply

* Re: kvm PCI assignment & VFIO ramblings
From: Roedel, Joerg @ 2011-08-23 13:14 UTC (permalink / raw)
  To: Alex Williamson
  Cc: Alexey Kardashevskiy, kvm@vger.kernel.org, Paul Mackerras,
	qemu-devel, chrisw, iommu, Avi Kivity, Anthony Liguori,
	linux-pci@vger.kernel.org, linuxppc-dev, benve@cisco.com
In-Reply-To: <1314040622.6866.268.camel@x201.home>

On Mon, Aug 22, 2011 at 03:17:00PM -0400, Alex Williamson wrote:
> On Mon, 2011-08-22 at 19:25 +0200, Joerg Roedel wrote:

> > I am in favour of /dev/vfio/$GROUP. If multiple devices should be
> > assigned to a guest, there can also be an ioctl to bind a group to an
> > address-space of another group (certainly needs some care to not allow
> > that both groups belong to different processes).
> 
> That's an interesting idea.  Maybe an interface similar to the current
> uiommu interface, where you open() the 2nd group fd and pass the fd via
> ioctl to the primary group.  IOMMUs that don't support this would fail
> the attach device callback, which would fail the ioctl to bind them.  It
> will need to be designed so any group can be removed from the super-set
> and the remaining group(s) still works.  This feels like something that
> can be added after we get an initial implementation.

Handling it through fds is a good idea. This makes sure that everything
belongs to one process. I am not really sure yet if we go the way to
just bind plain groups together or if we create meta-groups. The
meta-groups thing seems somewhat cleaner, though.

> > Btw, a problem we haven't fully discussed yet is
> > driver-deassignment. User space can decide to de-assign the device from
> > vfio while a fd is open on it. With PCI there is no way to let this fail
> > (the .release function returns void last time I checked). Is this a
> > problem, and if yes, how do we handle that?
> 
> The current vfio has the same problem, we can't unbind a device from
> vfio while it's attached to a guest.  I think we'd use the same solution
> too; send out a netlink packet for a device removal and have the .remove
> call sleep on a wait_event(, refcnt == 0).  We could also set a timeout
> and SIGBUS the PIDs holding the device if they don't return it
> willingly.  Thanks,

Putting the process to sleep (which would be uninterruptible) seems bad.
The process would sleep until the guest releases the device-group, which
can take days or months.
The best thing (and the most intrusive :-) ) is to change the PCI core to
allow unbindings to fail, I think. But this probably further complicates
the path to getting VFIO upstream...

	Joerg


^ permalink raw reply

* Re: [PATCH] usb: Allocate pram dynamically.
From: Joakim Tjernlund @ 2011-08-23 13:23 UTC (permalink / raw)
  To: Anton Vorontsov, Greg Kroah-Hartman; +Cc: linux-usb, linuxppc-dev
In-Reply-To: <20110823130253.GA30733@oksana.dev.rtsoft.ru>

Anton Vorontsov <cbouatmailru@gmail.com> wrote on 2011/08/23 15:02:53:

> From: Anton Vorontsov <cbouatmailru@gmail.com>
> To: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
> Cc: linuxppc-dev@lists.ozlabs.org
> Date: 2011/08/23 15:02
> Subject: Re: [PATCH] usb: Allocate pram dynamically.
>
> On Tue, Aug 23, 2011 at 02:38:41PM +0200, Joakim Tjernlund wrote:
> > MPC832x does not have enough MURAM to do fixed MURAM allocation.
> > Change to dynamic allocation.
> >
> > Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
>
> Acked-by: Anton Vorontsov <cbouatmailru@gmail.com>
>
> Thanks!
>
> p.s. You probably want to send this to Greg KH, + Cc linux-usb
> mailing list.

Added linux-usb and Greg KH per Anton's suggestion.

 Jocke

From 587137e365ac1ba7e333a09962b3e4b68c587808 Mon Sep 17 00:00:00 2001
From: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Date: Tue, 23 Aug 2011 11:04:24 +0200
Subject: [PATCH] usb: Allocate pram dynamically.

MPC832x does not have enough MURAM to do fixed MURAM allocation.
Change to dynamic allocation.

Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
---
 drivers/usb/host/fhci-hcd.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/usb/host/fhci-hcd.c b/drivers/usb/host/fhci-hcd.c
index c7c8392..98adbe8 100644
--- a/drivers/usb/host/fhci-hcd.c
+++ b/drivers/usb/host/fhci-hcd.c
@@ -622,12 +622,15 @@ static int __devinit of_fhci_probe(struct of_device *ofdev,
 		goto err_pram;
 	}

-	pram_addr = cpm_muram_alloc_fixed(iprop[2], FHCI_PRAM_SIZE);
+	pram_addr = cpm_muram_alloc(FHCI_PRAM_SIZE, 64);
 	if (IS_ERR_VALUE(pram_addr)) {
 		dev_err(dev, "failed to allocate usb pram\n");
 		ret = -ENOMEM;
 		goto err_pram;
 	}
+
+	qe_issue_cmd(QE_ASSIGN_PAGE_TO_DEVICE, QE_CR_SUBBLOCK_USB,
+		     QE_CR_PROTOCOL_UNSPECIFIED, pram_addr);
 	fhci->pram = cpm_muram_addr(pram_addr);

 	/* GPIOs and pins */
--
1.7.3.4

^ permalink raw reply related

* MPC5200 + BestComm support in QEMU
From: steve.belanger @ 2011-08-23 14:09 UTC (permalink / raw)
  To: linuxppc-dev

Hi, 

I'm Steve, an embedded software developer for Bombardier Transportation
Canada. We use the MPC5200 for most of our onboard computers inside train
control systems. To enhance our SW engineering process, we would like to
emulate the MPC5200 processor using QEMU, an open source software CPU
emulator. This software supports the MPC5200 CPU emulation.

However, the network interface is handled by the BestComm DMA engine and
it seems very difficult to simulate this co-processor with our current
level of knowledge. Has anyone been able to correctly emulate the MPC5200
together with the BestComm DMA engine using QEMU?

Regards,

Steve Bélanger, ing. / Eng.
Développeur logiciel embarqué / Embedded Software Developer
Bombardier Transportation Canada Inc. 
Train Control and Management System
Saint-Bruno: 450-441-2020 ext.6148


^ permalink raw reply

* Re: [PATCH v3] mtd/nand : workaround for Freescale FCM to support large-page Nand chip
From: Scott Wood @ 2011-08-23 16:12 UTC (permalink / raw)
  To: Matthieu CASTET
  Cc: linuxppc-dev@ozlabs.org, LiuShuo, dwmw2@infradead.org,
	linux-mtd@lists.infradead.org, Li Yang-R58472
In-Reply-To: <4E537AC4.6000301@parrot.com>

On 08/23/2011 05:02 AM, Matthieu CASTET wrote:
> LiuShuo wrote:
>> We can't read the NOP from the ID on every chip. Some chips don't
>> give this information (e.g. Micron MT29F4G08BAC).

Are there any 4K+ chips (especially ones with insufficient NOP) that
don't have the info?

This chip is 2K and NOP8.

Is there an easy way (without needing to have every datasheet for every
chip ever made) to determine at runtime which chips supply this information?

> Doesn't the Micron chip provide it in its ONFI info?

This chip doesn't appear to be ONFI.

-Scott

^ permalink raw reply

* Re: kvm PCI assignment & VFIO ramblings
From: Alex Williamson @ 2011-08-23 16:23 UTC (permalink / raw)
  To: David Gibson
  Cc: chrisw, Alexey Kardashevskiy, kvm, Paul Mackerras,
	linux-pci@vger.kernel.org, qemu-devel, aafabbri, iommu,
	Avi Kivity, Anthony Liguori, linuxppc-dev, benve
In-Reply-To: <20110823023822.GO30097@yookeroo.fritz.box>

On Tue, 2011-08-23 at 12:38 +1000, David Gibson wrote:
> On Mon, Aug 22, 2011 at 09:45:48AM -0600, Alex Williamson wrote:
> > On Mon, 2011-08-22 at 15:55 +1000, David Gibson wrote:
> > > On Sat, Aug 20, 2011 at 09:51:39AM -0700, Alex Williamson wrote:
> > > > We had an extremely productive VFIO BoF on Monday.  Here's my attempt to
> > > > capture the plan that I think we agreed to:
> > > > 
> > > > We need to address both the description and enforcement of device
> > > > groups.  Groups are formed any time the iommu does not have resolution
> > > > between a set of devices.  On x86, this typically happens when a
> > > > PCI-to-PCI bridge exists between the set of devices and the iommu.  For
> > > > Power, partitionable endpoints define a group.  Grouping information
> > > > needs to be exposed for both userspace and kernel internal usage.  This
> > > > will be a sysfs attribute setup by the iommu drivers.  Perhaps:
> > > > 
> > > > # cat /sys/devices/pci0000:00/0000:00:19.0/iommu_group
> > > > 42
> > > > 
> > > > (I use a PCI example here, but attribute should not be PCI specific)
> > > 
> > > Ok.  Am I correct in thinking these group IDs are representing the
> > > minimum granularity, and are therefore always static, defined only by
> > > the connected hardware, not by configuration?
> > 
> > Yes, that's the idea.  An open question I have towards the configuration
> > side is whether we might add iommu driver specific options to the
> > groups.  For instance on x86 where we typically have B:D.F granularity,
> > should we have an option not to trust multi-function devices and use a
> > B:D granularity for grouping?
> 
> Right.  And likewise I can see a place for configuration parameters
> like the present 'allow_unsafe_irqs'.  But these would be more-or-less
> global options which affected the overall granularity, rather than
> detailed configuration such as explicitly binding some devices into a
> group, yes?

Yes, currently the interrupt remapping support is a global iommu
capability.  I suppose it's possible that this could be an iommu option,
where the iommu driver would not advertise a group if the interrupt
remapping constraint isn't met.

> > > > From there we have a few options.  In the BoF we discussed a model where
> > > > binding a device to vfio creates a /dev/vfio$GROUP character device
> > > > file.  This "group" fd provides dma mapping ioctls as well as
> > > > ioctls to enumerate and return a "device" fd for each attached member of
> > > > the group (similar to KVM_CREATE_VCPU).  We enforce grouping by
> > > > returning an error on open() of the group fd if there are members of the
> > > > group not bound to the vfio driver.  Each device fd would then support a
> > > > similar set of ioctls and mapping (mmio/pio/config) interface as current
> > > > vfio, except for the obvious domain and dma ioctls superseded by the
> > > > group fd.
> > > 
> > > It seems a slightly strange distinction that the group device appears
> > > when any device in the group is bound to vfio, but only becomes usable
> > > when all devices are bound.
> > > 
> > > > Another valid model might be that /dev/vfio/$GROUP is created for all
> > > > groups when the vfio module is loaded.  The group fd would allow open()
> > > > and some set of iommu querying and device enumeration ioctls, but would
> > > > error on dma mapping and retrieving device fds until all of the group
> > > > devices are bound to the vfio driver.
> > > 
> > > Which is why I marginally prefer this model, although it's not a big
> > > deal.
> > 
> > Right, we can also combine models.  Binding a device to vfio
> > creates /dev/vfio$GROUP, which only allows a subset of ioctls and no
> > device access until all the group devices are also bound.  I think
> > the /dev/vfio/$GROUP might help provide an enumeration interface as well
> > though, which could be useful.
> 
> I'm not entirely sure what you mean here.  But, that's now several
> weak votes in favour of the always-present group devices, and none in
> favour of the created-when-first-device-bound model, so I suggest we
> take the /dev/vfio/$GROUP as our tentative approach.

Yep

> > > > In either case, the uiommu interface is removed entirely since dma
> > > > mapping is done via the group fd.  As necessary in the future, we can
> > > > define a more high performance dma mapping interface for streaming dma
> > > > via the group fd.  I expect we'll also include architecture specific
> > > > group ioctls to describe features and capabilities of the iommu.  The
> > > > group fd will need to prevent concurrent open()s to maintain a 1:1 group
> > > > to userspace process ownership model.
> > > 
> > > > A 1:1 group<->process correspondence seems wrong to me. But there are
> > > many ways you could legitimately write the userspace side of the code,
> > > many of them involving some sort of concurrency.  Implementing that
> > > concurrency as multiple processes (using explicit shared memory and/or
> > > other IPC mechanisms to co-ordinate) seems a valid choice that we
> > > shouldn't arbitrarily prohibit.
> > > 
> > > Obviously, only one UID may be permitted to have the group open at a
> > > time, and I think that's enough to prevent them doing any worse than
> > > shooting themselves in the foot.
> > 
> > 1:1 group<->process is probably too strong.  Not allowing concurrent
> > open()s on the group file enforces a single userspace entity is
> > responsible for that group.  Device fds can be passed to other
> > processes, but only retrieved via the group fd.  I suppose we could even
> > branch off the dma interface into a different fd, but it seems like we
> > would logically want to serialize dma mappings at each iommu group
> > anyway.  I'm open to alternatives, this just seemed an easy way to do
> > it.  Restricting on UID implies that we require isolated qemu instances
> > to run as different UIDs.
> 
> Well... yes and no.  It means guests which need to be isolated from
> malicious interference with each other need different UIDs, but given
> that if they have the same UID one qemu can kill() or ptrace() the
> other, they're not isolated in that sense anyway.
> 
> It seems to me that running as the same UIDs with different device
> groups assigned, the guests are still pretty well isolated from
> accidental interference with each other.

If our only restriction is UID, what prevents a non-clueful user from
trying to create separate qemu instances making use of different devices
within the same group (or even the same device)?  If we restrict
concurrent opens, it's just the subsequent instances get a -EBUSY.
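
The "subsequent instances get a -EBUSY" behaviour can be sketched with a
single atomic flag per group. This is a userspace model using C11 atomics
under the assumption of one owner per group; struct group_model, group_open()
and group_release() are hypothetical names, not part of any existing vfio
interface.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical per-group state: set while some owner has the group open.
 * Initialise it with ATOMIC_FLAG_INIT. */
struct group_model {
	atomic_flag opened;
};

/* The first caller wins; any concurrent or later open fails with -EBUSY
 * until the owner releases the group again. */
int group_open(struct group_model *g)
{
	assert(g != NULL);
	if (atomic_flag_test_and_set(&g->opened))
		return -EBUSY;
	return 0;
}

/* Called when the owner closes the group fd. */
void group_release(struct group_model *g)
{
	assert(g != NULL);
	atomic_flag_clear(&g->opened);
}
```

A second qemu instance trying to use any device of the same group would then
fail at open time instead of interfering with the first one.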

> >  I know that's a goal, but I don't know if we
> > want to make it an assumption in the group security model.
> > 
> > > > Also on the table is supporting non-PCI devices with vfio.  To do this,
> > > > we need to generalize the read/write/mmap and irq eventfd interfaces.
> > > > We could keep the same model of segmenting the device fd address space,
> > > > perhaps adding ioctls to define the segment offset bit position or we
> > > > could split each region into its own fd (VFIO_GET_PCI_BAR_FD(0),
> > > > VFIO_GET_PCI_CONFIG_FD(), VFIO_GET_MMIO_FD(3)), though we're already
> > > > suffering some degree of fd bloat (group fd, device fd(s), interrupt
> > > > event fd(s), per resource fd, etc).  For interrupts we can overload
> > > > VFIO_SET_IRQ_EVENTFD to be either PCI INTx or non-PCI irq 
> > > 
> > > Sounds reasonable.
> > > 
> > > > (do non-PCI
> > > > devices support MSI?).
> > > 
> > > They can.  Obviously they might not have exactly the same semantics as
> > > PCI MSIs, but I know we have SoC systems with (non-PCI) on-die devices
> > > whose interrupts are treated by the (also on-die) root interrupt
> > > controller in the same way as PCI MSIs.
> > 
> > Ok, I suppose we can define ioctls to enable these as we go.  We also
> > need to figure out how non-PCI resources, interrupts, and iommu mapping
> > restrictions are described via vfio.
> 
> Yeah.  On device tree platforms we'd want it to be bound to the device
> tree representation in some way.
> 
> For platform devices, at least, could we have the index into the array
> of resources take the place of BAR number for PCI?

That's what I was thinking, but we need some way to describe the set of
valid indexes and type and size for each as well.  We already have the
BAR_LEN helper ioctl, we could make that generic (RANGE_LEN?) and add
NUM_RANGES and RANGE_TYPE.  For PCI there would always be 7 ranges (6
BARs + ROM).
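
A rough model of what such generic queries could look like is sketched below.
It is only an illustration of the NUM_RANGES / RANGE_TYPE / RANGE_LEN idea:
the enum, the structs and the dev_* functions are hypothetical, and no actual
ioctl numbers or vfio structures are implied.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical range types; RANGE_TYPE_NONE marks an unimplemented slot. */
enum range_type {
	RANGE_TYPE_NONE = 0,
	RANGE_TYPE_MMIO,
	RANGE_TYPE_PIO,
	RANGE_TYPE_ROM,
};

struct range_desc {
	enum range_type type;
	uint64_t len;
};

/* For PCI there would always be 7 ranges: 6 BARs plus the ROM. */
#define PCI_NUM_RANGES 7u

struct dev_model {
	size_t num_ranges;
	const struct range_desc *ranges;
};

/* NUM_RANGES: how many indexes are valid for this device. */
size_t dev_num_ranges(const struct dev_model *dev)
{
	return dev->num_ranges;
}

/* RANGE_TYPE: type of range 'idx', or -EINVAL for an invalid index. */
int dev_range_type(const struct dev_model *dev, size_t idx,
		   enum range_type *type)
{
	assert(type != NULL);
	if (idx >= dev->num_ranges)
		return -EINVAL;
	*type = dev->ranges[idx].type;
	return 0;
}

/* RANGE_LEN: length of range 'idx', or -EINVAL for an invalid index. */
int dev_range_len(const struct dev_model *dev, size_t idx, uint64_t *len)
{
	assert(len != NULL);
	if (idx >= dev->num_ranges)
		return -EINVAL;
	*len = dev->ranges[idx].len;
	return 0;
}
```

A platform device would simply report a different number of ranges, with the
index into its resource array taking the place of the BAR number.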

> > > > For qemu, these changes imply we'd only support a model where we have a
> > > > 1:1 group to iommu domain.  The current vfio driver could probably
> > > > become vfio-pci as we might end up with more target specific vfio
> > > > drivers for non-pci.  PCI should be able to maintain a simple -device
> > > > vfio-pci,host=bb:dd.f to enable hotplug of individual devices.  We'll
> > > > need to come up with extra options when we need to expose groups to
> > > > guest for pvdma.
> > > 
> > > Are you saying that you'd no longer support the current x86 usage of
> > > putting all of one guest's devices into a single domain?
> > 
> > Yes.  I'm not sure there's a good ROI to prioritize that model.  We have
> > to assume >1 device per guest is a typical model and that the iotlb is
> > large enough that we might improve thrashing to see both a resource and
> > performance benefit from it.  I'm open to suggestions for how we could
> > include it though.
> 
> Creating supergroups of some sort seems to be what we need, but I'm
> not sure what's the best interface for doing that.

Yeah.  Joerg's idea of binding groups internally (pass the fd of one
group to another via ioctl) is one option.  The tricky part will be
implementing it to support hot unplug of any group from the supergroup.
I believe Ben had a suggestion that supergroups could be created in
sysfs, but I don't know what the mechanism to do that looks like.  It
would also be an extra management step to dynamically bind and unbind
groups to the supergroup around hotplug.  Thanks,

Alex

^ permalink raw reply
