LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* Re: [PATCH v2 3/6] powerpc/pci: use pci_is_pcie() to simplify code
From: Yijing Wang @ 2013-10-11  6:33 UTC (permalink / raw)
  To: Gavin Shan
  Cc: linux-pci, linux-kernel, James E.J. Bottomley, Paul Mackerras,
	Hanjun Guo, Bjorn Helgaas, linuxppc-dev
In-Reply-To: <20131011061654.GA561@shangw.(null)>

On 2013/10/11 14:16, Gavin Shan wrote:
> On Fri, Oct 11, 2013 at 04:49:56PM +1100, Benjamin Herrenschmidt wrote:
>> On Fri, 2013-09-06 at 14:30 -0600, Bjorn Helgaas wrote:
>>> On Thu, Sep 05, 2013 at 03:55:27PM +0800, Yijing Wang wrote:
>>>> Use pci_is_pcie() to simplify code.
>>>>
>>>> Acked-by: Kumar Gala <galak@kernel.crashing.org>
>>>> Reviewed-by: Gavin Shan <shangw@linux.vnet.ibm.com>
>>>> Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>>>> Cc: Gavin Shan <shangw@linux.vnet.ibm.com>
>>>> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>>>> Cc: Paul Mackerras <paulus@samba.org>
>>>> Cc: linuxppc-dev@lists.ozlabs.org
>>>> Cc: linux-kernel@vger.kernel.org
>>>> ---
>>>>  arch/powerpc/kernel/eeh.c     |    3 +--
>>>>  arch/powerpc/sysdev/fsl_pci.c |    2 +-
>>>>  2 files changed, 2 insertions(+), 3 deletions(-)
>>>
>>> Ben, Paul, this has no dependencies on anything new to PCI or any
>>> other patches in this series, so you can take it through the POWERPC
>>> tree.  If you don't want to do that, let me know and I can take it.
>>>
>>> If you want it:
>>>
>>> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
>>
>> It's also quite broken :-)
>>
>> See below:
>>
>>>> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
>>>> index 55593ee..6ebbe54 100644
>>>> --- a/arch/powerpc/kernel/eeh.c
>>>> +++ b/arch/powerpc/kernel/eeh.c
>>>> @@ -189,8 +189,7 @@ static size_t eeh_gather_pci_data(struct eeh_dev *edev, char * buf, size_t len)
>>>>  	}
>>>>  
>>>>  	/* If PCI-E capable, dump PCI-E cap 10, and the AER */
>>>> -	cap = pci_find_capability(dev, PCI_CAP_ID_EXP);
>>>> -	if (cap) {
>>>> +	if (pci_is_pcie(dev)) {
>>>>  		n += scnprintf(buf+n, len-n, "pci-e cap10:\n");
>>>>  		printk(KERN_WARNING
>>>>  		       "EEH: PCI-E capabilities and status follow:\n");
>>
>> So we remove reading of "cap", but slightly further down the code does:
>>
>> 		for (i=0; i<=8; i++) {
>> 			eeh_ops->read_config(dn, cap+4*i, 4, &cfg);
>> 			n += scnprintf(buf+n, len-n, "%02x:%x\n", 4*i, cfg);
>> 			printk(KERN_WARNING "EEH: PCI-E %02x: %08x\n", i, cfg);
>> 		}
>>
>> Which actually *uses* the value of "cap" ... oops :-)
>>
> 
> It's my fault and I should have looked into the changes more closely.
> How about changing it like this:
> 
> 	cap = pci_is_pcie(dev) ? pci_pcie_cap(dev) :
> 	      pci_find_capability(dev, PCI_CAP_ID_EXP);
> 	if (cap) {
> 		...
> 	}
> 
> It would save some PCI-CFG access cycles for most cases :-)

Hi Gavin,  it's not your fault, it's my fault. :)

Because pci_pcie_cap(dev) == dev->pcie_cap == pci_find_capability(dev, PCI_CAP_ID_EXP);

so I think it's ok to use dev->pcie_cap instead of stale "cap".

I have updated this patch.

Thanks for your review and comments!

Thanks!
Yijing.


> 
>>>> diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
>>>> index 46ac1dd..5402a1d 100644
>>>> --- a/arch/powerpc/sysdev/fsl_pci.c
>>>> +++ b/arch/powerpc/sysdev/fsl_pci.c
>>>> @@ -41,7 +41,7 @@ static void quirk_fsl_pcie_header(struct pci_dev *dev)
>>>>  	u8 hdr_type;
>>>>  
>>>>  	/* if we aren't a PCIe don't bother */
>>>> -	if (!pci_find_capability(dev, PCI_CAP_ID_EXP))
>>>> +	if (!pci_is_pcie(dev))
>>>>  		return;
>>>>  
>>>>  	/* if we aren't in host mode don't bother */
> 
> Thanks,
> Gavin
> 
> 
> .
> 


-- 
Thanks!
Yijing

^ permalink raw reply

* Re: [PATCH v2 3/6] powerpc/pci: use pci_is_pcie() to simplify code
From: Gavin Shan @ 2013-10-11  6:16 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Gavin Shan, linux-pci, linux-kernel, James E.J. Bottomley,
	Yijing Wang, Paul Mackerras, Hanjun Guo, Bjorn Helgaas,
	linuxppc-dev
In-Reply-To: <1381470596.5630.61.camel@pasglop>

On Fri, Oct 11, 2013 at 04:49:56PM +1100, Benjamin Herrenschmidt wrote:
>On Fri, 2013-09-06 at 14:30 -0600, Bjorn Helgaas wrote:
>> On Thu, Sep 05, 2013 at 03:55:27PM +0800, Yijing Wang wrote:
>> > Use pci_is_pcie() to simplify code.
>> > 
>> > Acked-by: Kumar Gala <galak@kernel.crashing.org>
>> > Reviewed-by: Gavin Shan <shangw@linux.vnet.ibm.com>
>> > Signed-off-by: Yijing Wang <wangyijing@huawei.com>
>> > Cc: Gavin Shan <shangw@linux.vnet.ibm.com>
>> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
>> > Cc: Paul Mackerras <paulus@samba.org>
>> > Cc: linuxppc-dev@lists.ozlabs.org
>> > Cc: linux-kernel@vger.kernel.org
>> > ---
>> >  arch/powerpc/kernel/eeh.c     |    3 +--
>> >  arch/powerpc/sysdev/fsl_pci.c |    2 +-
>> >  2 files changed, 2 insertions(+), 3 deletions(-)
>> 
>> Ben, Paul, this has no dependencies on anything new to PCI or any
>> other patches in this series, so you can take it through the POWERPC
>> tree.  If you don't want to do that, let me know and I can take it.
>> 
>> If you want it:
>> 
>> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
>
>It's also quite broken :-)
>
>See below:
>
>> > diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
>> > index 55593ee..6ebbe54 100644
>> > --- a/arch/powerpc/kernel/eeh.c
>> > +++ b/arch/powerpc/kernel/eeh.c
>> > @@ -189,8 +189,7 @@ static size_t eeh_gather_pci_data(struct eeh_dev *edev, char * buf, size_t len)
>> >  	}
>> >  
>> >  	/* If PCI-E capable, dump PCI-E cap 10, and the AER */
>> > -	cap = pci_find_capability(dev, PCI_CAP_ID_EXP);
>> > -	if (cap) {
>> > +	if (pci_is_pcie(dev)) {
>> >  		n += scnprintf(buf+n, len-n, "pci-e cap10:\n");
>> >  		printk(KERN_WARNING
>> >  		       "EEH: PCI-E capabilities and status follow:\n");
>
>So we remove reading of "cap", but slightly further down the code does:
>
>		for (i=0; i<=8; i++) {
>			eeh_ops->read_config(dn, cap+4*i, 4, &cfg);
>			n += scnprintf(buf+n, len-n, "%02x:%x\n", 4*i, cfg);
>			printk(KERN_WARNING "EEH: PCI-E %02x: %08x\n", i, cfg);
>		}
>
>Which actually *uses* the value of "cap" ... oops :-)
>

It's my fault and I should have looked into the changes more closely.
How about changing it like this:

	cap = pci_is_pcie(dev) ? pci_pcie_cap(dev) :
	      pci_find_capability(dev, PCI_CAP_ID_EXP);
	if (cap) {
		...
	}

It would save some PCI-CFG access cycles for most cases :-)

>> > diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
>> > index 46ac1dd..5402a1d 100644
>> > --- a/arch/powerpc/sysdev/fsl_pci.c
>> > +++ b/arch/powerpc/sysdev/fsl_pci.c
>> > @@ -41,7 +41,7 @@ static void quirk_fsl_pcie_header(struct pci_dev *dev)
>> >  	u8 hdr_type;
>> >  
>> >  	/* if we aren't a PCIe don't bother */
>> > -	if (!pci_find_capability(dev, PCI_CAP_ID_EXP))
>> > +	if (!pci_is_pcie(dev))
>> >  		return;
>> >  
>> >  	/* if we aren't in host mode don't bother */

Thanks,
Gavin

^ permalink raw reply

* Re: [PATCH] powerpc: Add VMX optimised xor for RAID5
From: Benjamin Herrenschmidt @ 2013-10-11  5:53 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: linuxppc-dev, paulus
In-Reply-To: <20130926133004.1bff1bd4@kryten>

On Thu, 2013-09-26 at 13:30 +1000, Anton Blanchard wrote:
> Add a VMX optimised xor, used primarily for RAID5. On a POWER7 blade
> this is a decent win:
> 
>    32regs    : 17932.800 MB/sec
>    altivec   : 19724.800 MB/sec
> 
> The bigger gain is when the same test is run in SMT4 mode, as it
> would if the machine was busy:
> 
>    8regs     :  8377.600 MB/sec
>    altivec   : 15801.600 MB/sec
> 
> I tested this against an array created without the patch, and also
> verified it worked as expected on a little endian kernel.
> 
> Signed-off-by: Anton Blanchard <anton@samba.org>
> ---

Patch got busticated on the way to the list ...

> Index: le-kernel/arch/powerpc/include/asm/xor.h
> ===================================================================
> --- le-kernel.orig/arch/powerpc/include/asm/xor.h
> +++ le-kernel/arch/powerpc/include/asm/xor.h
> @@ -1 +1,68 @@
> +/*
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
> 02111-1307, USA.
> + *
> + * Copyright (C) IBM Corporation, 2012
> + *
> + * Author: Anton Blanchard <anton@au.ibm.com>
> + */
> +#ifndef _ASM_POWERPC_XOR_H
> +#define _ASM_POWERPC_XOR_H
> +
> +#ifdef CONFIG_ALTIVEC
> +
> +#include <asm/cputable.h>
> +
> +void xor_altivec_2(unsigned long bytes, unsigned long *v1_in,
> +		   unsigned long *v2_in);
> +void xor_altivec_3(unsigned long bytes, unsigned long *v1_in,
> +		   unsigned long *v2_in, unsigned long *v3_in);
> +void xor_altivec_4(unsigned long bytes, unsigned long *v1_in,
> +		   unsigned long *v2_in, unsigned long *v3_in,
> +		   unsigned long *v4_in);
> +void xor_altivec_5(unsigned long bytes, unsigned long *v1_in,
> +		   unsigned long *v2_in, unsigned long *v3_in,
> +		   unsigned long *v4_in, unsigned long *v5_in);
> +
> +static struct xor_block_template xor_block_altivec = {
> +	.name = "altivec",
> +	.do_2 = xor_altivec_2,
> +	.do_3 = xor_altivec_3,
> +	.do_4 = xor_altivec_4,
> +	.do_5 = xor_altivec_5,
> +};
> +
> +#define XOR_SPEED_ALTIVEC()				\
> +	do {						\
> +		if (cpu_has_feature(CPU_FTR_ALTIVEC))	\
> +			xor_speed(&xor_block_altivec);	\
> +	} while (0)
> +#else
> +#define XOR_SPEED_ALTIVEC
> +#endif
> +
> +/* Also try the generic routines. */
>  #include <asm-generic/xor.h>
> +
> +#undef XOR_TRY_TEMPLATES
> +#define XOR_TRY_TEMPLATES				\
> +do {							\
> +	xor_speed(&xor_block_8regs);			\
> +	xor_speed(&xor_block_8regs_p);			\
> +	xor_speed(&xor_block_32regs);			\
> +	xor_speed(&xor_block_32regs_p);			\
> +	XOR_SPEED_ALTIVEC();				\
> +} while (0)
> +
> +#endif /* _ASM_POWERPC_XOR_H */
> Index: le-kernel/arch/powerpc/lib/Makefile
> ===================================================================
> --- le-kernel.orig/arch/powerpc/lib/Makefile
> +++ le-kernel/arch/powerpc/lib/Makefile
> @@ -39,3 +39,6 @@ obj-$(CONFIG_PPC_LIB_RHEAP) += rheap.o
>  obj-y			+= code-patching.o
>  obj-y			+= feature-fixups.o
>  obj-$(CONFIG_FTR_FIXUP_SELFTEST) += feature-fixups-test.o
> +
> +obj-$(CONFIG_ALTIVEC)	+= xor_vmx.o
> +CFLAGS_xor_vmx.o += -maltivec -mabi=altivec
> Index: le-kernel/arch/powerpc/lib/xor_vmx.c
> ===================================================================
> --- /dev/null
> +++ le-kernel/arch/powerpc/lib/xor_vmx.c
> @@ -0,0 +1,177 @@
> +/*
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + *
> + * You should have received a copy of the GNU General Public License
> + * along with this program; if not, write to the Free Software
> + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
> 02111-1307, USA.
> + *
> + * Copyright (C) IBM Corporation, 2012
> + *
> + * Author: Anton Blanchard <anton@au.ibm.com>
> + */
> +#include <altivec.h>
> +
> +#include <linux/preempt.h>
> +#include <linux/export.h>
> +#include <linux/sched.h>
> +#include <asm/switch_to.h>
> +
> +typedef vector signed char unative_t;
> +
> +#define DEFINE(V)				\
> +	unative_t *V = (unative_t *)V##_in;	\
> +	unative_t V##_0, V##_1, V##_2, V##_3
> +
> +#define LOAD(V)			\
> +	do {			\
> +		V##_0 = V[0];	\
> +		V##_1 = V[1];	\
> +		V##_2 = V[2];	\
> +		V##_3 = V[3];	\
> +	} while (0)
> +
> +#define STORE(V)		\
> +	do {			\
> +		V[0] = V##_0;	\
> +		V[1] = V##_1;	\
> +		V[2] = V##_2;	\
> +		V[3] = V##_3;	\
> +	} while (0)
> +
> +#define XOR(V1, V2)					\
> +	do {						\
> +		V1##_0 = vec_xor(V1##_0, V2##_0);	\
> +		V1##_1 = vec_xor(V1##_1, V2##_1);	\
> +		V1##_2 = vec_xor(V1##_2, V2##_2);	\
> +		V1##_3 = vec_xor(V1##_3, V2##_3);	\
> +	} while (0)
> +
> +void xor_altivec_2(unsigned long bytes, unsigned long *v1_in,
> +		   unsigned long *v2_in)
> +{
> +	DEFINE(v1);
> +	DEFINE(v2);
> +	unsigned long lines = bytes / (sizeof(unative_t)) / 4;
> +
> +	preempt_disable();
> +	enable_kernel_altivec();
> +
> +	do {
> +		LOAD(v1);
> +		LOAD(v2);
> +		XOR(v1, v2);
> +		STORE(v1);
> +
> +		v1 += 4;
> +		v2 += 4;
> +	} while (--lines > 0);
> +
> +	preempt_enable();
> +}
> +EXPORT_SYMBOL(xor_altivec_2);
> +
> +void xor_altivec_3(unsigned long bytes, unsigned long *v1_in,
> +		   unsigned long *v2_in, unsigned long *v3_in)
> +{
> +	DEFINE(v1);
> +	DEFINE(v2);
> +	DEFINE(v3);
> +	unsigned long lines = bytes / (sizeof(unative_t)) / 4;
> +
> +	preempt_disable();
> +	enable_kernel_altivec();
> +
> +	do {
> +		LOAD(v1);
> +		LOAD(v2);
> +		LOAD(v3);
> +		XOR(v1, v2);
> +		XOR(v1, v3);
> +		STORE(v1);
> +
> +		v1 += 4;
> +		v2 += 4;
> +		v3 += 4;
> +	} while (--lines > 0);
> +
> +	preempt_enable();
> +}
> +EXPORT_SYMBOL(xor_altivec_3);
> +
> +void xor_altivec_4(unsigned long bytes, unsigned long *v1_in,
> +		   unsigned long *v2_in, unsigned long *v3_in,
> +		   unsigned long *v4_in)
> +{
> +	DEFINE(v1);
> +	DEFINE(v2);
> +	DEFINE(v3);
> +	DEFINE(v4);
> +	unsigned long lines = bytes / (sizeof(unative_t)) / 4;
> +
> +	preempt_disable();
> +	enable_kernel_altivec();
> +
> +	do {
> +		LOAD(v1);
> +		LOAD(v2);
> +		LOAD(v3);
> +		LOAD(v4);
> +		XOR(v1, v2);
> +		XOR(v3, v4);
> +		XOR(v1, v3);
> +		STORE(v1);
> +
> +		v1 += 4;
> +		v2 += 4;
> +		v3 += 4;
> +		v4 += 4;
> +	} while (--lines > 0);
> +
> +	preempt_enable();
> +}
> +EXPORT_SYMBOL(xor_altivec_4);
> +
> +void xor_altivec_5(unsigned long bytes, unsigned long *v1_in,
> +		   unsigned long *v2_in, unsigned long *v3_in,
> +		   unsigned long *v4_in, unsigned long *v5_in)
> +{
> +	DEFINE(v1);
> +	DEFINE(v2);
> +	DEFINE(v3);
> +	DEFINE(v4);
> +	DEFINE(v5);
> +	unsigned long lines = bytes / (sizeof(unative_t)) / 4;
> +
> +	preempt_disable();
> +	enable_kernel_altivec();
> +
> +	do {
> +		LOAD(v1);
> +		LOAD(v2);
> +		LOAD(v3);
> +		LOAD(v4);
> +		LOAD(v5);
> +		XOR(v1, v2);
> +		XOR(v3, v4);
> +		XOR(v1, v5);
> +		XOR(v1, v3);
> +		STORE(v1);
> +
> +		v1 += 4;
> +		v2 += 4;
> +		v3 += 4;
> +		v4 += 4;
> +		v5 += 4;
> +	} while (--lines > 0);
> +
> +	preempt_enable();
> +}
> +EXPORT_SYMBOL(xor_altivec_5);

^ permalink raw reply

* Re: [PATCH v2 3/6] powerpc/pci: use pci_is_pcie() to simplify code
From: Benjamin Herrenschmidt @ 2013-10-11  5:49 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: Gavin Shan, linux-pci, linux-kernel, James E.J. Bottomley,
	Paul Mackerras, Hanjun Guo, Yijing Wang, linuxppc-dev
In-Reply-To: <20130906203035.GA27940@google.com>

On Fri, 2013-09-06 at 14:30 -0600, Bjorn Helgaas wrote:
> On Thu, Sep 05, 2013 at 03:55:27PM +0800, Yijing Wang wrote:
> > Use pci_is_pcie() to simplify code.
> > 
> > Acked-by: Kumar Gala <galak@kernel.crashing.org>
> > Reviewed-by: Gavin Shan <shangw@linux.vnet.ibm.com>
> > Signed-off-by: Yijing Wang <wangyijing@huawei.com>
> > Cc: Gavin Shan <shangw@linux.vnet.ibm.com>
> > Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
> > Cc: Paul Mackerras <paulus@samba.org>
> > Cc: linuxppc-dev@lists.ozlabs.org
> > Cc: linux-kernel@vger.kernel.org
> > ---
> >  arch/powerpc/kernel/eeh.c     |    3 +--
> >  arch/powerpc/sysdev/fsl_pci.c |    2 +-
> >  2 files changed, 2 insertions(+), 3 deletions(-)
> 
> Ben, Paul, this has no dependencies on anything new to PCI or any
> other patches in this series, so you can take it through the POWERPC
> tree.  If you don't want to do that, let me know and I can take it.
> 
> If you want it:
> 
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>

It's also quite broken :-)

See below:

> > diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> > index 55593ee..6ebbe54 100644
> > --- a/arch/powerpc/kernel/eeh.c
> > +++ b/arch/powerpc/kernel/eeh.c
> > @@ -189,8 +189,7 @@ static size_t eeh_gather_pci_data(struct eeh_dev *edev, char * buf, size_t len)
> >  	}
> >  
> >  	/* If PCI-E capable, dump PCI-E cap 10, and the AER */
> > -	cap = pci_find_capability(dev, PCI_CAP_ID_EXP);
> > -	if (cap) {
> > +	if (pci_is_pcie(dev)) {
> >  		n += scnprintf(buf+n, len-n, "pci-e cap10:\n");
> >  		printk(KERN_WARNING
> >  		       "EEH: PCI-E capabilities and status follow:\n");

So we remove reading of "cap", but slightly further down the code does:

		for (i=0; i<=8; i++) {
			eeh_ops->read_config(dn, cap+4*i, 4, &cfg);
			n += scnprintf(buf+n, len-n, "%02x:%x\n", 4*i, cfg);
			printk(KERN_WARNING "EEH: PCI-E %02x: %08x\n", i, cfg);
		}

Which actually *uses* the value of "cap" ... oops :-)

> > diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
> > index 46ac1dd..5402a1d 100644
> > --- a/arch/powerpc/sysdev/fsl_pci.c
> > +++ b/arch/powerpc/sysdev/fsl_pci.c
> > @@ -41,7 +41,7 @@ static void quirk_fsl_pcie_header(struct pci_dev *dev)
> >  	u8 hdr_type;
> >  
> >  	/* if we aren't a PCIe don't bother */
> > -	if (!pci_find_capability(dev, PCI_CAP_ID_EXP))
> > +	if (!pci_is_pcie(dev))
> >  		return;
> >  
> >  	/* if we aren't in host mode don't bother */
> > -- 
> > 1.7.1
> > 
> > 

^ permalink raw reply

* Re: [PATCH] powerpc, perf: Configure BHRB filter before enabling PMU interrupts
From: Anshuman Khandual @ 2013-10-11  4:32 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev, mikey
In-Reply-To: <20131011021131.GA26561@concordia>

On 10/11/2013 07:41 AM, Michael Ellerman wrote:
> On Thu, Oct 10, 2013 at 02:20:22PM +0530, Anshuman Khandual wrote:
>> On 10/09/2013 11:33 AM, Michael Ellerman wrote:
>>> On Wed, Oct 09, 2013 at 10:16:32AM +0530, Anshuman Khandual wrote:
>>>> On 10/09/2013 06:51 AM, Michael Ellerman wrote:
>>>>> On Tue, Oct 08, 2013 at 12:51:18PM +0530, Anshuman Khandual wrote:
>>>>>> On 10/08/2013 09:51 AM, Michael Ellerman wrote:
>>>>>>> On Mon, Oct 07, 2013 at 10:00:26AM +0530, Anshuman Khandual wrote:
>>>>>>>> Right now the `config_bhrb` PMU specific call happens after write_mmcr0
>>>>>>>> which actually enables the PMU for event counting and interrupt. So
>>>>>>>> there is a small window of time where the PMU and BHRB runs without the
>>>>>>>> required HW branch filter (if any) enabled in BHRB. This can cause some
>>>>>>>> of the branch samples to be collected through BHRB without any filter
>>>>>>>> being applied and hence affecting the correctness of the results. This
>>>>>>>> patch moves the BHRB config function call before enabling the interrupts.
>>>>>>>
>>>>>>> Patch looks good.
>>>>>>>
>>>>>>> But it reminds me I have an item in my TODO list:
>>>>>>>  - "Why can't config_bhrb() be done in compute_mmcr()" ?
>>>>>>>
>>>>>>
>>>>>> compute_mmcr() function deals with generic MMCR* configs for normal PMU
>>>>>> events. Even if BHRB config touches MMCRA register, it's configuration
>>>>>> does not interfere with the PMU config for general events. So its best
>>>>>> to keep them separate. 
>>>>>
>>>>> I'm unconvinced. If they'd been together to begin with this bug never
>>>>> would have happened.
>>>>
>>>> This is an ordering of configuration problem. Putting them together in the
>>>> same function does not rule out the chances of this ordering problem. Could
>>>> you please kindly explain how this could have been avoided ?
>>>
>>> The existing code already makes sure to write MMCRA before MMCR0.
>>
>> Thats not true. One example being here at power_pmu_enable function.
>>
>>         write_mmcr0(cpuhw, mmcr0);
>>
>>         /*
>>          * Enable instruction sampling if necessary
>>          */
>>         if (cpuhw->mmcr[2] & MMCRA_SAMPLE_ENABLE) {
>>                 mb();
>>                 mtspr(SPRN_MMCRA, cpuhw->mmcr[2]);
>>         }
> 
> The only example.
> 
> The BHRB config would have been applied prior to that:
> 
> 	mtspr(SPRN_MMCRA, cpuhw->mmcr[2] & ~MMCRA_SAMPLE_ENABLE);
> 	mtspr(SPRN_MMCR1, cpuhw->mmcr[1]);
> 	mtspr(SPRN_MMCR0, (cpuhw->mmcr[0] & ~(MMCR0_PMC1CE | MMCR0_PMCjCE))
> 				| MMCR0_FC);
> 
> 
> So as I said, if the BHRB config was in cpuhw->mmcr[2] then the ordering would
> have been correct.

I got your point. But ...

> 
>> Even I think this is not right. Instruction sampling should have been
>> enabled before we enable PMU interrupts. Else there is a small window
>> of time where we could have the PMU enabled with events (which requires
>> sampling) without the sampling itself being enabled in MMCRA.
> 
> Yes I agree. That's a separate bug, which we'll need to test on all the book3s
> platforms we have perf support for.

Okay, I guess any platform which supports sampling will definitely want to have
it enabled before we can set the events to count on PMU. Can you think of any
problem which can arise if we move it before the enabling the PMU back ? Else
we can fix this easily.

^ permalink raw reply

* Re: [PATCH v2 RESEND 0/3] cpufreq/ondemand support for Xserve G5 & iMac G5 iSight
From: Benjamin Herrenschmidt @ 2013-10-11  4:32 UTC (permalink / raw)
  To: Aaro Koskinen; +Cc: Rafael J. Wysocki, Viresh Kumar, linuxppc-dev, linux-pm
In-Reply-To: <1380573873-14448-1-git-send-email-aaro.koskinen@iki.fi>

On Mon, 2013-09-30 at 23:44 +0300, Aaro Koskinen wrote:
> Hi,
> 
> This is a resend of the v2 patchset:
> 
> 	http://marc.info/?t=137797013200001&r=1&w=2
> 
> No changes except rebasing. Any chance to get these into v3.13?

BTW. Ack from me, Rafael feel free to merge these.

Cheers,
Ben.

^ permalink raw reply

* RE: [PATCH v5] powerpc/mpc85xx: Update the clock nodes in device tree
From: Tang Yuantian-B29983 @ 2013-10-11  3:18 UTC (permalink / raw)
  To: Mark Rutland
  Cc: devicetree@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	Li Yang-Leo-R58472
In-Reply-To: <20131010100331.GE26954@e106331-lin.cambridge.arm.com>

VGhhbmtzIGZvciB5b3VyIHJldmlldy4NClNlZSBteSByZXBseSBpbmxpbmUNCg0KPiAtLS0tLU9y
aWdpbmFsIE1lc3NhZ2UtLS0tLQ0KPiBGcm9tOiBNYXJrIFJ1dGxhbmQgW21haWx0bzptYXJrLnJ1
dGxhbmRAYXJtLmNvbV0NCj4gU2VudDogMjAxM8TqMTDUwjEwyNUg0MfG2svEIDE4OjA0DQo+IFRv
OiBUYW5nIFl1YW50aWFuLUIyOTk4Mw0KPiBDYzogZ2FsYWtAa2VybmVsLmNyYXNoaW5nLm9yZzsg
bGludXhwcGMtZGV2QGxpc3RzLm96bGFicy5vcmc7DQo+IGRldmljZXRyZWVAdmdlci5rZXJuZWwu
b3JnOyBMaSBZYW5nLUxlby1SNTg0NzINCj4gU3ViamVjdDogUmU6IFtQQVRDSCB2NV0gcG93ZXJw
Yy9tcGM4NXh4OiBVcGRhdGUgdGhlIGNsb2NrIG5vZGVzIGluIGRldmljZQ0KPiB0cmVlDQo+IA0K
PiBPbiBXZWQsIE9jdCAwOSwgMjAxMyBhdCAwNzozODoyNEFNICswMTAwLCBZdWFudGlhbi5UYW5n
QGZyZWVzY2FsZS5jb20NCj4gd3JvdGU6DQo+ID4gRnJvbTogVGFuZyBZdWFudGlhbiA8eXVhbnRp
YW4udGFuZ0BmcmVlc2NhbGUuY29tPg0KPiA+DQo+ID4gVGhlIGZvbGxvd2luZyBTb0NzIHdpbGwg
YmUgYWZmZWN0ZWQ6IHAyMDQxLCBwMzA0MSwgcDQwODAsIHA1MDIwLA0KPiA+IHA1MDQwLCBiNDQy
MCwgYjQ4NjAsIHQ0MjQwDQo+ID4NCj4gPiBTaWduZWQtb2ZmLWJ5OiBUYW5nIFl1YW50aWFuIDxZ
dWFudGlhbi5UYW5nQGZyZWVzY2FsZS5jb20+DQo+ID4gU2lnbmVkLW9mZi1ieTogTGkgWWFuZyA8
bGVvbGlAZnJlZXNjYWxlLmNvbT4NCj4gPiAtLS0NCj4gPiB2NToNCj4gPiAgICAgICAgIC0gcmVm
aW5lIHRoZSBiaW5kaW5nIGRvY3VtZW50DQo+ID4gICAgICAgICAtIHVwZGF0ZSB0aGUgY29tcGF0
aWJsZSBzdHJpbmcNCj4gPiB2NDoNCj4gPiAgICAgICAgIC0gYWRkIGJpbmRpbmcgZG9jdW1lbnQN
Cj4gPiAgICAgICAgIC0gdXBkYXRlIGNvbXBhdGlibGUgc3RyaW5nDQo+ID4gICAgICAgICAtIHVw
ZGF0ZSB0aGUgcmVnIHByb3BlcnR5DQo+ID4gdjM6DQo+ID4gICAgICAgICAtIGZpeCB0eXBvDQo+
ID4gdjI6DQo+ID4gICAgICAgICAtIGFkZCB0NDI0MCwgYjQ0MjAsIGI0ODYwIHN1cHBvcnQNCj4g
PiAgICAgICAgIC0gcmVtb3ZlIHBsbC80IGNsb2NrIGZyb20gcDIwNDEsIHAzMDQxIGFuZCBwNTAy
MCBib2FyZA0KPiA+DQo+ID4gIC4uLi9kZXZpY2V0cmVlL2JpbmRpbmdzL2Nsb2NrL2NvcmVuZXQt
Y2xvY2sudHh0ICAgIHwgMTExDQo+ICsrKysrKysrKysrKysrKysrKysrDQo+ID4gIGFyY2gvcG93
ZXJwYy9ib290L2R0cy9mc2wvYjQ0MjBzaS1wb3N0LmR0c2kgICAgICAgIHwgIDM1ICsrKysrKysN
Cj4gPiAgYXJjaC9wb3dlcnBjL2Jvb3QvZHRzL2ZzbC9iNDQyMHNpLXByZS5kdHNpICAgICAgICAg
fCAgIDIgKw0KPiA+ICBhcmNoL3Bvd2VycGMvYm9vdC9kdHMvZnNsL2I0ODYwc2ktcG9zdC5kdHNp
ICAgICAgICB8ICAzNSArKysrKysrDQo+ID4gIGFyY2gvcG93ZXJwYy9ib290L2R0cy9mc2wvYjQ4
NjBzaS1wcmUuZHRzaSAgICAgICAgIHwgICA0ICsNCj4gPiAgYXJjaC9wb3dlcnBjL2Jvb3QvZHRz
L2ZzbC9wMjA0MXNpLXBvc3QuZHRzaSAgICAgICAgfCAgNjAgKysrKysrKysrKysNCj4gPiAgYXJj
aC9wb3dlcnBjL2Jvb3QvZHRzL2ZzbC9wMjA0MXNpLXByZS5kdHNpICAgICAgICAgfCAgIDQgKw0K
PiA+ICBhcmNoL3Bvd2VycGMvYm9vdC9kdHMvZnNsL3AzMDQxc2ktcG9zdC5kdHNpICAgICAgICB8
ICA2MCArKysrKysrKysrKw0KPiA+ICBhcmNoL3Bvd2VycGMvYm9vdC9kdHMvZnNsL3AzMDQxc2kt
cHJlLmR0c2kgICAgICAgICB8ICAgNCArDQo+ID4gIGFyY2gvcG93ZXJwYy9ib290L2R0cy9mc2wv
cDQwODBzaS1wb3N0LmR0c2kgICAgICAgIHwgMTEyDQo+ICsrKysrKysrKysrKysrKysrKysrKw0K
PiA+ICBhcmNoL3Bvd2VycGMvYm9vdC9kdHMvZnNsL3A0MDgwc2ktcHJlLmR0c2kgICAgICAgICB8
ICAgOCArKw0KPiA+ICBhcmNoL3Bvd2VycGMvYm9vdC9kdHMvZnNsL3A1MDIwc2ktcG9zdC5kdHNp
ICAgICAgICB8ICA0MiArKysrKysrKw0KPiA+ICBhcmNoL3Bvd2VycGMvYm9vdC9kdHMvZnNsL3A1
MDIwc2ktcHJlLmR0c2kgICAgICAgICB8ICAgMiArDQo+ID4gIGFyY2gvcG93ZXJwYy9ib290L2R0
cy9mc2wvcDUwNDBzaS1wb3N0LmR0c2kgICAgICAgIHwgIDYwICsrKysrKysrKysrDQo+ID4gIGFy
Y2gvcG93ZXJwYy9ib290L2R0cy9mc2wvcDUwNDBzaS1wcmUuZHRzaSAgICAgICAgIHwgICA0ICsN
Cj4gPiAgYXJjaC9wb3dlcnBjL2Jvb3QvZHRzL2ZzbC90NDI0MHNpLXBvc3QuZHRzaSAgICAgICAg
fCAgODUNCj4gKysrKysrKysrKysrKysrKw0KPiA+ICBhcmNoL3Bvd2VycGMvYm9vdC9kdHMvZnNs
L3Q0MjQwc2ktcHJlLmR0c2kgICAgICAgICB8ICAxMiArKysNCj4gPiAgMTcgZmlsZXMgY2hhbmdl
ZCwgNjQwIGluc2VydGlvbnMoKykNCj4gPiAgY3JlYXRlIG1vZGUgMTAwNjQ0DQo+ID4gRG9jdW1l
bnRhdGlvbi9kZXZpY2V0cmVlL2JpbmRpbmdzL2Nsb2NrL2NvcmVuZXQtY2xvY2sudHh0DQo+ID4N
Cj4gPiBkaWZmIC0tZ2l0IGEvRG9jdW1lbnRhdGlvbi9kZXZpY2V0cmVlL2JpbmRpbmdzL2Nsb2Nr
L2NvcmVuZXQtY2xvY2sudHh0DQo+ID4gYi9Eb2N1bWVudGF0aW9uL2RldmljZXRyZWUvYmluZGlu
Z3MvY2xvY2svY29yZW5ldC1jbG9jay50eHQNCj4gPiBuZXcgZmlsZSBtb2RlIDEwMDY0NA0KPiA+
IGluZGV4IDAwMDAwMDAuLjhlZmM2MmQNCj4gPiAtLS0gL2Rldi9udWxsDQo+ID4gKysrIGIvRG9j
dW1lbnRhdGlvbi9kZXZpY2V0cmVlL2JpbmRpbmdzL2Nsb2NrL2NvcmVuZXQtY2xvY2sudHh0DQo+
ID4gQEAgLTAsMCArMSwxMTEgQEANCj4gPiArKiBDbG9jayBCbG9jayBvbiBGcmVlc2NhbGUgQ29y
ZU5ldCBQbGF0Zm9ybXMNCj4gPiArDQo+ID4gK0ZyZWVzY2FsZSBDb3JlTmV0IGNoaXBzIHRha2Ug
cHJpbWFyeSBjbG9ja2luZyBpbnB1dCBmcm9tIHRoZSBleHRlcm5hbA0KPiA+ICtTWVNDTEsgc2ln
bmFsLiBUaGUgU1lTQ0xLIGlucHV0IChmcmVxdWVuY3kpIGlzIG11bHRpcGxpZWQgdXNpbmcNCj4g
PiArbXVsdGlwbGUgcGhhc2UgbG9ja2VkIGxvb3BzIChQTEwpIHRvIGNyZWF0ZSBhIHZhcmlldHkg
b2YgZnJlcXVlbmNpZXMNCj4gPiArd2hpY2ggY2FuIHRoZW4gYmUgcGFzc2VkIHRvIGEgdmFyaWV0
eSBvZiBpbnRlcm5hbCBsb2dpYywgaW5jbHVkaW5nDQo+ID4gK2NvcmVzIGFuZCBwZXJpcGhlcmFs
IElQIGJsb2Nrcy4NCj4gPiArUGxlYXNlIHJlZmVyIHRvIHRoZSBSZWZlcmVuY2UgTWFudWFsIGZv
ciBkZXRhaWxzLg0KPiA+ICsNCj4gPiArMS4gQ2xvY2sgQmxvY2sgQmluZGluZw0KPiA+ICsNCj4g
PiArUmVxdWlyZWQgcHJvcGVydGllczoNCj4gPiArLSBjb21wYXRpYmxlOiBTaG91bGQgaW5jbHVk
ZSBvbmUgb3IgbW9yZSBvZiB0aGUgZm9sbG93aW5nOg0KPiA+ICsgICAgICAgLSAiZnNsLDxjaGlw
Pi1jbG9ja2dlbiI6IGZvciBjaGlwIHNwZWNpZmljIGNsb2NrIGJsb2NrDQo+ID4gKyAgICAgICAt
ICJmc2wscW9yaXEtY2xvY2tnZW4tWzEsMl0ueCI6IGZvciBjaGFzc2lzIDEueCBhbmQgMi54IGNs
b2NrDQo+IA0KPiBXaGlsZSBJIGNhbiBzZWUgdGhhdCAiZnNsLDxjaGlwPi1jbG9ja2dlbiIgbWln
aHQgaGF2ZSBhIGxhcmdlIHNldCBvZg0KPiBzdHJpbmdzIHRoYXQgd2UgbWF5IG5ldmVyIGRlYWwg
d2l0aCBpbiB0aCBrZXJuZWwsIEknZCBwcmVmZXIgdGhhdCB0aGUNCj4gYmFzaWMgc3RyaW5ncyAo
aS5lLiBhbGwgdGhlICJmc2wscW9yaXEtY2xvY2tnZW4tWzEsMl0ueCIgdmFyaWFudHMpIHdlcmUN
Cj4gbGlzdGVkIGV4cGxpY2l0bHkgaGVyZS4NCj4gDQo+IEdpdmVuIHRoZXkgb25seSBzZWVtIHRv
IGJlICJmc2wscW9yaXEtY2xvY2tnZW4tMS4wIiBhbmQgImZzbCxxb3JpcS0NCj4gY2xvY2tnZW4t
Mi4wIiB0aGlzIHNob3VsZG4ndCBiZSB0b28gZGlmZmljdWx0IHRvIGxpc3QgYW5kIGRlc2NyaWJl
Lg0KPiANCk9LLCBJIHdpbGwgbGlzdCB0aGVtIGFsbC4NCg0KPiA+ICstIHJlZzogT2Zmc2V0IGFu
ZCBsZW5ndGggb2YgdGhlIGNsb2NrIHJlZ2lzdGVyIHNldA0KPiA+ICstIGNsb2NrLWZyZXF1ZW5j
eTogSW5kaWNhdGVzIGlucHV0IGNsb2NrIGZyZXF1ZW5jeSBvZiBjbG9jayBibG9jay4NCj4gPiAr
ICAgICAgIFdpbGwgYmUgc2V0IGJ5IHUtYm9vdA0KPiANCj4gV2h5IGRvZXMgdGhlIGZhY3QgdGhp
cyBpcyBzZXQgYnkgdS1ib290IG1hdHRlciB0byB0aGUgYmluZGluZz8NCj4gDQpPSywgSSB3aWxs
IHJlbW92ZSBpdC4NCg0KPiA+ICsNCj4gPiArUmVjb21tZW5kZWQgcHJvcGVydGllczoNCj4gPiAr
LSAjZGRyZXNzLWNlbGxzOiBTcGVjaWZpZXMgdGhlIG51bWJlciBvZiBjZWxscyB1c2VkIHRvIHJl
cHJlc2VudA0KPiA+ICsgICAgICAgcGh5c2ljYWwgYmFzZSBhZGRyZXNzZXMuICBNdXN0IGJlIHBy
ZXNlbnQgaWYgdGhlIGRldmljZSBoYXMNCj4gPiArICAgICAgIHN1Yi1ub2RlcyBhbmQgc2V0IHRv
IDEgaWYgcHJlc2VudA0KPiANCj4gVHlwbzogI2FkZHJlc3MtY2VsbHMNCj4gDQo+IEluIHRoZSBl
eGFtcGxlIGl0IGxvb2tzIGxpa2UgdGhlIGFkZHJlc3MgY2VsbHMgb2YgY2hpbGQgbm9kZXMgYXJl
IG9mZnNldHMNCj4gd2l0aGluIHRoZSB1bml0LCByYXRoZXIgdGhhbiBhYnNvbHV0ZSBwaHlzaWNh
bCBhZGRyZXNzZXMuIENvdWxkIHRoZSBjb2RlDQo+IG5vdCB0cmVhdCB0aGVtIGFzIGFic29sdXRl
IGFkZHJlc3Nlcz8gVGhlbiB3ZSdkIG9ubHkgbmVlZCB0byBkb2N1bWVudA0KPiB0aGF0ICNhZGRy
ZXNzLWNlbGxzLCAjc2l6ZS1jZWxscyBhbmQgcmFuZ2VzIG11c3QgYmUgcHJlc2VudCBhbmQgaGF2
ZQ0KPiB2YWx1ZXMgc3VpdGFibGUgZm9yIG1hcHBpbmcgY2hpbGQgbm9kZXMgaW50byB0aGUgYWRk
cmVzcyBzcGFjZSBvZiB0aGUNCj4gcGFyZW50Lg0KPiANCk9LLCB0aGFua3MuDQoNCj4gPiArLSAj
c2l6ZS1jZWxsczogU3BlY2lmaWVzIHRoZSBudW1iZXIgb2YgY2VsbHMgdXNlZCB0byByZXByZXNl
bnQNCj4gPiArICAgICAgIHRoZSBzaXplIG9mIGFuIGFkZHJlc3MuIE11c3QgYmUgcHJlc2VudCBp
ZiB0aGUgZGV2aWNlIGhhcw0KPiA+ICsgICAgICAgc3ViLW5vZGVzIGFuZCBzZXQgdG8gMSBpZiBw
cmVzZW50DQo+IA0KPiBJdCdzIG5vdCByZWFsbHkgdGhlIHNpemUgb2YgYW4gYWRkcmVzcywgaXQn
cyB0aGUgc2l6ZSBvZiBhIHJlZ2lvbg0KPiBpZGVudGlmaWVkIGJ5IGEgcmVnIGVudHJ5Lg0KPiAN
Cj4gSSB0aGluayB0aGlzIGNhbiBiZSBzaW1wbGlmaWVkIGJ5IG15IHN1Z2dlc3Rpb24gYWJvdmUu
DQo+IA0KWWVzDQoNCj4gPiArDQo+ID4gKzIuIENsb2NrIFByb3ZpZGVyL0NvbnN1bWVyIEJpbmRp
bmcNCj4gPiArDQo+ID4gK01vc3Qgb2YgdGhlIGJpbmRpbmcgYXJlIGZyb20gdGhlIGNvbW1vbiBj
bG9jayBiaW5kaW5nWzFdLg0KPiA+ICsgWzFdIERvY3VtZW50YXRpb24vZGV2aWNldHJlZS9iaW5k
aW5ncy9jbG9jay9jbG9jay1iaW5kaW5ncy50eHQNCj4gPiArDQo+ID4gK1JlcXVpcmVkIHByb3Bl
cnRpZXM6DQo+ID4gKy0gY29tcGF0aWJsZSA6IFNob3VsZCBpbmNsdWRlIG9uZSBvciBtb3JlIG9m
IHRoZSBmb2xsb3dpbmc6DQo+ID4gKyAgICAgICAtICJmc2wscW9yaXEtY29yZS1wbGwtWzEsMl0u
eCI6IEluZGljYXRlcyBhIGNvcmUgUExMIGNsb2NrDQo+IGRldmljZQ0KPiA+ICsgICAgICAgLSAi
ZnNsLHFvcmlxLWNvcmUtbXV4LVsxLDJdLngiOiBJbmRpY2F0ZXMgYSBjb3JlIG11bHRpcGxleGVy
DQo+IGNsb2NrDQo+ID4gKyAgICAgICAgICAgICAgIGRldmljZTsgZGl2aWRlZCBmcm9tIHRoZSBj
b3JlIFBMTCBjbG9jaw0KPiANCj4gQXMgYWJvdmUsIEknZCBwcmVmZXIgYSBjb21wbGV0ZSBsaXN0
IG9mIHRoZSBiYXNpYyBzdHJpbmdzIHdlIGV4cGVjdC4NCj4gDQpUaGVyZSBpcyBubyBsaXN0IGhl
cmUsIGp1c3QgKi1tdXgtMS54IGFuZCAqLW11eC0yLngNCldoYXQga2luZCBvZiBsaXN0IGRvIHlv
dSBwcmVmZXI/DQoNCj4gPiArICAgICAgIC0gImZpeGVkLWNsb2NrIjogRnJvbSBjb21tb24gY2xv
Y2sgYmluZGluZzsgaW5kaWNhdGVzIG91dHB1dA0KPiBjbG9jaw0KPiA+ICsgICAgICAgICAgICAg
ICBvZiBvc2NpbGxhdG9yDQo+ID4gKyAgICAgICAtICJmc2wscW9yaXEtc3lzY2xrLVsxLDJdLngi
OiBJbmRpY2F0ZXMgaW5wdXQgc3lzdGVtIGNsb2NrDQo+IA0KPiBIZXJlIHRvby4NCj4gDQo+ID4g
Ky0gI2Nsb2NrLWNlbGxzOiBGcm9tIGNvbW1vbiBjbG9jayBiaW5kaW5nOyBpbmRpY2F0ZXMgdGhl
IG51bWJlciBvZg0KPiA+ICsgICAgICAgb3V0cHV0IGNsb2NrLiAwIGlzIGZvciBvbmUgb3V0cHV0
IGNsb2NrOyAxIGZvciBtb3JlIHRoYW4gb25lDQo+ID4gK2Nsb2NrDQo+IA0KPiBJZiBhIGNsb2Nr
IHNvdXJjZSBoYXMgbXVsdGlwbGUgb3V0cHV0cywgd2hhdCB0aG9zZSBvdXRwdXRzIGFyZSBhbmQg
d2hhdA0KPiB2YWx1ZXMgaW4gY2xvY2stY2VsbHMgdGhleSBjb3JyZXNwb25kIHRvIHNob3VsZCBi
ZSBkZXNjcmliZWQgaGVyZS4NCj4gDQpUaGVyZSBpcyBubyB3YXkgdG8gZGVzY3JpYmUgdGhlIHZh
bHVlcyBvZiBtdWx0aXBsZSBvdXRwdXRzIGhlcmUuDQpUaGlzIHByb3BlcnR5IGlzIHRoZSB0eXBl
IG9mIEJPT0wuIFNlZSB0aGUgY2xvY2stYmluZGluZ3MudHh0Lg0KDQo+ID4gKw0KPiA+ICtSZWNv
bW1lbmRlZCBwcm9wZXJ0aWVzOg0KPiA+ICstIGNsb2NrczogU2hvdWxkIGJlIHRoZSBwaGFuZGxl
IG9mIGlucHV0IHBhcmVudCBjbG9jaw0KPiA+ICstIGNsb2NrLW5hbWVzOiBGcm9tIGNvbW1vbiBj
bG9jayBiaW5kaW5nLCBpbmRpY2F0ZXMgdGhlIGNsb2NrIG5hbWUNCj4gDQo+IFRoYXQgZGVzY3Jp
cHRpb24ncyBhIGJpdCBvcGFxdWUuDQo+IA0KPiBXaGF0J3MgdGhlIG5hbWUgb2YgdGhlIGNsb2Nr
IGlucHV0IG9uIHRoZXNlIHVuaXRzPyBUaGF0J3Mgd2hhdCBjbG9jay0NCj4gbmFtZXMgc2hvdWxk
IGNvbnRhaW4sIGFuZCB0aGF0IHNob3VsZCBiZSBkb2N1bWVudGVkLg0KPiANCklzIGl0IG5lY2Vz
c2FyeSB0byBkb2N1bWVudCB0aGVzZSBuYW1lcyBzaW5jZSB0aGV5IGFyZSB0b3RhbGx5IHVzZWQN
CmJ5IGNsb2NrIHByb3ZpZGVyIGFuZCBjbG9jayBjb25zdW1lciBoYXMgbm8gaWRlYSBhYm91dCB0
aGVtPw0KDQo+IERvIHdlIG5vdCBfYWx3YXlzXyBuZWVkIHRoZSBwYXJlbnQgY2xvY2s/DQo+IA0K
Tm90IGZvciBmaXhlZC1jbG9jayB0aGF0IGl0cyBwYXJlbnQgY2xvY2sgaXMgb3NjaWxsYXRvciA6
KQ0KDQo+IElmIHdlIGhhdmUgYSBjbG9jayBkbyB3ZSBuZWVkIGEgY2xvY2stbmFtZXMgZW50cnkg
Zm9yIGl0Pw0KPiANClRlY2huaWNhbGx5IHllcywgYnV0IEkgZG9uJ3QgYm90aGVyIHRvIGFkZCBp
dCBpZiB0aGUgY2xvY2sgbm9kZSBoYXMNCm9ubHkgb25lIGNsb2Nrcy4NCg0KPiA+ICstIGNsb2Nr
LW91dHB1dC1uYW1lczogRnJvbSBjb21tb24gY2xvY2sgYmluZGluZywgaW5kaWNhdGVzIHRoZSBu
YW1lcw0KPiBvZg0KPiA+ICsgICAgICAgb3V0cHV0IGNsb2Nrcw0KPiA+ICstIHJlZzogU2hvdWxk
IGJlIHRoZSBvZmZzZXQgYW5kIGxlbmd0aCBvZiBjbG9jayBibG9jayBiYXNlIGFkZHJlc3MuDQo+
ID4gKyAgICAgICBUaGUgbGVuZ3RoIHNob3VsZCBiZSA0Lg0KPiA+ICsNCj4gPiArRXhhbXBsZSBm
b3IgY2xvY2sgYmxvY2sgYW5kIGNsb2NrIHByb3ZpZGVyOg0KPiA+ICsvIHsNCj4gPiArICAgICAg
IGNsb2NrZ2VuOiBnbG9iYWwtdXRpbGl0aWVzQGUxMDAwIHsNCj4gPiArICAgICAgICAgICAgICAg
Y29tcGF0aWJsZSA9ICJmc2wscDUwMjAtY2xvY2tnZW4iLCAiZnNsLHFvcmlxLWNsb2NrZ2VuLQ0K
PiAxLjAiOw0KPiA+ICsgICAgICAgICAgICAgICByZWcgPSA8MHhlMTAwMCAweDEwMDA+Ow0KPiA+
ICsgICAgICAgICAgICAgICBjbG9jay1mcmVxdWVuY3kgPSA8MD47DQo+IA0KPiBUaGF0IGxvb2tz
IG9kZC4NCj4gDQpZZXMsIGJ1dCBpdCBoYXMgYWxyZWFkeSBleGlzdGVkIGhlcmUgYmVmb3JlIHRo
aXMgcGF0Y2guDQpDYW4gSSBtb3ZlIGl0IHRvIHN5c2NsayBjbG9jayBub2RlIHNpbmNlIGl0IGlz
IG5vdCB1c2VkIHlldD8NCg0KPiA+ICsgICAgICAgICAgICAgICAjYWRkcmVzcy1jZWxscyA9IDwx
PjsNCj4gPiArICAgICAgICAgICAgICAgI3NpemUtY2VsbHMgPSA8MT47DQo+ID4gKw0KPiA+ICsg
ICAgICAgICAgICAgICBzeXNjbGs6IHN5c2NsayB7DQo+ID4gKyAgICAgICAgICAgICAgICAgICAg
ICAgI2Nsb2NrLWNlbGxzID0gPDA+Ow0KPiA+ICsgICAgICAgICAgICAgICAgICAgICAgIGNvbXBh
dGlibGUgPSAiZnNsLHFvcmlxLXN5c2Nsay0xLjAiLA0KPiA+ICsgImZpeGVkLWNsb2NrIjsNCj4g
DQo+IFdlIGRpZG4ndCBtZW50aW9uIGluIHRoZSBiaW5kaW5nIHRoYXQgImZzbCxxb3JpcS1zeXNj
bGstMS4wIiB3YXMNCj4gY29tcGF0aWJsZSB3aXRoICJmaXhlZC1jbG9jayIgYW5kIHNob3VsZCBo
YXZlICJmaXhlZC1jbG9jayIgaW4gdGhlDQo+IGNvbXBhdGlibGUgbGlzdC4uLg0KPiANCk9LLCB3
aWxsIGZpeCBpdC4NCg0KPiA+ICsgICAgICAgICAgICAgICAgICAgICAgIGNsb2NrLW91dHB1dC1u
YW1lcyA9ICJzeXNjbGsiOw0KPiA+ICsgICAgICAgICAgICAgICB9DQo+ID4gKw0KPiA+ICsgICAg
ICAgICAgICAgICBwbGwwOiBwbGwwQDgwMCB7DQo+ID4gKyAgICAgICAgICAgICAgICAgICAgICAg
I2Nsb2NrLWNlbGxzID0gPDE+Ow0KPiA+ICsgICAgICAgICAgICAgICAgICAgICAgIHJlZyA9IDww
eDgwMCAweDQ+Ow0KPiA+ICsgICAgICAgICAgICAgICAgICAgICAgIGNvbXBhdGlibGUgPSAiZnNs
LHFvcmlxLWNvcmUtcGxsLTEuMCI7DQo+ID4gKyAgICAgICAgICAgICAgICAgICAgICAgY2xvY2tz
ID0gPCZzeXNjbGs+Ow0KPiA+ICsgICAgICAgICAgICAgICAgICAgICAgIGNsb2NrLW91dHB1dC1u
YW1lcyA9ICJwbGwwIiwgInBsbDAtZGl2MiI7DQo+ID4gKyAgICAgICAgICAgICAgIH07DQo+ID4g
Kw0KPiA+ICsgICAgICAgICAgICAgICBwbGwxOiBwbGwxQDgyMCB7DQo+ID4gKyAgICAgICAgICAg
ICAgICAgICAgICAgI2Nsb2NrLWNlbGxzID0gPDE+Ow0KPiA+ICsgICAgICAgICAgICAgICAgICAg
ICAgIHJlZyA9IDwweDgyMCAweDQ+Ow0KPiA+ICsgICAgICAgICAgICAgICAgICAgICAgIGNvbXBh
dGlibGUgPSAiZnNsLHFvcmlxLWNvcmUtcGxsLTEuMCI7DQo+ID4gKyAgICAgICAgICAgICAgICAg
ICAgICAgY2xvY2tzID0gPCZzeXNjbGs+Ow0KPiA+ICsgICAgICAgICAgICAgICAgICAgICAgIGNs
b2NrLW91dHB1dC1uYW1lcyA9ICJwbGwxIiwgInBsbDEtZGl2MiI7DQo+ID4gKyAgICAgICAgICAg
ICAgIH07DQo+ID4gKw0KPiA+ICsgICAgICAgICAgICAgICBtdXgwOiBtdXgwQDAgew0KPiA+ICsg
ICAgICAgICAgICAgICAgICAgICAgICNjbG9jay1jZWxscyA9IDwwPjsNCj4gPiArICAgICAgICAg
ICAgICAgICAgICAgICByZWcgPSA8MHgwIDB4ND47DQo+ID4gKyAgICAgICAgICAgICAgICAgICAg
ICAgY29tcGF0aWJsZSA9ICJmc2wscW9yaXEtY29yZS1tdXgtMS4wIjsNCj4gPiArICAgICAgICAg
ICAgICAgICAgICAgICBjbG9ja3MgPSA8JnBsbDAgMD4sIDwmcGxsMCAxPiwgPCZwbGwxIDA+LA0K
PiA8JnBsbDEgMT47DQo+ID4gKyAgICAgICAgICAgICAgICAgICAgICAgY2xvY2stbmFtZXMgPSAi
cGxsMF8wIiwgInBsbDBfMSIsICJwbGwxXzAiLA0KPiAicGxsMV8xIjsNCj4gPiArICAgICAgICAg
ICAgICAgICAgICAgICBjbG9jay1vdXRwdXQtbmFtZXMgPSAiY211eDAiOw0KPiA+ICsgICAgICAg
ICAgICAgICB9Ow0KPiA+ICsNCj4gPiArICAgICAgICAgICAgICAgbXV4MTogbXV4MUAyMCB7DQo+
ID4gKyAgICAgICAgICAgICAgICAgICAgICAgI2Nsb2NrLWNlbGxzID0gPDA+Ow0KPiA+ICsgICAg
ICAgICAgICAgICAgICAgICAgIHJlZyA9IDwweDIwIDB4ND47DQo+ID4gKyAgICAgICAgICAgICAg
ICAgICAgICAgY29tcGF0aWJsZSA9ICJmc2wscW9yaXEtY29yZS1tdXgtMS4wIjsNCj4gPiArICAg
ICAgICAgICAgICAgICAgICAgICBjbG9ja3MgPSA8JnBsbDAgMD4sIDwmcGxsMCAxPiwgPCZwbGwx
IDA+LA0KPiA8JnBsbDEgMT47DQo+ID4gKyAgICAgICAgICAgICAgICAgICAgICAgY2xvY2stbmFt
ZXMgPSAicGxsMF8wIiwgInBsbDBfMSIsICJwbGwxXzAiLA0KPiAicGxsMV8xIjsNCj4gPiArICAg
ICAgICAgICAgICAgICAgICAgICBjbG9jay1vdXRwdXQtbmFtZXMgPSAiY211eDEiOw0KPiANCj4g
SG93IGRvZXMgdGhlIG11eCBjaG9vc2Ugd2hpY2ggaW5wdXQgY2xvY2sgdG8gdXNlIGF0IGEgcG9p
bnQgaW4gdGltZT8NCj4gDQpUaGF0IGlzIGRlY2lkZWQgYXQgcnVudGltZS4gRGlmZmVyZW50IGlu
cHV0IGNsb2NrIHdpbGwgbGVhZCB0byB0aGUgZGlmZmVyZW50DQpDbG9jayBmcmVxdWVuY3kgdGhh
dCB0aGUgQ1BVcyB3b3JrIG9uLg0KDQpUaGFua3MsDQpZdWFudGlhbg0KDQo+IENoZWVycywNCj4g
TWFyay4NCg0K

^ permalink raw reply

* [PATCH 3/3] powerpc/pseries: Implement arch_get_random_long() based on H_RANDOM
From: Michael Ellerman @ 2013-10-11  3:07 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <1381460879-2814-1-git-send-email-michael@ellerman.id.au>

Add support for the arch_get_random_long() hook based on the H_RANDOM
hypervisor call. We trust the hypervisor to provide us with random data,
ie. we don't whiten it in anyway.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
---
 arch/powerpc/platforms/pseries/Makefile |  2 +-
 arch/powerpc/platforms/pseries/rng.c    | 44 +++++++++++++++++++++++++++++++++
 2 files changed, 45 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/platforms/pseries/rng.c

diff --git a/arch/powerpc/platforms/pseries/Makefile b/arch/powerpc/platforms/pseries/Makefile
index 6c61ec5..fbccac9 100644
--- a/arch/powerpc/platforms/pseries/Makefile
+++ b/arch/powerpc/platforms/pseries/Makefile
@@ -3,7 +3,7 @@ ccflags-$(CONFIG_PPC_PSERIES_DEBUG)	+= -DDEBUG
 
 obj-y			:= lpar.o hvCall.o nvram.o reconfig.o \
 			   setup.o iommu.o event_sources.o ras.o \
-			   firmware.o power.o dlpar.o mobility.o
+			   firmware.o power.o dlpar.o mobility.o rng.o
 obj-$(CONFIG_SMP)	+= smp.o
 obj-$(CONFIG_SCANLOG)	+= scanlog.o
 obj-$(CONFIG_EEH)	+= eeh_pseries.o
diff --git a/arch/powerpc/platforms/pseries/rng.c b/arch/powerpc/platforms/pseries/rng.c
new file mode 100644
index 0000000..a702f1c
--- /dev/null
+++ b/arch/powerpc/platforms/pseries/rng.c
@@ -0,0 +1,44 @@
+/*
+ * Copyright 2013, Michael Ellerman, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt)	"pseries-rng: " fmt
+
+#include <linux/kernel.h>
+#include <linux/of.h>
+#include <asm/archrandom.h>
+#include <asm/machdep.h>
+
+
+static int pseries_get_random_long(unsigned long *v)
+{
+	unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
+
+	if (plpar_hcall(H_RANDOM, retbuf) == H_SUCCESS) {
+		*v = retbuf[0];
+		return 1;
+	}
+
+	return 0;
+}
+
+static __init int rng_init(void)
+{
+	struct device_node *dn;
+
+	dn = of_find_compatible_node(NULL, NULL, "ibm,random");
+	if (!dn)
+		return -ENODEV;
+
+	pr_info("Registering arch random hook.\n");
+
+	ppc_md.get_random_long = pseries_get_random_long;
+
+	return 0;
+}
+subsys_initcall(rng_init);
-- 
1.8.1.2

^ permalink raw reply related

* [PATCH 2/3] hwrng: Add a driver for the hwrng found in power7+ systems
From: Michael Ellerman @ 2013-10-11  3:07 UTC (permalink / raw)
  To: linuxppc-dev
In-Reply-To: <1381460879-2814-1-git-send-email-michael@ellerman.id.au>

Add a driver for the hwrng found in power7+ systems, based on the
existing code for the arch_get_random_long() hook.

We only register a single instance of the driver, not one per device,
because we use the existing per_cpu array of devices in the arch code.
This means we always read from the "closest" device, avoiding inter-chip
memory traffic.

Signed-off-by: Guo Chao <yan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
---
 drivers/char/hw_random/Kconfig       | 13 ++++++
 drivers/char/hw_random/Makefile      |  1 +
 drivers/char/hw_random/powernv-rng.c | 81 ++++++++++++++++++++++++++++++++++++
 3 files changed, 95 insertions(+)
 create mode 100644 drivers/char/hw_random/powernv-rng.c

diff --git a/drivers/char/hw_random/Kconfig b/drivers/char/hw_random/Kconfig
index 0aa9d91..c206de2 100644
--- a/drivers/char/hw_random/Kconfig
+++ b/drivers/char/hw_random/Kconfig
@@ -290,6 +290,19 @@ config HW_RANDOM_PSERIES
 
 	  If unsure, say Y.
 
+config HW_RANDOM_POWERNV
+	tristate "PowerNV Random Number Generator support"
+	depends on HW_RANDOM && PPC_POWERNV
+	default HW_RANDOM
+	---help---
+	  This is the driver for Random Number Generator hardware found
+	  in POWER7+ and above machines for PowerNV platform.
+
+	  To compile this driver as a module, choose M here: the
+	  module will be called powernv-rng.
+
+	  If unsure, say Y.
+
 config HW_RANDOM_EXYNOS
 	tristate "EXYNOS HW random number generator support"
 	depends on HW_RANDOM && HAS_IOMEM && HAVE_CLK
diff --git a/drivers/char/hw_random/Makefile b/drivers/char/hw_random/Makefile
index bed467c..d7d2435 100644
--- a/drivers/char/hw_random/Makefile
+++ b/drivers/char/hw_random/Makefile
@@ -24,6 +24,7 @@ obj-$(CONFIG_HW_RANDOM_NOMADIK) += nomadik-rng.o
 obj-$(CONFIG_HW_RANDOM_PICOXCELL) += picoxcell-rng.o
 obj-$(CONFIG_HW_RANDOM_PPC4XX) += ppc4xx-rng.o
 obj-$(CONFIG_HW_RANDOM_PSERIES) += pseries-rng.o
+obj-$(CONFIG_HW_RANDOM_POWERNV) += powernv-rng.o
 obj-$(CONFIG_HW_RANDOM_EXYNOS)	+= exynos-rng.o
 obj-$(CONFIG_HW_RANDOM_TPM) += tpm-rng.o
 obj-$(CONFIG_HW_RANDOM_BCM2835) += bcm2835-rng.o
diff --git a/drivers/char/hw_random/powernv-rng.c b/drivers/char/hw_random/powernv-rng.c
new file mode 100644
index 0000000..3f4f632
--- /dev/null
+++ b/drivers/char/hw_random/powernv-rng.c
@@ -0,0 +1,81 @@
+/*
+ * Copyright 2013 Michael Ellerman, Guo Chao, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt)	KBUILD_MODNAME ": " fmt
+
+#include <linux/module.h>
+#include <linux/kernel.h>
+#include <linux/platform_device.h>
+#include <linux/random.h>
+#include <linux/hw_random.h>
+
+static int powernv_rng_read(struct hwrng *rng, void *data, size_t max, bool wait)
+{
+	unsigned long *buf;
+	int i, len;
+
+	/* We rely on rng_buffer_size() being >= sizeof(unsigned long) */
+	len = max / sizeof(unsigned long);
+
+	buf = (unsigned long *)data;
+
+	for (i = 0; i < len; i++)
+		powernv_get_random_long(buf++);
+
+	return len * sizeof(unsigned long);
+}
+
+static struct hwrng powernv_hwrng = {
+	.name = "powernv-rng",
+	.read = powernv_rng_read,
+};
+
+static int powernv_rng_remove(struct platform_device *pdev)
+{
+	hwrng_unregister(&powernv_hwrng);
+
+	return 0;
+}
+
+static int powernv_rng_probe(struct platform_device *pdev)
+{
+	int rc;
+
+	rc = hwrng_register(&powernv_hwrng);
+	if (rc) {
+		/* We only register one device, ignore any others */
+		if (rc == -EEXIST)
+			rc = -ENODEV;
+
+		return rc;
+	}
+
+	pr_info("Registered powernv hwrng.\n");
+
+	return 0;
+}
+
+static struct of_device_id powernv_rng_match[] = {
+	{ .compatible	= "ibm,power-rng",},
+	{},
+};
+MODULE_DEVICE_TABLE(of, powernv_rng_match);
+
+static struct platform_driver powernv_rng_driver = {
+	.driver = {
+		.name = "powernv_rng",
+		.of_match_table = powernv_rng_match,
+	},
+	.probe	= powernv_rng_probe,
+	.remove = powernv_rng_remove,
+};
+module_platform_driver(powernv_rng_driver);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Bare metal HWRNG driver for POWER7+ and above");
-- 
1.8.1.2

^ permalink raw reply related

* [PATCH 1/3] powerpc: Implement arch_get_random_long/int() for powernv
From: Michael Ellerman @ 2013-10-11  3:07 UTC (permalink / raw)
  To: linuxppc-dev

Add the plumbing to implement arch_get_random_long/int(). It didn't seem
worth adding an extra ppc_md hook for int, so we reuse the one for long.

Add an implementation for powernv based on the hwrng found in power7+
systems. We whiten the output of the hwrng, and the result passes all
the dieharder tests.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
---
 arch/powerpc/Kconfig                    |   3 +
 arch/powerpc/include/asm/archrandom.h   |  32 +++++++++
 arch/powerpc/include/asm/machdep.h      |   4 ++
 arch/powerpc/platforms/powernv/Kconfig  |   1 +
 arch/powerpc/platforms/powernv/Makefile |   2 +-
 arch/powerpc/platforms/powernv/rng.c    | 122 ++++++++++++++++++++++++++++++++
 6 files changed, 163 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/archrandom.h
 create mode 100644 arch/powerpc/platforms/powernv/rng.c

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 38f3b7e..45f16f7 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -1009,6 +1009,9 @@ config PHYSICAL_START
 	default "0x00000000"
 endif
 
+config	ARCH_RANDOM
+	def_bool n
+
 source "net/Kconfig"
 
 source "drivers/Kconfig"
diff --git a/arch/powerpc/include/asm/archrandom.h b/arch/powerpc/include/asm/archrandom.h
new file mode 100644
index 0000000..d853d16
--- /dev/null
+++ b/arch/powerpc/include/asm/archrandom.h
@@ -0,0 +1,32 @@
+#ifndef _ASM_POWERPC_ARCHRANDOM_H
+#define _ASM_POWERPC_ARCHRANDOM_H
+
+#ifdef CONFIG_ARCH_RANDOM
+
+#include <asm/machdep.h>
+
+static inline int arch_get_random_long(unsigned long *v)
+{
+	if (ppc_md.get_random_long)
+		return ppc_md.get_random_long(v);
+
+	return 0;
+}
+
+static inline int arch_get_random_int(unsigned int *v)
+{
+	unsigned long val;
+	int rc;
+
+	rc = arch_get_random_long(&val);
+	if (rc)
+		*v = val;
+
+	return rc;
+}
+
+int powernv_get_random_long(unsigned long *v);
+
+#endif /* CONFIG_ARCH_RANDOM */
+
+#endif /* _ASM_POWERPC_ARCHRANDOM_H */
diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h
index 8b48090..ce6cc2a 100644
--- a/arch/powerpc/include/asm/machdep.h
+++ b/arch/powerpc/include/asm/machdep.h
@@ -263,6 +263,10 @@ struct machdep_calls {
 	ssize_t (*cpu_probe)(const char *, size_t);
 	ssize_t (*cpu_release)(const char *, size_t);
 #endif
+
+#ifdef CONFIG_ARCH_RANDOM
+	int (*get_random_long)(unsigned long *v);
+#endif
 };
 
 extern void e500_idle(void);
diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
index 6fae5eb..2108464 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -9,6 +9,7 @@ config PPC_POWERNV
 	select EPAPR_BOOT
 	select PPC_INDIRECT_PIO
 	select PPC_UDBG_16550
+	select ARCH_RANDOM
 	default y
 
 config POWERNV_MSI
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index 300c437..6760a86 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -1,5 +1,5 @@
 obj-y			+= setup.o opal-takeover.o opal-wrappers.o opal.o
-obj-y			+= opal-rtc.o opal-nvram.o opal-lpc.o
+obj-y			+= opal-rtc.o opal-nvram.o opal-lpc.o rng.o
 
 obj-$(CONFIG_SMP)	+= smp.o
 obj-$(CONFIG_PCI)	+= pci.o pci-p5ioc2.o pci-ioda.o
diff --git a/arch/powerpc/platforms/powernv/rng.c b/arch/powerpc/platforms/powernv/rng.c
new file mode 100644
index 0000000..02db7d7
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/rng.c
@@ -0,0 +1,122 @@
+/*
+ * Copyright 2013, Michael Ellerman, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt)	"powernv-rng: " fmt
+
+#include <linux/kernel.h>
+#include <linux/of.h>
+#include <linux/of_platform.h>
+#include <linux/slab.h>
+#include <asm/archrandom.h>
+#include <asm/io.h>
+#include <asm/machdep.h>
+
+
+struct powernv_rng {
+	void __iomem *regs;
+	unsigned long mask;
+};
+
+static DEFINE_PER_CPU(struct powernv_rng *, powernv_rng);
+
+
+static unsigned long rng_whiten(struct powernv_rng *rng, unsigned long val)
+{
+	unsigned long parity;
+
+	/* Calculate the parity of the value */
+	asm ("popcntd %0,%1" : "=r" (parity) : "r" (val));
+
+	/* xor our value with the previous mask */
+	val ^= rng->mask;
+
+	/* update the mask based on the parity of this value */
+	rng->mask = (rng->mask << 1) | (parity & 1);
+
+	return val;
+}
+
+int powernv_get_random_long(unsigned long *v)
+{
+	struct powernv_rng *rng;
+
+	rng = get_cpu_var(powernv_rng);
+
+	*v = rng_whiten(rng, in_be64(rng->regs));
+
+	put_cpu_var(rng);
+
+	return 1;
+}
+EXPORT_SYMBOL_GPL(powernv_get_random_long);
+
+static __init void rng_init_per_cpu(struct powernv_rng *rng,
+				    struct device_node *dn)
+{
+	int chip_id, cpu;
+
+	chip_id = of_get_ibm_chip_id(dn);
+	if (chip_id == -1)
+		pr_warn("No ibm,chip-id found for %s.\n", dn->full_name);
+
+	for_each_possible_cpu(cpu) {
+		if (per_cpu(powernv_rng, cpu) == NULL ||
+		    cpu_to_chip_id(cpu) == chip_id) {
+			per_cpu(powernv_rng, cpu) = rng;
+		}
+	}
+}
+
+static __init int rng_create(struct device_node *dn)
+{
+	struct powernv_rng *rng;
+	unsigned long val;
+
+	rng = kzalloc(sizeof(*rng), GFP_KERNEL);
+	if (!rng)
+		return -ENOMEM;
+
+	rng->regs = of_iomap(dn, 0);
+	if (!rng->regs) {
+		kfree(rng);
+		return -ENXIO;
+	}
+
+	val = in_be64(rng->regs);
+	rng->mask = val;
+
+	rng_init_per_cpu(rng, dn);
+
+	pr_info_once("Registering arch random hook.\n");
+
+	ppc_md.get_random_long = powernv_get_random_long;
+
+	return 0;
+}
+
+static __init int rng_init(void)
+{
+	struct device_node *dn;
+	int rc;
+
+	for_each_compatible_node(dn, NULL, "ibm,power-rng") {
+		rc = rng_create(dn);
+		if (rc) {
+			pr_err("Failed creating rng for %s (%d).\n",
+				dn->full_name, rc);
+			continue;
+		}
+
+		/* Create devices for hwrng driver */
+		of_platform_device_create(dn, NULL, NULL);
+	}
+
+	return 0;
+}
+subsys_initcall(rng_init);
-- 
1.8.1.2

^ permalink raw reply related

* Re: [PATCH] powerpc, perf: Configure BHRB filter before enabling PMU interrupts
From: Michael Ellerman @ 2013-10-11  2:11 UTC (permalink / raw)
  To: Anshuman Khandual; +Cc: linuxppc-dev, mikey
In-Reply-To: <52566A4E.2000409@linux.vnet.ibm.com>

On Thu, Oct 10, 2013 at 02:20:22PM +0530, Anshuman Khandual wrote:
> On 10/09/2013 11:33 AM, Michael Ellerman wrote:
> > On Wed, Oct 09, 2013 at 10:16:32AM +0530, Anshuman Khandual wrote:
> >> On 10/09/2013 06:51 AM, Michael Ellerman wrote:
> >>> On Tue, Oct 08, 2013 at 12:51:18PM +0530, Anshuman Khandual wrote:
> >>>> On 10/08/2013 09:51 AM, Michael Ellerman wrote:
> >>>>> On Mon, Oct 07, 2013 at 10:00:26AM +0530, Anshuman Khandual wrote:
> >>>>>> Right now the `config_bhrb` PMU specific call happens after write_mmcr0
> >>>>>> which actually enables the PMU for event counting and interrupt. So
> >>>>>> there is a small window of time where the PMU and BHRB runs without the
> >>>>>> required HW branch filter (if any) enabled in BHRB. This can cause some
> >>>>>> of the branch samples to be collected through BHRB without any filter
> >>>>>> being applied and hence affecting the correctness of the results. This
> >>>>>> patch moves the BHRB config function call before enabling the interrupts.
> >>>>>
> >>>>> Patch looks good.
> >>>>>
> >>>>> But it reminds me I have an item in my TODO list:
> >>>>>  - "Why can't config_bhrb() be done in compute_mmcr()" ?
> >>>>>
> >>>>
> >>>> compute_mmcr() function deals with generic MMCR* configs for normal PMU
> >>>> events. Even if BHRB config touches MMCRA register, it's configuration
> >>>> does not interfere with the PMU config for general events. So its best
> >>>> to keep them separate. 
> >>>
> >>> I'm unconvinced. If they'd been together to begin with this bug never
> >>> would have happened.
> >>
> >> This is an ordering of configuration problem. Putting them together in the
> >> same function does not rule out the chances of this ordering problem. Could
> >> you please kindly explain how this could have been avoided ?
> > 
> > The existing code already makes sure to write MMCRA before MMCR0.
> 
> Thats not true. One example being here at power_pmu_enable function.
>
>         write_mmcr0(cpuhw, mmcr0);
> 
>         /*
>          * Enable instruction sampling if necessary
>          */
>         if (cpuhw->mmcr[2] & MMCRA_SAMPLE_ENABLE) {
>                 mb();
>                 mtspr(SPRN_MMCRA, cpuhw->mmcr[2]);
>         }

The only example.

The BHRB config would have been applied prior to that:

	mtspr(SPRN_MMCRA, cpuhw->mmcr[2] & ~MMCRA_SAMPLE_ENABLE);
	mtspr(SPRN_MMCR1, cpuhw->mmcr[1]);
	mtspr(SPRN_MMCR0, (cpuhw->mmcr[0] & ~(MMCR0_PMC1CE | MMCR0_PMCjCE))
				| MMCR0_FC);


So as I said, if the BHRB config was in cpuhw->mmcr[2] then the ordering would
have been correct.

> Even I think this is not right. Instruction sampling should have been
> enabled before we enable PMU interrupts. Else there is a small window
> of time where we could have the PMU enabled with events (which requires
> sampling) without the sampling itself being enabled in MMCRA.

Yes I agree. That's a separate bug, which we'll need to test on all the book3s
platforms we have perf support for.

cheers

^ permalink raw reply

* Re: [PATCH v2 1/3] powerpc/booke64: add sync after writing PTE
From: Benjamin Herrenschmidt @ 2013-10-10 23:51 UTC (permalink / raw)
  To: Scott Wood; +Cc: linuxppc-dev
In-Reply-To: <1381447532.7979.488.camel@snotra.buserror.net>

On Thu, 2013-10-10 at 18:25 -0500, Scott Wood wrote:

> Looking at some of the code in mm/, I suspect that the normal callers of
> set_pte_at() already have an unlock (and thus a sync) 

Unlock is lwsync actually...

> already, so we may
> not even be relying on those retries.  Certainly some of them do; it
> would take some effort to verify all of them.
> 
> Also, without such a sync in map_kernel_page(), even with software
> tablewalk, couldn't we theoretically have a situation where a store to
> pointer X that exposes a new mapping gets reordered before the PTE store
> as seen by another CPU?  The other CPU could see non-NULL X and
> dereference it, but get the stale PTE.  Callers of ioremap() generally
> don't do a barrier of their own prior to exposing the result.

Hrm, we transition to the new PTE either restricts the access permission
in which case it flushes the TLB (and synchronizes with other CPUs) or
extends access (adds dirty, set pte from 0 -> populated, ...) in which
case the worst case is we see the old one and take a spurrious fault.

So the problem would only be with kernel mappings and in that case I
think we are fine. A driver doing an ioremap shouldn't then start using
that mapping on another CPU before having *informed* that other CPU of
the existence of the mapping and that should be ordered.

Ben.

^ permalink raw reply

* Re: [PATCH v2 1/3] powerpc/booke64: add sync after writing PTE
From: Scott Wood @ 2013-10-10 23:25 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1381444273.7979.473.camel@snotra.buserror.net>

On Thu, 2013-10-10 at 17:31 -0500, Scott Wood wrote:
> On Mon, 2013-09-16 at 19:06 -0500, Scott Wood wrote:
> > On Mon, 2013-09-16 at 07:38 +1000, Benjamin Herrenschmidt wrote:
> > > On Fri, 2013-09-13 at 22:50 -0500, Scott Wood wrote:
> > > > The ISA says that a sync is needed to order a PTE write with a
> > > > subsequent hardware tablewalk lookup.  On e6500, without this sync
> > > > we've been observed to die with a DSI due to a PTE write not being seen
> > > > by a subsequent access, even when everything happens on the same
> > > > CPU.
> > > 
> > > This is gross, I didn't realize we had that bogosity in the
> > > architecture...
> > > 
> > > Did you measure the performance impact ?
> > 
> > I didn't see a noticeable impact on the tests I ran, but those were
> > aimed at measuring TLB miss overhead.  I'll need to try it with a
> > benchmark that's more oriented around lots of page table updates.
> 
> Lmbench's fork test runs about 2% slower with the sync.  I've been told
> that nothing relevant has changed since we saw the failure during
> emulation; it's probably luck and/or timing, or maybe a sync got added
> somewhere else since then?  I think it's only really a problem for
> kernel page tables, since user page tables will retry if do_page_fault()
> sees a valid PTE.  So maybe we should put an mb() in map_kernel_page()
> instead.

Looking at some of the code in mm/, I suspect that the normal callers of
set_pte_at() already have an unlock (and thus a sync) already, so we may
not even be relying on those retries.  Certainly some of them do; it
would take some effort to verify all of them.

Also, without such a sync in map_kernel_page(), even with software
tablewalk, couldn't we theoretically have a situation where a store to
pointer X that exposes a new mapping gets reordered before the PTE store
as seen by another CPU?  The other CPU could see non-NULL X and
dereference it, but get the stale PTE.  Callers of ioremap() generally
don't do a barrier of their own prior to exposing the result.

-Scott

^ permalink raw reply

* Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern
From: Mark Lord @ 2013-10-10 23:17 UTC (permalink / raw)
  To: Alexander Gordeev, H. Peter Anvin
  Cc: linux-mips, VMware, Inc., linux-nvme, linux-ide, linux-s390,
	Andy King, linux-scsi, linux-rdma, x86, Ingo Molnar, linux-pci,
	iss_storagedev, linux-driver, Tejun Heo, Bjorn Helgaas,
	Dan Williams, Jon Mason, Solarflare linux maintainers, netdev,
	linux-kernel, Ralf Baechle, e1000-devel, Martin Schwidefsky,
	linux390, linuxppc-dev
In-Reply-To: <20131010180704.GA15719@dhcp-26-207.brq.redhat.com>

Just to help us all understand "the loop" issue..

Here's an example of driver code which uses the existing MSI-X interfaces,
for a device which can work with either 16, 8, 4, 2, or 1 MSI-X interrupt.
This is from a new driver I'm working on right now:


static int xx_alloc_msix_irqs (struct xx_dev *dev, int nvec)
{
        xx_disable_all_irqs(dev);
        do {
                if (nvec < 2)
                        xx_prep_for_1_msix_vector(dev);
                else if (nvec < 4)
                        xx_prep_for_2_msix_vectors(dev);
                else if (nvec < 8)
                        xx_prep_for_4_msix_vectors(dev);
                else if (nvec < 16)
                        xx_prep_for_8_msix_vectors(dev);
                else
                        xx_prep_for_16_msix_vectors(dev);
                nvec = pci_enable_msix(dev->pdev, dev->irqs, dev->num_vectors);
        } while (nvec > 0);

        if (nvec) {
                kerr(dev->name, "pci_enable_msix() failed, err=%d", nvec);
                dev->num_vectors = 0;
                return nvec;
        }
        return 0;       /* success */
}

^ permalink raw reply

* Re: [PATCH v2 1/3] powerpc/booke64: add sync after writing PTE
From: Scott Wood @ 2013-10-10 22:31 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev
In-Reply-To: <1379376371.2536.218.camel@snotra.buserror.net>

On Mon, 2013-09-16 at 19:06 -0500, Scott Wood wrote:
> On Mon, 2013-09-16 at 07:38 +1000, Benjamin Herrenschmidt wrote:
> > On Fri, 2013-09-13 at 22:50 -0500, Scott Wood wrote:
> > > The ISA says that a sync is needed to order a PTE write with a
> > > subsequent hardware tablewalk lookup.  On e6500, without this sync
> > > we've been observed to die with a DSI due to a PTE write not being seen
> > > by a subsequent access, even when everything happens on the same
> > > CPU.
> > 
> > This is gross, I didn't realize we had that bogosity in the
> > architecture...
> > 
> > Did you measure the performance impact ?
> 
> I didn't see a noticeable impact on the tests I ran, but those were
> aimed at measuring TLB miss overhead.  I'll need to try it with a
> benchmark that's more oriented around lots of page table updates.

Lmbench's fork test runs about 2% slower with the sync.  I've been told
that nothing relevant has changed since we saw the failure during
emulation; it's probably luck and/or timing, or maybe a sync got added
somewhere else since then?  I think it's only really a problem for
kernel page tables, since user page tables will retry if do_page_fault()
sees a valid PTE.  So maybe we should put an mb() in map_kernel_page()
instead.

-Scott

^ permalink raw reply

* Re: Gianfar driver crashes in Kernel v3.10
From: Scott Wood @ 2013-10-10 21:41 UTC (permalink / raw)
  To: Claudiu Manoil; +Cc: Thomas Hühn, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <52568A7A.6090909@freescale.com>

On Thu, 2013-10-10 at 14:07 +0300, Claudiu Manoil wrote:
> On 10/4/2013 3:28 PM, Thomas H=C3=BChn wrote:
> >
> > [code]
> > [ 2671.841927] Oops: Exception in kernel mode, sig: 5 [#1]
> > [ 2671.847141] Freescale P1014
> > [ 2671.849925] Modules linked in: ath9k pppoe ppp_async iptable_nat
> > ath9k_common pppox p
> > e xt_tcpudp xt_tcpmss xt_string xt_statistic xt_state xt_recent xt_qu=
ota
> > xt_pkttype xt_o
> > mark xt_connbytes xt_comment xt_addrtype xt_TCPMSS xt_REDIRECT xt_NET=
MAP
> > xt_LOG xt_IPMAR
> > ms_datafab ums_cypress ums_alauda slhc nf_nat_tftp nf_nat_snmp_basic
> > nf_nat_sip nf_nat_r
> > ntrack_sip nf_conntrack_rtsp nf_conntrack_proto_gre nf_conntrack_irc
> > nf_conntrack_h323 n
> > compat_xtables compat ath sch_teql sch_tbf sch_sfq sch_red sch_prio
> > sch_htb sch_gred sc
> > skbedit act_mirred em_u32 cls_u32 cls_tcindex cls_flow cls_route cls_=
fw
> > sch_hfsc sch_ing
> > r usb_storage leds_gpio ohci_hcd ehci_platform ehci_hcd sd_mod scsi_m=
od
> > fsl_mph_dr_of gp
> > [ 2671.988946] CPU: 0 PID: 5209 Comm: iftop Not tainted 3.10.13 #2
> > [ 2671.994859] task: c4b22220 ti: c7ff8000 task.ti: c477e000
> > [ 2672.000250] NIP: c018c7a0 LR: c018c794 CTR: c000b070
> > [ 2672.005206] REGS: c7ff9f10 TRAP: 3202   Not tainted  (3.10.13)
> > [ 2672.011028] MSR: 00029000 <CE,EE,ME>  CR: 48000024  XER: 20000000
> > [ 2672.017125]
> > GPR00: 000000ff c477fde0 c4b22220 00000000 00000000 000000ff 00000000
> > 70000000
> > GPR08: ffffffff 00000008 00000000 ffffffff 00000046 10022248 00000000
> > 00000008
> > GPR16: c781b3c0 c781b3c0 000000ff 00000000 00000001 0000021c 00000086
> > fffff800
> > GPR24: c7980300 00000000 00000001 00000040 00000003 c4b33000 00000000
> > 00000001
> > [ 2672.046832] NIP [c018c7a0] gfar_poll+0x424/0x520
> > [ 2672.051442] LR [c018c794] gfar_poll+0x418/0x520
> > [ 2672.055962] Call Trace:
> > [ 2672.058402] [c477fde0] [c018c674] gfar_poll+0x2f8/0x520 (unreliabl=
e)
> > [ 2672.064762] [c477fe80] [c01b0ce8] net_rx_action+0x6c/0x158
> > [ 2672.070249] [c477feb0] [c0027dc4] __do_softirq+0xbc/0x16c
> > [ 2672.075642] [c477ff00] [c0027f7c] irq_exit+0x4c/0x68
> > [ 2672.080604] [c477ff10] [c00041f8] do_IRQ+0xf4/0x10c
> > [ 2672.085478] [c477ff40] [c000ca3c] ret_from_except+0x0/0x18
> > [ 2672.090991] --- Exception: 501 at 0x48083c28
> > [ 2672.090991]     LR =3D 0x48083bf8
> > [ 2672.098378] Instruction dump:
> > [ 2672.101338] 7f8f2040 419cfcc4 80900000 38a00000 8061004c 7e118378
> > 81c10050 7ffafb78
> > [ 2672.109092] 4bf9eaa1 83810034 7c7e1b78 8361003c <83210038> 83a1004=
c
> > 48000060 41a2004c
> > [ 2672.117021] ---[ end trace 565fb54528d305fa ]---
> > [ 2672.121628]
> > [ 2673.103130] Kernel panic - not syncing: Fatal exception in interru=
pt
> > [ 2673.109474] Rebooting in 3 seconds..
> >
> > U-Boot 2010.12-svn15934 (Dec 11 2012 - 16:23:49)
> > [/code]
> >
>=20
> Hi,
>=20
> Does this show up on a half duplex (100Mb/s) link?
> Could you provide following for the gianfar interface, on your setup:
> # ethtool ethX
> and
> # ethtool -d ethX | grep 500
>=20
> Is there any other indication before this Oops? Like a tx timeout WARN?

It's a watchdog interrupt (CPU watchdog, not netdev).  I think it's only
showing up in the gianfar code because that's what's running (unless the
gianfar code is causing the watchdog daemon to not run).

-Scott

^ permalink raw reply

* RE: [PATCH 1/7] powerpc: Add interface to get msi region information
From: Sethi Varun-B16395 @ 2013-10-10 21:13 UTC (permalink / raw)
  To: Bhushan Bharat-R65777, joro@8bytes.org, Bjorn Helgaas
  Cc: agraf@suse.de, Wood Scott-B07421, linux-pci@vger.kernel.org,
	iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org,
	alex.williamson@redhat.com, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <6A3DF150A5B70D4F9B66A25E3F7C888D07196054@039-SN2MPN1-011.039d.mgd.msft.net>



> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org [mailto:linux-kernel-
> owner@vger.kernel.org] On Behalf Of Bhushan Bharat-R65777
> Sent: Tuesday, October 08, 2013 10:40 PM
> To: joro@8bytes.org; Bjorn Helgaas
> Cc: alex.williamson@redhat.com; benh@kernel.crashing.org;
> galak@kernel.crashing.org; linux-kernel@vger.kernel.org; linuxppc-
> dev@lists.ozlabs.org; linux-pci@vger.kernel.org; agraf@suse.de; Wood
> Scott-B07421; iommu@lists.linux-foundation.org
> Subject: RE: [PATCH 1/7] powerpc: Add interface to get msi region
> information
>=20
>=20
>=20
> > -----Original Message-----
> > From: joro@8bytes.org [mailto:joro@8bytes.org]
> > Sent: Tuesday, October 08, 2013 10:32 PM
> > To: Bjorn Helgaas
> > Cc: Bhushan Bharat-R65777; alex.williamson@redhat.com;
> > benh@kernel.crashing.org; galak@kernel.crashing.org;
> > linux-kernel@vger.kernel.org; linuxppc- dev@lists.ozlabs.org;
> > linux-pci@vger.kernel.org; agraf@suse.de; Wood Scott- B07421;
> > iommu@lists.linux-foundation.org
> > Subject: Re: [PATCH 1/7] powerpc: Add interface to get msi region
> > information
> >
> > On Tue, Oct 08, 2013 at 10:47:49AM -0600, Bjorn Helgaas wrote:
> > > I still have no idea what an "aperture type IOMMU" is, other than
> > > that it is "different."
> >
> > An aperture based IOMMU is basically any GART-like IOMMU which can
> > only remap a small window (the aperture) of the DMA address space. DMA
> > outside of that window is either blocked completly or passed through
> untranslated.
>=20
> It is completely blocked for Freescale PAMU.
> So for this type of iommu what we have to do is to create a MSI mapping
> just after guest physical address, Example: guest have a 512M of memory
> then we create window of 1G (because of power of 2 requirement), then we
> have to FIT MSI just after 512M of guest.

[Sethi Varun-B16395] PAMU (FSL IOMMU) has a concept of primary window and s=
ubwindows. Primary window corresponds to the complete guest iova address sp=
ace (including MSI space), with respect to IOMMU_API this is termed as geom=
etry . IOVA Base of subwindow is determined from the number of subwindows (=
configurable using iommu API). Subwindows allow for handling physically dis=
contiguous memory. PAMU translates device iova accesses to actual physical =
address. MSI mapping would be addressed by a subwindow, with iova base star=
ting at the end of the guest iova space.=20

VFIO code creates a PAMU  window (also defines number of subwindow) to map =
the guest iova space + msi space. The interface defined by this patch queri=
es the PAMU driver to get the iova mapping for the msi region assigned to t=
he PCIe device (assigned to the guest).

-Varun

^ permalink raw reply

* Re: [PATCH 2/7] iommu: add api to get iommu_domain of a device
From: Alex Williamson @ 2013-10-10 20:41 UTC (permalink / raw)
  To: Sethi Varun-B16395
  Cc: Wood Scott-B07421, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, agraf@suse.de,
	iommu@lists.linux-foundation.org, Bhushan Bharat-R65777,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <C5ECD7A89D1DC44195F34B25E172658D0A52BAC7@039-SN2MPN1-013.039d.mgd.msft.net>

On Thu, 2013-10-10 at 20:09 +0000, Sethi Varun-B16395 wrote:
> 
> > -----Original Message-----
> > From: iommu-bounces@lists.linux-foundation.org [mailto:iommu-
> > bounces@lists.linux-foundation.org] On Behalf Of Alex Williamson
> > Sent: Tuesday, October 08, 2013 8:43 AM
> > To: Bhushan Bharat-R65777
> > Cc: agraf@suse.de; Wood Scott-B07421; linux-pci@vger.kernel.org;
> > galak@kernel.crashing.org; linux-kernel@vger.kernel.org;
> > iommu@lists.linux-foundation.org; benh@kernel.crashing.org; linuxppc-
> > dev@lists.ozlabs.org
> > Subject: Re: [PATCH 2/7] iommu: add api to get iommu_domain of a device
> > 
> > On Mon, 2013-10-07 at 05:46 +0000, Bhushan Bharat-R65777 wrote:
> > >
> > > > -----Original Message-----
> > > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > > > Sent: Friday, October 04, 2013 11:42 PM
> > > > To: Bhushan Bharat-R65777
> > > > Cc: joro@8bytes.org; benh@kernel.crashing.org;
> > > > galak@kernel.crashing.org; linux- kernel@vger.kernel.org;
> > > > linuxppc-dev@lists.ozlabs.org; linux- pci@vger.kernel.org;
> > > > agraf@suse.de; Wood Scott-B07421; iommu@lists.linux- foundation.org
> > > > Subject: Re: [PATCH 2/7] iommu: add api to get iommu_domain of a
> > > > device
> > > >
> > > > On Fri, 2013-10-04 at 17:23 +0000, Bhushan Bharat-R65777 wrote:
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > > > > > Sent: Friday, October 04, 2013 10:43 PM
> > > > > > To: Bhushan Bharat-R65777
> > > > > > Cc: joro@8bytes.org; benh@kernel.crashing.org;
> > > > > > galak@kernel.crashing.org; linux- kernel@vger.kernel.org;
> > > > > > linuxppc-dev@lists.ozlabs.org; linux- pci@vger.kernel.org;
> > > > > > agraf@suse.de; Wood Scott-B07421; iommu@lists.linux-
> > > > > > foundation.org
> > > > > > Subject: Re: [PATCH 2/7] iommu: add api to get iommu_domain of a
> > > > > > device
> > > > > >
> > > > > > On Fri, 2013-10-04 at 16:47 +0000, Bhushan Bharat-R65777 wrote:
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > > > > > > > Sent: Friday, October 04, 2013 9:15 PM
> > > > > > > > To: Bhushan Bharat-R65777
> > > > > > > > Cc: joro@8bytes.org; benh@kernel.crashing.org;
> > > > > > > > galak@kernel.crashing.org; linux- kernel@vger.kernel.org;
> > > > > > > > linuxppc-dev@lists.ozlabs.org; linux- pci@vger.kernel.org;
> > > > > > > > agraf@suse.de; Wood Scott-B07421; iommu@lists.linux-
> > > > > > > > foundation.org
> > > > > > > > Subject: Re: [PATCH 2/7] iommu: add api to get iommu_domain
> > > > > > > > of a device
> > > > > > > >
> > > > > > > > On Fri, 2013-10-04 at 09:54 +0000, Bhushan Bharat-R65777
> > wrote:
> > > > > > > > >
> > > > > > > > > > -----Original Message-----
> > > > > > > > > > From: linux-pci-owner@vger.kernel.org
> > > > > > > > > > [mailto:linux-pci-owner@vger.kernel.org]
> > > > > > > > > > On Behalf Of Alex Williamson
> > > > > > > > > > Sent: Wednesday, September 25, 2013 10:16 PM
> > > > > > > > > > To: Bhushan Bharat-R65777
> > > > > > > > > > Cc: joro@8bytes.org; benh@kernel.crashing.org;
> > > > > > > > > > galak@kernel.crashing.org; linux-
> > > > > > > > > > kernel@vger.kernel.org; linuxppc-dev@lists.ozlabs.org;
> > > > > > > > > > linux- pci@vger.kernel.org; agraf@suse.de; Wood
> > > > > > > > > > Scott-B07421; iommu@lists.linux- foundation.org; Bhushan
> > > > > > > > > > Bharat-R65777
> > > > > > > > > > Subject: Re: [PATCH 2/7] iommu: add api to get
> > > > > > > > > > iommu_domain of a device
> > > > > > > > > >
> > > > > > > > > > On Thu, 2013-09-19 at 12:59 +0530, Bharat Bhushan wrote:
> > > > > > > > > > > This api return the iommu domain to which the device is
> > attached.
> > > > > > > > > > > The iommu_domain is required for making API calls
> > > > > > > > > > > related to
> > > > iommu.
> > > > > > > > > > > Follow up patches which use this API to know iommu
> > maping.
> > > > > > > > > > >
> > > > > > > > > > > Signed-off-by: Bharat Bhushan
> > > > > > > > > > > <bharat.bhushan@freescale.com>
> > > > > > > > > > > ---
> > > > > > > > > > >  drivers/iommu/iommu.c |   10 ++++++++++
> > > > > > > > > > >  include/linux/iommu.h |    7 +++++++
> > > > > > > > > > >  2 files changed, 17 insertions(+), 0 deletions(-)
> > > > > > > > > > >
> > > > > > > > > > > diff --git a/drivers/iommu/iommu.c
> > > > > > > > > > > b/drivers/iommu/iommu.c index
> > > > > > > > > > > fbe9ca7..6ac5f50 100644
> > > > > > > > > > > --- a/drivers/iommu/iommu.c
> > > > > > > > > > > +++ b/drivers/iommu/iommu.c
> > > > > > > > > > > @@ -696,6 +696,16 @@ void iommu_detach_device(struct
> > > > > > > > > > > iommu_domain *domain, struct device *dev)  }
> > > > > > > > > > > EXPORT_SYMBOL_GPL(iommu_detach_device);
> > > > > > > > > > >
> > > > > > > > > > > +struct iommu_domain *iommu_get_dev_domain(struct
> > device *dev) {
> > > > > > > > > > > +	struct iommu_ops *ops = dev->bus->iommu_ops;
> > > > > > > > > > > +
> > > > > > > > > > > +	if (unlikely(ops == NULL ||
> > > > > > > > > > > +ops->get_dev_iommu_domain ==
> > > > NULL))
> > > > > > > > > > > +		return NULL;
> > > > > > > > > > > +
> > > > > > > > > > > +	return ops->get_dev_iommu_domain(dev); }
> > > > > > > > > > > +EXPORT_SYMBOL_GPL(iommu_get_dev_domain);
> > > > > > > > > >
> > > > > > > > > > What prevents this from racing iommu_domain_free()?
> > > > > > > > > > There's no references acquired, so there's no reason for
> > > > > > > > > > the caller to assume the
> > > > > > > > pointer is valid.
> > > > > > > > >
> > > > > > > > > Sorry for late query, somehow this email went into a
> > > > > > > > > folder and escaped;
> > > > > > > > >
> > > > > > > > > Just to be sure, there is not lock at generic "struct
> > > > > > > > > iommu_domain", but IP
> > > > > > > > specific structure (link FSL domain) linked in
> > > > > > > > iommu_domain->priv have a lock, so we need to ensure this
> > > > > > > > race in FSL iommu code (say drivers/iommu/fsl_pamu_domain.c),
> > right?
> > > > > > > >
> > > > > > > > No, it's not sufficient to make sure that your use of the
> > > > > > > > interface is race free.  The interface itself needs to be
> > > > > > > > designed so that it's difficult to use incorrectly.
> > > > > > >
> > > > > > > So we can define
> > > > > > > iommu_get_dev_domain()/iommu_put_dev_domain();
> > > > > > > iommu_get_dev_domain() will return domain with the lock held,
> > > > > > > and
> > > > > > > iommu_put_dev_domain() will release the lock? And
> > > > > > > iommu_get_dev_domain() must always be followed by
> > > > > > > iommu_get_dev_domain().
> > > > > >
> > > > > > What lock?  get/put are generally used for reference counting,
> > > > > > not locking in the kernel.
> > > > > >
> > > > > > > > That's not the case here.  This is a backdoor to get the
> > > > > > > > iommu domain from the iommu driver regardless of who is using
> > it or how.
> > > > > > > > The iommu domain is created and managed by vfio, so
> > > > > > > > shouldn't we be looking at how to do this through vfio?
> > > > > > >
> > > > > > > Let me first describe what we are doing here:
> > > > > > > During initialization:-
> > > > > > >  - vfio talks to MSI system to know the MSI-page and size
> > > > > > >  - vfio then interacts with iommu to map the MSI-page in iommu
> > > > > > > (IOVA is decided by userspace and physical address is the
> > > > > > > MSI-page)
> > > > > > >  - So the IOVA subwindow mapping is created in iommu and yes
> > > > > > > VFIO know about
> > > > > > this mapping.
> > > > > > >
> > > > > > > Now do SET_IRQ(MSI/MSIX) ioctl:
> > > > > > >  - calls pci_enable_msix()/pci_enable_msi_block(): which is
> > > > > > > supposed to set
> > > > > > MSI address/data in device.
> > > > > > >  - So in current implementation (this patchset) msi-subsystem
> > > > > > > gets the IOVA
> > > > > > from iommu via this defined interface.
> > > > > > >  - Are you saying that rather than getting this from iommu, we
> > > > > > > should get this
> > > > > > from vfio? What difference does this make?
> > > > > >
> > > > > > Yes, you just said above that vfio knows the msi to iova
> > > > > > mapping, so why go outside of vfio to find it later?  The
> > > > > > difference is one case you can have a proper reference to data
> > > > > > structures to make sure the pointer you get back actually has
> > > > > > meaning at the time you're using it vs the code here where
> > > > > > you're defining an API that returns a meaningless value
> > > > >
> > > > > With FSL-PAMU we will always get consistant data from iommu or
> > > > > vfio-data
> > > > structure.
> > > >
> > > > Great, but you're trying to add a generic API to the IOMMU subsystem
> > > > that's difficult to use correctly.  The fact that you use it
> > > > correctly does not justify the API.
> > > >
> > > > > > because you can't check or
> > > > > > enforce that an arbitrary caller is using it correctly.
> > > > >
> > > > > I am not sure what is arbitrary caller? pdev is known to vfio, so
> > > > > vfio will only make pci_enable_msix()/pci_enable_msi_block() for
> > this pdev.
> > > > > If anyother code makes then it is some other unexpectedly thing
> > > > > happening in system, no?
> > > >
> > > > What's proposed here is a generic IOMMU API.  Anybody can call this.
> > > > What if the host SCSI driver decides to go get the iommu domain for
> > > > it's device (or any other device)?  Does that fit your usage model?
> > > >
> > > > > >  It's not maintainable.
> > > > > > Thanks,
> > > > >
> > > > > I do not have any issue with this as well, can you also describe
> > > > > the type of API you are envisioning; I can think of defining some
> > > > > function in vfio.c/vfio_iommu*.c, make them global and declare
> > > > > then in include/Linux/vfio.h And include <Linux/vfio.h> in caller
> > > > > file
> > > > > (arch/powerpc/kernel/msi.c)
> > > >
> > > > Do you really want module dependencies between vfio and your core
> > > > kernel MSI setup?  Look at the vfio external user interface that
> > we've already defined.
> > > > That allows other components of the kernel to get a proper reference
> > > > to a vfio group.  From there you can work out how to get what you
> > > > want.  Another alternative is that vfio could register an MSI to
> > > > IOVA mapping with architecture code when the mapping is created.
> > > > The MSI setup path could then do a lookup in architecture code for
> > > > the mapping.  You could even store the MSI to IOVA mapping in VFIO
> > > > and create an interface where SET_IRQ passes that mapping into setup
> > code.
> [Sethi Varun-B16395] What you are suggesting is that the MSI setup
> path queries the vfio subsystem for the mapping, rather than directly
> querying the iommu subsystem. So, say if we add an interface for
> getting MSI to IOVA mapping in the msi setup path, wouldn't this again
> be specific to vfio? I reckon that this interface would again ppc
> machine specific interface.

Sure, if this is a generic MSI setup path then clearly vfio should not
be involved.  But by that same notion, if this is a generic MSI setup
path, how can the proposed solutions guarantee that the iommu_domain or
iova returned is still valid in all cases?  It's racy.  If the caller
trying to setup MSI has the information needed, why doesn't it pass it
in as part of the setup?  Thanks,

Alex

^ permalink raw reply

* [PATCH] powerpc: fix e500 SPE float SIGFPE generation
From: Joseph S. Myers @ 2013-10-10 20:29 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: linux-kernel

From: Joseph Myers <joseph@codesourcery.com>

The e500 SPE floating-point emulation code is called from
SPEFloatingPointException and SPEFloatingPointRoundException in
arch/powerpc/kernel/traps.c.  Those functions have support for
generating SIGFPE, but do_spe_mathemu and speround_handler don't
generate a return value to indicate that this should be done.  Such a
return value should depend on whether an exception is raised that has
been set via prctl to generate SIGFPE.  This patch adds the relevant
logic in these functions so that SIGFPE is generated as expected by
the glibc testsuite.

Signed-off-by: Joseph Myers <joseph@codesourcery.com>

---

This patch is not intended to depend on any of my previous patches
<http://lkml.org/lkml/2013/10/4/495>,
<http://lkml.org/lkml/2013/10/4/497>,
<http://lkml.org/lkml/2013/10/8/694>,
<http://lkml.org/lkml/2013/10/8/700> and
<http://lkml.org/lkml/2013/10/8/705>, although testing has been on top
of that patch series and having all six patches will produce the best
results.

diff --git a/arch/powerpc/math-emu/math_efp.c b/arch/powerpc/math-emu/math_efp.c
index 01a0abb..28337c9 100644
--- a/arch/powerpc/math-emu/math_efp.c
+++ b/arch/powerpc/math-emu/math_efp.c
@@ -20,6 +20,7 @@
  */

 #include <linux/types.h>
+#include <linux/prctl.h>

 #include <asm/uaccess.h>
 #include <asm/reg.h>
@@ -691,6 +692,23 @@ update_regs:
 	pr_debug("va: %08x  %08x\n", va.wp[0], va.wp[1]);
 	pr_debug("vb: %08x  %08x\n", vb.wp[0], vb.wp[1]);

+	if (current->thread.fpexc_mode & PR_FP_EXC_SW_ENABLE) {
+		if ((FP_CUR_EXCEPTIONS & FP_EX_DIVZERO)
+		    && (current->thread.fpexc_mode & PR_FP_EXC_DIV))
+			return 1;
+		if ((FP_CUR_EXCEPTIONS & FP_EX_OVERFLOW)
+		    && (current->thread.fpexc_mode & PR_FP_EXC_OVF))
+			return 1;
+		if ((FP_CUR_EXCEPTIONS & FP_EX_UNDERFLOW)
+		    && (current->thread.fpexc_mode & PR_FP_EXC_UND))
+			return 1;
+		if ((FP_CUR_EXCEPTIONS & FP_EX_INEXACT)
+		    && (current->thread.fpexc_mode & PR_FP_EXC_RES))
+			return 1;
+		if ((FP_CUR_EXCEPTIONS & FP_EX_INVALID)
+		    && (current->thread.fpexc_mode & PR_FP_EXC_INV))
+			return 1;
+	}
 	return 0;

 illegal:
@@ -867,6 +885,8 @@ int speround_handler(struct pt_regs *regs)

 	pr_debug("  to fgpr: %08x  %08x\n", fgpr.wp[0], fgpr.wp[1]);

+	if (current->thread.fpexc_mode & PR_FP_EXC_SW_ENABLE)
+		return (current->thread.fpexc_mode & PR_FP_EXC_RES) ? 1 : 0;
 	return 0;
 }

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply related

* RE: [PATCH 2/7] iommu: add api to get iommu_domain of a device
From: Sethi Varun-B16395 @ 2013-10-10 20:09 UTC (permalink / raw)
  To: Alex Williamson, Bhushan Bharat-R65777
  Cc: Wood Scott-B07421, linux-pci@vger.kernel.org,
	linux-kernel@vger.kernel.org, agraf@suse.de,
	iommu@lists.linux-foundation.org, linuxppc-dev@lists.ozlabs.org
In-Reply-To: <1381201980.6330.4.camel@ul30vt.home>



> -----Original Message-----
> From: iommu-bounces@lists.linux-foundation.org [mailto:iommu-
> bounces@lists.linux-foundation.org] On Behalf Of Alex Williamson
> Sent: Tuesday, October 08, 2013 8:43 AM
> To: Bhushan Bharat-R65777
> Cc: agraf@suse.de; Wood Scott-B07421; linux-pci@vger.kernel.org;
> galak@kernel.crashing.org; linux-kernel@vger.kernel.org;
> iommu@lists.linux-foundation.org; benh@kernel.crashing.org; linuxppc-
> dev@lists.ozlabs.org
> Subject: Re: [PATCH 2/7] iommu: add api to get iommu_domain of a device
>=20
> On Mon, 2013-10-07 at 05:46 +0000, Bhushan Bharat-R65777 wrote:
> >
> > > -----Original Message-----
> > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > > Sent: Friday, October 04, 2013 11:42 PM
> > > To: Bhushan Bharat-R65777
> > > Cc: joro@8bytes.org; benh@kernel.crashing.org;
> > > galak@kernel.crashing.org; linux- kernel@vger.kernel.org;
> > > linuxppc-dev@lists.ozlabs.org; linux- pci@vger.kernel.org;
> > > agraf@suse.de; Wood Scott-B07421; iommu@lists.linux- foundation.org
> > > Subject: Re: [PATCH 2/7] iommu: add api to get iommu_domain of a
> > > device
> > >
> > > On Fri, 2013-10-04 at 17:23 +0000, Bhushan Bharat-R65777 wrote:
> > > >
> > > > > -----Original Message-----
> > > > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > > > > Sent: Friday, October 04, 2013 10:43 PM
> > > > > To: Bhushan Bharat-R65777
> > > > > Cc: joro@8bytes.org; benh@kernel.crashing.org;
> > > > > galak@kernel.crashing.org; linux- kernel@vger.kernel.org;
> > > > > linuxppc-dev@lists.ozlabs.org; linux- pci@vger.kernel.org;
> > > > > agraf@suse.de; Wood Scott-B07421; iommu@lists.linux-
> > > > > foundation.org
> > > > > Subject: Re: [PATCH 2/7] iommu: add api to get iommu_domain of a
> > > > > device
> > > > >
> > > > > On Fri, 2013-10-04 at 16:47 +0000, Bhushan Bharat-R65777 wrote:
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Alex Williamson [mailto:alex.williamson@redhat.com]
> > > > > > > Sent: Friday, October 04, 2013 9:15 PM
> > > > > > > To: Bhushan Bharat-R65777
> > > > > > > Cc: joro@8bytes.org; benh@kernel.crashing.org;
> > > > > > > galak@kernel.crashing.org; linux- kernel@vger.kernel.org;
> > > > > > > linuxppc-dev@lists.ozlabs.org; linux- pci@vger.kernel.org;
> > > > > > > agraf@suse.de; Wood Scott-B07421; iommu@lists.linux-
> > > > > > > foundation.org
> > > > > > > Subject: Re: [PATCH 2/7] iommu: add api to get iommu_domain
> > > > > > > of a device
> > > > > > >
> > > > > > > On Fri, 2013-10-04 at 09:54 +0000, Bhushan Bharat-R65777
> wrote:
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: linux-pci-owner@vger.kernel.org
> > > > > > > > > [mailto:linux-pci-owner@vger.kernel.org]
> > > > > > > > > On Behalf Of Alex Williamson
> > > > > > > > > Sent: Wednesday, September 25, 2013 10:16 PM
> > > > > > > > > To: Bhushan Bharat-R65777
> > > > > > > > > Cc: joro@8bytes.org; benh@kernel.crashing.org;
> > > > > > > > > galak@kernel.crashing.org; linux-
> > > > > > > > > kernel@vger.kernel.org; linuxppc-dev@lists.ozlabs.org;
> > > > > > > > > linux- pci@vger.kernel.org; agraf@suse.de; Wood
> > > > > > > > > Scott-B07421; iommu@lists.linux- foundation.org; Bhushan
> > > > > > > > > Bharat-R65777
> > > > > > > > > Subject: Re: [PATCH 2/7] iommu: add api to get
> > > > > > > > > iommu_domain of a device
> > > > > > > > >
> > > > > > > > > On Thu, 2013-09-19 at 12:59 +0530, Bharat Bhushan wrote:
> > > > > > > > > > This api return the iommu domain to which the device is
> attached.
> > > > > > > > > > The iommu_domain is required for making API calls
> > > > > > > > > > related to
> > > iommu.
> > > > > > > > > > Follow up patches which use this API to know iommu
> maping.
> > > > > > > > > >
> > > > > > > > > > Signed-off-by: Bharat Bhushan
> > > > > > > > > > <bharat.bhushan@freescale.com>
> > > > > > > > > > ---
> > > > > > > > > >  drivers/iommu/iommu.c |   10 ++++++++++
> > > > > > > > > >  include/linux/iommu.h |    7 +++++++
> > > > > > > > > >  2 files changed, 17 insertions(+), 0 deletions(-)
> > > > > > > > > >
> > > > > > > > > > diff --git a/drivers/iommu/iommu.c
> > > > > > > > > > b/drivers/iommu/iommu.c index
> > > > > > > > > > fbe9ca7..6ac5f50 100644
> > > > > > > > > > --- a/drivers/iommu/iommu.c
> > > > > > > > > > +++ b/drivers/iommu/iommu.c
> > > > > > > > > > @@ -696,6 +696,16 @@ void iommu_detach_device(struct
> > > > > > > > > > iommu_domain *domain, struct device *dev)  }
> > > > > > > > > > EXPORT_SYMBOL_GPL(iommu_detach_device);
> > > > > > > > > >
> > > > > > > > > > +struct iommu_domain *iommu_get_dev_domain(struct
> device *dev) {
> > > > > > > > > > +	struct iommu_ops *ops =3D dev->bus->iommu_ops;
> > > > > > > > > > +
> > > > > > > > > > +	if (unlikely(ops =3D=3D NULL ||
> > > > > > > > > > +ops->get_dev_iommu_domain =3D=3D
> > > NULL))
> > > > > > > > > > +		return NULL;
> > > > > > > > > > +
> > > > > > > > > > +	return ops->get_dev_iommu_domain(dev); }
> > > > > > > > > > +EXPORT_SYMBOL_GPL(iommu_get_dev_domain);
> > > > > > > > >
> > > > > > > > > What prevents this from racing iommu_domain_free()?
> > > > > > > > > There's no references acquired, so there's no reason for
> > > > > > > > > the caller to assume the
> > > > > > > pointer is valid.
> > > > > > > >
> > > > > > > > Sorry for late query, somehow this email went into a
> > > > > > > > folder and escaped;
> > > > > > > >
> > > > > > > > Just to be sure, there is not lock at generic "struct
> > > > > > > > iommu_domain", but IP
> > > > > > > specific structure (link FSL domain) linked in
> > > > > > > iommu_domain->priv have a lock, so we need to ensure this
> > > > > > > race in FSL iommu code (say drivers/iommu/fsl_pamu_domain.c),
> right?
> > > > > > >
> > > > > > > No, it's not sufficient to make sure that your use of the
> > > > > > > interface is race free.  The interface itself needs to be
> > > > > > > designed so that it's difficult to use incorrectly.
> > > > > >
> > > > > > So we can define
> > > > > > iommu_get_dev_domain()/iommu_put_dev_domain();
> > > > > > iommu_get_dev_domain() will return domain with the lock held,
> > > > > > and
> > > > > > iommu_put_dev_domain() will release the lock? And
> > > > > > iommu_get_dev_domain() must always be followed by
> > > > > > iommu_get_dev_domain().
> > > > >
> > > > > What lock?  get/put are generally used for reference counting,
> > > > > not locking in the kernel.
> > > > >
> > > > > > > That's not the case here.  This is a backdoor to get the
> > > > > > > iommu domain from the iommu driver regardless of who is using
> it or how.
> > > > > > > The iommu domain is created and managed by vfio, so
> > > > > > > shouldn't we be looking at how to do this through vfio?
> > > > > >
> > > > > > Let me first describe what we are doing here:
> > > > > > During initialization:-
> > > > > >  - vfio talks to MSI system to know the MSI-page and size
> > > > > >  - vfio then interacts with iommu to map the MSI-page in iommu
> > > > > > (IOVA is decided by userspace and physical address is the
> > > > > > MSI-page)
> > > > > >  - So the IOVA subwindow mapping is created in iommu and yes
> > > > > > VFIO know about
> > > > > this mapping.
> > > > > >
> > > > > > Now do SET_IRQ(MSI/MSIX) ioctl:
> > > > > >  - calls pci_enable_msix()/pci_enable_msi_block(): which is
> > > > > > supposed to set
> > > > > MSI address/data in device.
> > > > > >  - So in current implementation (this patchset) msi-subsystem
> > > > > > gets the IOVA
> > > > > from iommu via this defined interface.
> > > > > >  - Are you saying that rather than getting this from iommu, we
> > > > > > should get this
> > > > > from vfio? What difference does this make?
> > > > >
> > > > > Yes, you just said above that vfio knows the msi to iova
> > > > > mapping, so why go outside of vfio to find it later?  The
> > > > > difference is one case you can have a proper reference to data
> > > > > structures to make sure the pointer you get back actually has
> > > > > meaning at the time you're using it vs the code here where
> > > > > you're defining an API that returns a meaningless value
> > > >
> > > > With FSL-PAMU we will always get consistant data from iommu or
> > > > vfio-data
> > > structure.
> > >
> > > Great, but you're trying to add a generic API to the IOMMU subsystem
> > > that's difficult to use correctly.  The fact that you use it
> > > correctly does not justify the API.
> > >
> > > > > because you can't check or
> > > > > enforce that an arbitrary caller is using it correctly.
> > > >
> > > > I am not sure what is arbitrary caller? pdev is known to vfio, so
> > > > vfio will only make pci_enable_msix()/pci_enable_msi_block() for
> this pdev.
> > > > If anyother code makes then it is some other unexpectedly thing
> > > > happening in system, no?
> > >
> > > What's proposed here is a generic IOMMU API.  Anybody can call this.
> > > What if the host SCSI driver decides to go get the iommu domain for
> > > it's device (or any other device)?  Does that fit your usage model?
> > >
> > > > >  It's not maintainable.
> > > > > Thanks,
> > > >
> > > > I do not have any issue with this as well, can you also describe
> > > > the type of API you are envisioning; I can think of defining some
> > > > function in vfio.c/vfio_iommu*.c, make them global and declare
> > > > then in include/Linux/vfio.h And include <Linux/vfio.h> in caller
> > > > file
> > > > (arch/powerpc/kernel/msi.c)
> > >
> > > Do you really want module dependencies between vfio and your core
> > > kernel MSI setup?  Look at the vfio external user interface that
> we've already defined.
> > > That allows other components of the kernel to get a proper reference
> > > to a vfio group.  From there you can work out how to get what you
> > > want.  Another alternative is that vfio could register an MSI to
> > > IOVA mapping with architecture code when the mapping is created.
> > > The MSI setup path could then do a lookup in architecture code for
> > > the mapping.  You could even store the MSI to IOVA mapping in VFIO
> > > and create an interface where SET_IRQ passes that mapping into setup
> code.
[Sethi Varun-B16395] What you are suggesting is that the MSI setup path que=
ries the vfio subsystem for the mapping, rather than directly querying the =
iommu subsystem. So, say if we add an interface for getting MSI to IOVA map=
ping in the msi setup path, wouldn't this again be specific to vfio? I reck=
on that this interface would again ppc machine specific interface.

-Varun

^ permalink raw reply

* Re: [v1] powerpc/mpc512x: silence build warning upon disabled DIU
From: Anatolij Gustschin @ 2013-10-10 19:37 UTC (permalink / raw)
  To: Brian Norris; +Cc: Gerhard Sittig, linuxppc-dev
In-Reply-To: <CAN8TOE8o-34zFP=0q3NjondqpA6UA1szitN+kgKcbaNOWhrscg@mail.gmail.com>

Hello,

On Thu, 10 Oct 2013 11:23:55 -0700
Brian Norris <computersforpeace@gmail.com> wrote:
...
> > making mpc512x_setup_diu(), mpc512x_release_bootmem(),
> > mpc512x_valid_monitor_port() and void mpc512x_set_pixel_clock()
> > should be okay.
> 
> And mpc512x_init_diu()?

yes, it can be static, too.

> 
> >> Then, you can get the real benefit of IS_ENABLED() by removing the
> >>
> >>     #if IS_ENABLED(CONFIG_FB_FSL_DIU)
> >>
> >> from around all the DIU code, and it will automatically be removed by
> >> the compiler when it is not used.
> >>
> >> I think the current patch is necessary for immediate use, and it can be
> >> sent to stable. But I might suggest a follow-up patch or 2 that makes
> >> the functions static and kills the #ifdef entirely.
> >
> > Yes, we also have to remove CONFIG_FB_FSL_DIU ifdef in
> > arch/powerpc/sysdev/fsl_soc.h and building should work then.
> 
> Still want it around this line, probably, so we'll get compiler errors
> and not linker errors if someone tries to use it?
> 
>     extern struct platform_diu_data_ops diu_ops;
> 
> But otherwise that looks OK to me. Shall I send a patch?

yes, please.

Thanks,

Anatolij

^ permalink raw reply

* Re: [v1] powerpc/mpc512x: silence build warning upon disabled DIU
From: Brian Norris @ 2013-10-10 18:23 UTC (permalink / raw)
  To: Anatolij Gustschin; +Cc: Gerhard Sittig, linuxppc-dev
In-Reply-To: <20131010180948.4350ebda@crub>

Hello,

On Thu, Oct 10, 2013 at 9:09 AM, Anatolij Gustschin <agust@denx.de> wrote:
> On Wed, 9 Oct 2013 12:29:31 -0700
> Brian Norris <computersforpeace@gmail.com> wrote:
> ...
>> > +#else
>> > +void __init mpc512x_setup_diu(void) { /* EMPTY */ }
>> > +void __init mpc512x_init_diu(void) { /* EMPTY */ }
>> >  #endif
>> >
>> >  void __init mpc512x_init_IRQ(void)
>>
>> I see an alternative solution:
>>
>> Can't almost all of the code in mpc512x_shared.c be declared 'static'?
>
> making mpc512x_setup_diu(), mpc512x_release_bootmem(),
> mpc512x_valid_monitor_port() and void mpc512x_set_pixel_clock()
> should be okay.

And mpc512x_init_diu()?

>> Then, you can get the real benefit of IS_ENABLED() by removing the
>>
>>     #if IS_ENABLED(CONFIG_FB_FSL_DIU)
>>
>> from around all the DIU code, and it will automatically be removed by
>> the compiler when it is not used.
>>
>> I think the current patch is necessary for immediate use, and it can be
>> sent to stable. But I might suggest a follow-up patch or 2 that makes
>> the functions static and kills the #ifdef entirely.
>
> Yes, we also have to remove CONFIG_FB_FSL_DIU ifdef in
> arch/powerpc/sysdev/fsl_soc.h and building should work then.

Still want it around this line, probably, so we'll get compiler errors
and not linker errors if someone tries to use it?

    extern struct platform_diu_data_ops diu_ops;

But otherwise that looks OK to me. Shall I send a patch?

Brian

^ permalink raw reply

* Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern
From: Alexander Gordeev @ 2013-10-10 18:07 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: linux-mips, VMware, Inc., linux-nvme, linux-ide, linux-s390,
	Andy King, linux-scsi, linux-rdma, x86, Ingo Molnar, linux-pci,
	iss_storagedev, linux-driver, Tejun Heo, Bjorn Helgaas,
	Dan Williams, Jon Mason, Solarflare linux maintainers, netdev,
	linux-kernel, Ralf Baechle, e1000-devel, Martin Schwidefsky,
	linux390, linuxppc-dev
In-Reply-To: <5256D5AB.4050105@zytor.com>

On Thu, Oct 10, 2013 at 09:28:27AM -0700, H. Peter Anvin wrote:
> On 10/10/2013 03:17 AM, Alexander Gordeev wrote:
> > On Wed, Oct 09, 2013 at 03:24:08PM +1100, Benjamin Herrenschmidt wrote:
> > 
> > Ok, this suggestion sounded in one or another form by several people.
> > What about name it pcim_enable_msix_range() and wrap in couple more
> > helpers to complete an API:
> > 
> > int pcim_enable_msix_range(pdev, msix_entries, nvec, minvec);
> > 	<0 - error code
> > 	>0 - number of MSIs allocated, where minvec >= result <= nvec
> > 
> > int pcim_enable_msix(pdev, msix_entries, nvec);
> > 	<0 - error code
> > 	>0 - number of MSIs allocated, where 1 >= result <= nvec 
> > 
> > int pcim_enable_msix_exact(pdev, msix_entries, nvec);
> > 	<0 - error code
> > 	>0 - number of MSIs allocated, where result == nvec
> > 
> > The latter's return value seems odd, but I can not help to make
> > it consistent with the first two.
> > 
> 
> Is there a reason for the wrappers, as opposed to just specifying either
> 1 or nvec as the minimum?

The wrappers are more handy IMO.

I.e. can imagine people start struggling to figure out what minvec to provide:
1 or 0? Why 1? Oh.. okay.. Or should we tolerate 0 (as opposite to -ERANGE)?

Well, do not know.. pcim_enable_msix(pdev, msix_entries, nvec, nvec) is 
less readable for me than just pcim_enable_msix_exact(pdev, msix_entries,
nvec).

> 	-hpa

-- 
Regards,
Alexander Gordeev
agordeev@redhat.com

^ permalink raw reply

* Re: [PATCH RFC 00/77] Re-design MSI/MSI-X interrupts enablement pattern
From: H. Peter Anvin @ 2013-10-10 16:28 UTC (permalink / raw)
  To: Alexander Gordeev
  Cc: linux-mips, VMware, Inc., linux-nvme, linux-ide, linux-s390,
	Andy King, linux-scsi, linux-rdma, x86, Ingo Molnar, linux-pci,
	iss_storagedev, linux-driver, Tejun Heo, Bjorn Helgaas,
	Dan Williams, Jon Mason, Solarflare linux maintainers, netdev,
	linux-kernel, Ralf Baechle, e1000-devel, Martin Schwidefsky,
	linux390, linuxppc-dev
In-Reply-To: <20131010101704.GC11874@dhcp-26-207.brq.redhat.com>

On 10/10/2013 03:17 AM, Alexander Gordeev wrote:
> On Wed, Oct 09, 2013 at 03:24:08PM +1100, Benjamin Herrenschmidt wrote:
> 
> Ok, this suggestion sounded in one or another form by several people.
> What about name it pcim_enable_msix_range() and wrap in couple more
> helpers to complete an API:
> 
> int pcim_enable_msix_range(pdev, msix_entries, nvec, minvec);
> 	<0 - error code
> 	>0 - number of MSIs allocated, where minvec >= result <= nvec
> 
> int pcim_enable_msix(pdev, msix_entries, nvec);
> 	<0 - error code
> 	>0 - number of MSIs allocated, where 1 >= result <= nvec 
> 
> int pcim_enable_msix_exact(pdev, msix_entries, nvec);
> 	<0 - error code
> 	>0 - number of MSIs allocated, where result == nvec
> 
> The latter's return value seems odd, but I can not help to make
> it consistent with the first two.
> 

Is there a reason for the wrappers, as opposed to just specifying either
1 or nvec as the minimum?

	-hpa

^ permalink raw reply

* Re: [v1] powerpc/mpc512x: silence build warning upon disabled DIU
From: Anatolij Gustschin @ 2013-10-10 16:09 UTC (permalink / raw)
  To: Brian Norris; +Cc: Gerhard Sittig, linuxppc-dev
In-Reply-To: <20131009192931.GC23337@ld-irv-0074.broadcom.com>

Hi,

On Wed, 9 Oct 2013 12:29:31 -0700
Brian Norris <computersforpeace@gmail.com> wrote:
...
> > +#else
> > +void __init mpc512x_setup_diu(void) { /* EMPTY */ }
> > +void __init mpc512x_init_diu(void) { /* EMPTY */ }
> >  #endif
> >  
> >  void __init mpc512x_init_IRQ(void)
> 
> I see an alternative solution:
> 
> Can't almost all of the code in mpc512x_shared.c be declared 'static'?

making mpc512x_setup_diu(), mpc512x_release_bootmem(),
mpc512x_valid_monitor_port() and void mpc512x_set_pixel_clock()
should be okay.

> Then, you can get the real benefit of IS_ENABLED() by removing the
> 
>     #if IS_ENABLED(CONFIG_FB_FSL_DIU)
> 
> from around all the DIU code, and it will automatically be removed by
> the compiler when it is not used.
> 
> I think the current patch is necessary for immediate use, and it can be
> sent to stable. But I might suggest a follow-up patch or 2 that makes
> the functions static and kills the #ifdef entirely.

Yes, we also have to remove CONFIG_FB_FSL_DIU ifdef in
arch/powerpc/sysdev/fsl_soc.h and building should work then.

Thanks,

Anatolij

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox