LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [v3][PATCH 2/8] powerpc/book3e: support CONFIG_RELOCATABLE
From: Scott Wood @ 2013-12-18  3:29 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1373357007-30785-3-git-send-email-tiejun.chen@windriver.com>

On Tue, 2013-07-09 at 16:03 +0800, Tiejun Chen wrote:
> book3e is different with book3s since 3s includes the exception
> vectors code in head_64.S as it relies on absolute addressing
> which is only possible within this compilation unit. So we have
> to get that label address with got.
> 
> And when boot a relocated kernel, we should reset ipvr properly again
> after .relocate.

ivpr?

> 
> Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
> ---
>  arch/powerpc/include/asm/exception-64e.h |   11 +++++++++++
>  arch/powerpc/kernel/exceptions-64e.S     |   18 +++++++++++++++++-
>  arch/powerpc/kernel/head_64.S            |   25 +++++++++++++++++++++++++
>  3 files changed, 53 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/exception-64e.h b/arch/powerpc/include/asm/exception-64e.h
> index 51fa43e..371a77f 100644
> --- a/arch/powerpc/include/asm/exception-64e.h
> +++ b/arch/powerpc/include/asm/exception-64e.h
> @@ -214,10 +214,21 @@ exc_##label##_book3e:
>  #define TLB_MISS_STATS_SAVE_INFO_BOLTED
>  #endif
>  
> +#ifndef CONFIG_RELOCATABLE

Please use positive logic (ifdef/else, rather than ifndef/else).

>  #define SET_IVOR(vector_number, vector_offset)	\
>  	li	r3,vector_offset@l; 		\
>  	ori	r3,r3,interrupt_base_book3e@l;	\
>  	mtspr	SPRN_IVOR##vector_number,r3;
> +#else /* !CONFIG_RELOCATABLE */
> +/* In relocatable case the value of the constant expression 'expr' is only
> + * offset. So instead, we should loads the address of label 'name'.
> + */
> +#define SET_IVOR(vector_number, vector_offset)	\
> +	LOAD_REG_ADDR(r3,interrupt_base_book3e);\
> +	rlwinm	r3,r3,0,15,0;			\
> +	ori	r3,r3,vector_offset@l;		\
> +	mtspr	SPRN_IVOR##vector_number,r3;
> +#endif /* CONFIG_RELOCATABLE */

Please use the more readable 4-operand version of "rlwinm".

Is there a reason why this new code is only used with
CONFIG_RELOCATABLE?  If @got doesn't work without CONFIG_RELOCATABLE,
then the ifdef should be pushed into LOAD_REG_ADDR.

Likewise with other ifdefs on CONFIG_RELOCATABLE.

> -_STATIC(init_core_book3e)
> +_GLOBAL(init_core_book3e)
>  	/* Establish the interrupt vector base */
> +#ifdef CONFIG_RELOCATABLE
> +/* In relocatable case the value of the constant expression 'expr' is only
> + * offset. So instead, we should loads the address of label 'name'.
> + */
> +	tovirt(r2,r2)
> +	LOAD_REG_ADDR(r3, interrupt_base_book3e)
> +#else
>  	LOAD_REG_IMMEDIATE(r3, interrupt_base_book3e)
> +#endif

I'm having a hard time parsing the comment.  Plus, it feels wrong to
decouple tovirt(r2,r2) from the call to relative_toc.

>  	mtspr	SPRN_IVPR,r3
>  	sync
>  	blr
> diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
> index b61363d..550f8fb 100644
> --- a/arch/powerpc/kernel/head_64.S
> +++ b/arch/powerpc/kernel/head_64.S
> @@ -414,12 +414,25 @@ _STATIC(__after_prom_start)
>  	/* process relocations for the final address of the kernel */
>  	lis	r25,PAGE_OFFSET@highest	/* compute virtual base of kernel */
>  	sldi	r25,r25,32
> +#if defined(CONFIG_PPC_BOOK3E)
> +	tovirt(r26,r26)			/* on booke, we already run at PAGE_OFFSET */
> +#endif
>  	lwz	r7,__run_at_load-_stext(r26)
> +#if defined(CONFIG_PPC_BOOK3E)
> +	tophys(r26,r26)			/* Restore for the remains. */
> +#endif
>  	cmplwi	cr0,r7,1	/* flagged to stay where we are ? */
>  	bne	1f
>  	add	r25,r25,r26
>  1:	mr	r3,r25
>  	bl	.relocate
> +#if defined(CONFIG_PPC_BOOK3E)
> +	/* In relocatable case we always have to load the address of label 'name'
> +	 * to set IVPR. So after .relocate we have to update IVPR with current
> +	 * address of label.
> +	 */
> +	bl	.init_core_book3e
> +#endif

Maybe this function should be renamed to something ivpr-specific, so
nothing else gets added there.

>  #endif
>  
>  /*
> @@ -447,12 +460,24 @@ _STATIC(__after_prom_start)
>   * variable __run_at_load, if it is set the kernel is treated as relocatable
>   * kernel, otherwise it will be moved to PHYSICAL_START
>   */
> +#if defined(CONFIG_PPC_BOOK3E)
> +	tovirt(r26,r26)			/* on booke, we already run at PAGE_OFFSET */
> +#endif
>  	lwz	r7,__run_at_load-_stext(r26)
> +#if defined(CONFIG_PPC_BOOK3E)
> +	tophys(r26,r26)			/* Restore for the remains. */
> +#endif
>  	cmplwi	cr0,r7,1
>  	bne	3f
>  
> +#ifdef CONFIG_PPC_BOOK3E
> +	LOAD_REG_ADDR(r5, __end_interrupts)
> +	LOAD_REG_ADDR(r11, _stext)
> +	sub	r5,r5,r11
> +#else
>  	/* just copy interrupts */
>  	LOAD_REG_IMMEDIATE(r5, __end_interrupts - _stext)
> +#endif

Can't we skip the interrupt copying on book3e?  And if not for some
reason, why start at _stext?

-Scott

^ permalink raw reply

* Re: [v3][PATCH 3/8] book3e/kexec/kdump: enable kexec for kernel
From: Scott Wood @ 2013-12-18  3:35 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1373357007-30785-4-git-send-email-tiejun.chen@windriver.com>

On Tue, 2013-07-09 at 16:03 +0800, Tiejun Chen wrote:
> We need to active KEXEC for book3e and bypass or convert non-book3e stuff
> in kexec coverage.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
> ---
>  arch/powerpc/Kconfig                   |    2 +-
>  arch/powerpc/kernel/machine_kexec_64.c |  148 ++++++++++++++++++--------------
>  arch/powerpc/kernel/misc_64.S          |    6 ++
>  3 files changed, 89 insertions(+), 67 deletions(-)
> 
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 5374776..d945435 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -357,7 +357,7 @@ config ARCH_ENABLE_MEMORY_HOTREMOVE
>  
>  config KEXEC
>  	bool "kexec system call"
> -	depends on (PPC_BOOK3S || FSL_BOOKE || (44x && !SMP))
> +	depends on (PPC_BOOK3S || FSL_BOOKE || (44x && !SMP)) || PPC_BOOK3E

Please remove the outher parentheses, and especially don't put
PPC_BOOK3E on the outside of them when there's no reason to group the
other items together.

> @@ -367,6 +301,87 @@ void default_machine_kexec(struct kimage *image)
>  	/* NOTREACHED */
>  }
>  
> +#ifdef CONFIG_PPC_BOOK3E
> +int default_machine_kexec_prepare(struct kimage *image)
> +{
> +	int i;
> +	/*
> +	 * Since we use the kernel fault handlers and paging code to
> +	 * handle the virtual mode, we must make sure no destination
> +	 * overlaps kernel static data or bss.
> +	 */
> +	for (i = 0; i < image->nr_segments; i++)
> +		if (image->segment[i].mem < __pa(_end))
> +			return -ETXTBSY;
> +	return 0;

Factor out this common code rather than duplicate it.

-Scott

^ permalink raw reply

* Re: [v3][PATCH 4/8] book3e/kexec/kdump: create a 1:1 TLB mapping
From: Scott Wood @ 2013-12-18  3:39 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1373357007-30785-5-git-send-email-tiejun.chen@windriver.com>

On Tue, 2013-07-09 at 16:03 +0800, Tiejun Chen wrote:
> book3e have no real MMU mode so we have to create a 1:1 TLB
> mapping to make sure we can access the real physical address.
> And correct something to support this pseudo real mode on book3e.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>

Why do we need to be able to directly access physical addresses?

> diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
> index f1a7ce7..20cbb98 100644
> --- a/arch/powerpc/kernel/misc_64.S
> +++ b/arch/powerpc/kernel/misc_64.S
> @@ -460,6 +460,49 @@ kexec_flag:
>  
> 
>  #ifdef CONFIG_KEXEC
> +#ifdef CONFIG_PPC_BOOK3E
> +/* BOOK3E have no a real MMU mode so we have to setup the initial TLB
> + * for a core to map v:0 to p:0 as 1:1. This current implementation
> + * assume that 1G is enough for kexec.
> + */
> +#include <asm/mmu.h>

#includes go at the top of the file.

> +kexec_create_tlb:
> +	/* Invalidate all TLBs to avoid any TLB conflict. */
> +	PPC_TLBILX_ALL(0,R0)
> +	sync
> +	isync
> +
> +	mfspr	r10,SPRN_TLB1CFG
> +	andi.	r10,r10,TLBnCFG_N_ENTRY	/* Extract # entries */
> +	subi	r10,r10,1		/* Often its always safe to use last */
> +	lis	r9,MAS0_TLBSEL(1)@h
> +	rlwimi	r9,r10,16,4,15		/* Setup MAS0 = TLBSEL | ESEL(r9) */

Hardcoding TLB1 makes this FSL-specific code, but you've put it in a
non-FSL-specific place.

> +/* Setup a temp mapping v:0 to p:0 as 1:1 and return to it.
> + */
> +#ifdef CONFIG_SMP
> +#define M_IF_SMP	MAS2_M
> +#else
> +#define M_IF_SMP	0
> +#endif
> +	mtspr	SPRN_MAS0,r9
> +
> +	lis	r9,(MAS1_VALID|MAS1_IPROT)@h
> +	ori	r9,r9,(MAS1_TSIZE(BOOK3E_PAGESZ_1GB))@l
> +	mtspr	SPRN_MAS1,r9

What if the machine has less than 1 GiB of RAM?  We could get
speculative accesses to non-present addresses.

Though it looks like the normal 64-bit init sequence has the same
problem...

-Scott

^ permalink raw reply

* Re: [v3][PATCH 5/8] book3e/kexec/kdump: introduce a kexec kernel flag
From: Scott Wood @ 2013-12-18  3:42 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1373357007-30785-6-git-send-email-tiejun.chen@windriver.com>

On Tue, 2013-07-09 at 16:03 +0800, Tiejun Chen wrote:
> We need to introduce a flag to indicate we're already running
> a kexec kernel then we can go proper path. For example, We
> shouldn't access spin_table from the bootloader to up any secondary
> cpu for kexec kernel, and kexec kernel already know how to jump to
> generic_secondary_smp_init.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
> ---
>  arch/powerpc/include/asm/smp.h    |    1 +
>  arch/powerpc/kernel/head_64.S     |   10 ++++++++++
>  arch/powerpc/kernel/misc_64.S     |    6 ++++++
>  arch/powerpc/platforms/85xx/smp.c |   20 +++++++++++++++-----
>  4 files changed, 32 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/smp.h b/arch/powerpc/include/asm/smp.h
> index ffbaabe..59165a3 100644
> --- a/arch/powerpc/include/asm/smp.h
> +++ b/arch/powerpc/include/asm/smp.h
> @@ -200,6 +200,7 @@ extern void generic_secondary_thread_init(void);
>  extern unsigned long __secondary_hold_spinloop;
>  extern unsigned long __secondary_hold_acknowledge;
>  extern char __secondary_hold;
> +extern unsigned long __run_at_kexec;
>  
>  extern void __early_start(void);
>  #endif /* __ASSEMBLY__ */
> diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
> index 7dc56be..0b46c9d 100644
> --- a/arch/powerpc/kernel/head_64.S
> +++ b/arch/powerpc/kernel/head_64.S
> @@ -89,6 +89,10 @@ __secondary_hold_spinloop:
>  __secondary_hold_acknowledge:
>  	.llong	0x0
>  
> +	.globl	__run_at_kexec
> +__run_at_kexec:
> +	.llong	0x0	/* Flag for the secondary kernel from kexec. */
> +

No leading underscores please -- and why does this need to be 64-bit?

>  #ifdef CONFIG_RELOCATABLE
>  	/* This flag is set to 1 by a loader if the kernel should run
>  	 * at the loaded address instead of the linked address.  This
> @@ -417,6 +421,12 @@ _STATIC(__after_prom_start)
>  #if defined(CONFIG_PPC_BOOK3E)
>  	tovirt(r26,r26)			/* on booke, we already run at PAGE_OFFSET */
>  #endif
> +#if defined(CONFIG_KEXEC) || defined(CONFIG_CRASH_DUMP)
> +	/* If relocated we need to restore this flag on that relocated address. */
> +	ld	r7,__run_at_kexec-_stext(r26)
> +	std	r7,__run_at_kexec-_stext(r26)
> +#endif
> +
>  	lwz	r7,__run_at_load-_stext(r26)
>  #if defined(CONFIG_PPC_BOOK3E)
>  	tophys(r26,r26)			/* Restore for the remains. */
> diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
> index 20cbb98..c89aead 100644
> --- a/arch/powerpc/kernel/misc_64.S
> +++ b/arch/powerpc/kernel/misc_64.S
> @@ -619,6 +619,12 @@ _GLOBAL(kexec_sequence)
>  	bl	.copy_and_flush	/* (dest, src, copy limit, start offset) */
>  1:	/* assume normal blr return */
>  
> +	/* notify we're going into kexec kernel for SMP. */
> +	LOAD_REG_ADDR(r3,__run_at_kexec)
> +	li	r4,1
> +	std	r4,0(r3)
> +	sync
> +
>  	/* release other cpus to the new kernel secondary start at 0x60 */
>  	mflr	r5
>  	li	r6,1
> diff --git a/arch/powerpc/platforms/85xx/smp.c b/arch/powerpc/platforms/85xx/smp.c
> index 5ced4f5..14d461b 100644
> --- a/arch/powerpc/platforms/85xx/smp.c
> +++ b/arch/powerpc/platforms/85xx/smp.c
> @@ -150,6 +150,9 @@ static int smp_85xx_kick_cpu(int nr)
>  	int hw_cpu = get_hard_smp_processor_id(nr);
>  	int ioremappable;
>  	int ret = 0;
> +#ifdef CONFIG_PPC64
> +	unsigned long *ptr = NULL;
> +#endif

Looks like an unnecessary initialization.

>  
>  	WARN_ON(nr < 0 || nr >= NR_CPUS);
>  	WARN_ON(hw_cpu < 0 || hw_cpu >= NR_CPUS);
> @@ -238,11 +241,18 @@ out:
>  #else
>  	smp_generic_kick_cpu(nr);
>  
> -	flush_spin_table(spin_table);
> -	out_be32(&spin_table->pir, hw_cpu);
> -	out_be64((u64 *)(&spin_table->addr_h),
> -	  __pa((u64)*((unsigned long long *)generic_secondary_smp_init)));
> -	flush_spin_table(spin_table);
> +	ptr  = (unsigned long *)((unsigned long)&__run_at_kexec);
> +	/* We shouldn't access spin_table from the bootloader to up any
> +	 * secondary cpu for kexec kernel, and kexec kernel already
> +	 * know how to jump to generic_secondary_smp_init.
> +	 */
> +	if (!*ptr) {
> +		flush_spin_table(spin_table);
> +		out_be32(&spin_table->pir, hw_cpu);
> +		out_be64((u64 *)(&spin_table->addr_h),
> +		 __pa((u64)*((unsigned long long *)generic_secondary_smp_init)));
> +		flush_spin_table(spin_table);
> +	}
>  #endif

Please use a more descriptive name than "ptr".

How is all that different than just:

	if (!__run_at_kexec) {
		...
	}

-Scott

^ permalink raw reply

* Re: [PATCH] powerpc, perf: Configure BHRB filter before enabling PMU interrupts
From: Anshuman Khandual @ 2013-12-18  3:41 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linuxppc-dev
In-Reply-To: <1387332893-14407-1-git-send-email-mpe@ellerman.id.au>

On 12/18/2013 07:44 AM, Michael Ellerman wrote:
> From: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> 
> Right now the config_bhrb() PMU specific call happens after
> write_mmcr0(), which actually enables the PMU for event counting and
> interrupts. So there is a small window of time where the PMU and BHRB
> runs without the required HW branch filter (if any) enabled in BHRB.
> 
> This can cause some of the branch samples to be collected through BHRB
> without any filter applied and hence affects the correctness of
> the results. This patch moves the BHRB config function call before
> enabling interrupts.
> 
> Here are some data points captured via trace prints which depicts how we
> could get PMU interrupts with BHRB filter NOT enabled with a standard
> perf record command line (asking for branch record information as well).
> 
>     $ perf record -j any_call ls
> 
> Before the patch:-
> 
>     ls-1962  [003] d...  2065.299590: .perf_event_interrupt: MMCRA: 40000000000
>     ls-1962  [003] d...  2065.299603: .perf_event_interrupt: MMCRA: 40000000000
>     ...
> 
>     All the PMU interrupts before this point did not have the requested
>     HW branch filter enabled in the MMCRA.
> 
>     ls-1962  [003] d...  2065.299647: .perf_event_interrupt: MMCRA: 40040000000
>     ls-1962  [003] d...  2065.299662: .perf_event_interrupt: MMCRA: 40040000000
> 
> After the patch:-
> 
>     ls-1850  [008] d...   190.311828: .perf_event_interrupt: MMCRA: 40040000000
>     ls-1850  [008] d...   190.311848: .perf_event_interrupt: MMCRA: 40040000000
> 
>     All the PMU interrupts have the requested HW BHRB branch filter
>     enabled in MMCRA.
> 
> Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
> [mpe: Fixed up whitespace and cleaned up changelog]

Thanks Michael for cleaning and resending this out.

^ permalink raw reply

* Re: [v3][PATCH 6/8] book3e/kexec/kdump: implement ppc64 kexec specfic
From: Scott Wood @ 2013-12-18  3:45 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1373357007-30785-7-git-send-email-tiejun.chen@windriver.com>

On Tue, 2013-07-09 at 16:03 +0800, Tiejun Chen wrote:
> ppc64 kexec mechanism has a different implementation with ppc32.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>

Could you describe the relevant differences?

-Scott

^ permalink raw reply

* Re: [v3][PATCH 7/8] book3e/kexec/kdump: redefine VIRT_PHYS_OFFSET
From: Scott Wood @ 2013-12-18  3:48 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1373357007-30785-8-git-send-email-tiejun.chen@windriver.com>

On Tue, 2013-07-09 at 16:03 +0800, Tiejun Chen wrote:
> Book3e is always aligned 1GB to create TLB so we should
> use (KERNELBASE - MEMORY_START) as VIRT_PHYS_OFFSET to
> get __pa/__va properly while boot kdump.

What if MEMORY_START - PHYSICAL_START >= 1 GiB?

What about the comment that says we can't use MEMORY_START before
parsing the device tree?

-Scott

^ permalink raw reply

* Re: [v3][PATCH 8/8] book3e/kexec/kdump: recover "r4 = 0" to create the initial TLB
From: Scott Wood @ 2013-12-18  3:50 UTC (permalink / raw)
  To: Tiejun Chen; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1373357007-30785-9-git-send-email-tiejun.chen@windriver.com>

On Tue, 2013-07-09 at 16:03 +0800, Tiejun Chen wrote:
> In commit 96f013f, "powerpc/kexec: Add kexec "hold" support for Book3e
> processors", requires that GPR4 survive the "hold" process, for IBM Blue
> Gene/Q with with some very strange firmware. But for FSL Book3E, r4 = 1
> to indicate that the initial TLB entry for this core already exists so
> we still should set r4 with 0 to create that initial TLB.
> 
> Signed-off-by: Tiejun Chen <tiejun.chen@windriver.com>
> ---
>  arch/powerpc/kernel/head_64.S |    4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/head_64.S b/arch/powerpc/kernel/head_64.S
> index 0b46c9d..d546c5e 100644
> --- a/arch/powerpc/kernel/head_64.S
> +++ b/arch/powerpc/kernel/head_64.S
> @@ -127,6 +127,10 @@ __secondary_hold:
>  	/* Grab our physical cpu number */
>  	mr	r24,r3
>  	/* stash r4 for book3e */
> +#ifdef CONFIG_PPC_FSL_BOOK3E
> +	/* we need to setup initial TLB entry. */
> +	li	r4,0
> +#endif
>  	mr	r25,r4

This breaks being able to build one kernel that supports both FSL book3e
and IBM book3e.

-Scott

^ permalink raw reply

* Re: [PATCH V4 09/10] power8, perf: Change BHRB branch filter configuration
From: Anshuman Khandual @ 2013-12-18  3:55 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: mikey, ak, linux-kernel, eranian, linuxppc-dev, acme, sukadev,
	mingo
In-Reply-To: <1387325285.19507.1.camel@concordia>

On 12/18/2013 05:38 AM, Michael Ellerman wrote:
> On Fri, 2013-12-13 at 13:50 +0530, Anshuman Khandual wrote:
>> On 12/09/2013 11:51 AM, Michael Ellerman wrote:
>>>
>>> As I said in my comments on version 3 which you ignored:
>>>
>>>     I think it would be clearer if we actually checked for the possibilities we
>>>     allow and let everything else fall through, eg:
>>>
>>> Â Â Â Â Â Â Â Â /* Ignore user/kernel/hv bits */
>>> Â Â Â Â Â Â Â Â branch_sample_type &= ~PERF_SAMPLE_BRANCH_PLM_ALL;
>>>
>>> Â Â Â Â Â Â Â Â if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY)
>>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â return 0;
>>>
>>> Â Â Â Â Â Â Â Â if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY_CALL)
>>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â return POWER8_MMCRA_IFM1;
>>> Â 
>>> Â Â Â Â Â Â Â Â if (branch_sample_type == PERF_SAMPLE_BRANCH_COND)
>>> Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â Â return POWER8_MMCRA_IFM3;
>>> Â Â Â Â Â Â Â Â 
>>> Â Â Â Â Â Â Â Â return -1;
>>>
>>
>> Hey Michael,
>>
>> This patch only adds support for the PERF_SAMPLE_BRANCH_COND filter, if the
>> over all code flow does not clearly suggest that all combinations of any of
>> these HW filters are invalid, then we can go with one more patch to clean
>> that up before or after this patch but not here in this patch. Finally the
>> code section here will look something like this. Does it sound good ?
> 
> Better, but not quite.
> 
>> static u64 power8_bhrb_filter_map(u64 branch_sample_type)
>> {
>>         u64 pmu_bhrb_filter = 0;
>>
>>         /* BHRB and regular PMU events share the same privilege state
>>          * filter configuration. BHRB is always recorded along with a
>>          * regular PMU event. As the privilege state filter is handled
>>          * in the basic PMC configuration of the accompanying regular
>>          * PMU event, we ignore any separate BHRB specific request.
>>          */
>>
>>         /* Ignore user, kernel, hv bits */
>>         branch_sample_type &= ~PERF_SAMPLE_BRANCH_PLM_ALL;
>>
>>         if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY)
>>                 return pmu_bhrb_filter;
> 
> return 0;
> 
>>
>>
>>         if (branch_sample_type == PERF_SAMPLE_BRANCH_ANY_CALL) {
>>                 pmu_bhrb_filter |= POWER8_MMCRA_IFM1;
>>                 return pmu_bhrb_filter;
> 
> return POWER8_MMCRA_IFM1;
> 
>>         }
>>
>>         if (branch_sample_type == PERF_SAMPLE_BRANCH_COND) {
>>                 pmu_bhrb_filter |= POWER8_MMCRA_IFM3;
>>                 return pmu_bhrb_filter;
> 
> return POWER8_MMCRA_IFM3;
> 
>>         }
>>
>>         /* Every thing else is unsupported */
>>         return -1;
>> }
> 

Okay, will take these changes into another patch before adding conditional branch
filter here.

^ permalink raw reply

* [PATCH v12] PPC: POWERNV: move iommu_add_device earlier
From: Alexey Kardashevskiy @ 2013-12-18  4:17 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Alexey Kardashevskiy, linux-kernel, Bharat Bhushan, scottwood,
	Varun Sethi, linuxppc-dev

The current implementation of IOMMU on sPAPR does not use iommu_ops
and therefore does not call IOMMU API's bus_set_iommu() which
1) sets iommu_ops for a bus
2) registers a bus notifier
Instead, PCI devices are added to IOMMU groups from
subsys_initcall_sync(tce_iommu_init) which does basically the same
thing without using iommu_ops callbacks.

However Freescale PAMU driver (https://lkml.org/lkml/2013/7/1/158)
implements iommu_ops and when tce_iommu_init is called, every PCI device
is already added to some group so there is a conflict.

This patch does 2 things:
1. removes the loop in which PCI devices were added to groups and
adds explicit iommu_add_device() calls to add devices as soon as they get
the iommu_table pointer assigned to them.
2. moves a bus notifier to powernv code in order to avoid conflict with
the notifier from Freescale driver.

iommu_add_device() and iommu_del_device() are public now.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
Changes:
v12:
* removed redundant bus notifier from common POWERPC code

v11:
* rebased on upstream

v10:
* fixed linker error when IOMMU_API is not enabled

v9:
* removed "KVM" from the subject as it is not really a KVM patch so
PPC mainainter (hi Ben!) can review/include it into his tree

v8:
* added the check for iommu_group!=NULL before removing device from a group
as suggested by Wei Yang <weiyang@linux.vnet.ibm.com>

v2:
* added a helper - set_iommu_table_base_and_group - which does
set_iommu_table_base() and iommu_add_device()
---
 arch/powerpc/include/asm/iommu.h            | 26 ++++++++++++++++++
 arch/powerpc/kernel/iommu.c                 | 41 +++--------------------------
 arch/powerpc/platforms/powernv/pci-ioda.c   |  8 +++---
 arch/powerpc/platforms/powernv/pci-p5ioc2.c |  2 +-
 arch/powerpc/platforms/powernv/pci.c        | 31 +++++++++++++++++++++-
 arch/powerpc/platforms/pseries/iommu.c      |  8 +++---
 6 files changed, 70 insertions(+), 46 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index c34656a..774fa27 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -101,8 +101,34 @@ extern void iommu_free_table(struct iommu_table *tbl, const char *node_name);
  */
 extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
 					    int nid);
+#ifdef CONFIG_IOMMU_API
 extern void iommu_register_group(struct iommu_table *tbl,
 				 int pci_domain_number, unsigned long pe_num);
+extern int iommu_add_device(struct device *dev);
+extern void iommu_del_device(struct device *dev);
+#else
+static inline void iommu_register_group(struct iommu_table *tbl,
+					int pci_domain_number,
+					unsigned long pe_num)
+{
+}
+
+static inline int iommu_add_device(struct device *dev)
+{
+	return 0;
+}
+
+static inline void iommu_del_device(struct device *dev)
+{
+}
+#endif /* !CONFIG_IOMMU_API */
+
+static inline void set_iommu_table_base_and_group(struct device *dev,
+						  void *base)
+{
+	set_iommu_table_base(dev, base);
+	iommu_add_device(dev);
+}
 
 extern int iommu_map_sg(struct device *dev, struct iommu_table *tbl,
 			struct scatterlist *sglist, int nelems,
diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 572bb5b..ecbf468 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -1105,7 +1105,7 @@ void iommu_release_ownership(struct iommu_table *tbl)
 }
 EXPORT_SYMBOL_GPL(iommu_release_ownership);
 
-static int iommu_add_device(struct device *dev)
+int iommu_add_device(struct device *dev)
 {
 	struct iommu_table *tbl;
 	int ret = 0;
@@ -1134,46 +1134,13 @@ static int iommu_add_device(struct device *dev)
 
 	return ret;
 }
+EXPORT_SYMBOL_GPL(iommu_add_device);
 
-static void iommu_del_device(struct device *dev)
+void iommu_del_device(struct device *dev)
 {
 	iommu_group_remove_device(dev);
 }
-
-static int iommu_bus_notifier(struct notifier_block *nb,
-			      unsigned long action, void *data)
-{
-	struct device *dev = data;
-
-	switch (action) {
-	case BUS_NOTIFY_ADD_DEVICE:
-		return iommu_add_device(dev);
-	case BUS_NOTIFY_DEL_DEVICE:
-		iommu_del_device(dev);
-		return 0;
-	default:
-		return 0;
-	}
-}
-
-static struct notifier_block tce_iommu_bus_nb = {
-	.notifier_call = iommu_bus_notifier,
-};
-
-static int __init tce_iommu_init(void)
-{
-	struct pci_dev *pdev = NULL;
-
-	BUILD_BUG_ON(PAGE_SIZE < IOMMU_PAGE_SIZE);
-
-	for_each_pci_dev(pdev)
-		iommu_add_device(&pdev->dev);
-
-	bus_register_notifier(&pci_bus_type, &tce_iommu_bus_nb);
-	return 0;
-}
-
-subsys_initcall_sync(tce_iommu_init);
+EXPORT_SYMBOL_GPL(iommu_del_device);
 
 #else
 
diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
index 2c6d173..f0e6871 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -460,7 +460,7 @@ static void pnv_pci_ioda_dma_dev_setup(struct pnv_phb *phb, struct pci_dev *pdev
 		return;
 
 	pe = &phb->ioda.pe_array[pdn->pe_number];
-	set_iommu_table_base(&pdev->dev, &pe->tce32_table);
+	set_iommu_table_base_and_group(&pdev->dev, &pe->tce32_table);
 }
 
 static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe, struct pci_bus *bus)
@@ -468,7 +468,7 @@ static void pnv_ioda_setup_bus_dma(struct pnv_ioda_pe *pe, struct pci_bus *bus)
 	struct pci_dev *dev;
 
 	list_for_each_entry(dev, &bus->devices, bus_list) {
-		set_iommu_table_base(&dev->dev, &pe->tce32_table);
+		set_iommu_table_base_and_group(&dev->dev, &pe->tce32_table);
 		if (dev->subordinate)
 			pnv_ioda_setup_bus_dma(pe, dev->subordinate);
 	}
@@ -644,7 +644,7 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb *phb,
 	iommu_register_group(tbl, pci_domain_nr(pe->pbus), pe->pe_number);
 
 	if (pe->pdev)
-		set_iommu_table_base(&pe->pdev->dev, tbl);
+		set_iommu_table_base_and_group(&pe->pdev->dev, tbl);
 	else
 		pnv_ioda_setup_bus_dma(pe, pe->pbus);
 
@@ -723,7 +723,7 @@ static void pnv_pci_ioda2_setup_dma_pe(struct pnv_phb *phb,
 	iommu_register_group(tbl, pci_domain_nr(pe->pbus), pe->pe_number);
 
 	if (pe->pdev)
-		set_iommu_table_base(&pe->pdev->dev, tbl);
+		set_iommu_table_base_and_group(&pe->pdev->dev, tbl);
 	else
 		pnv_ioda_setup_bus_dma(pe, pe->pbus);
 
diff --git a/arch/powerpc/platforms/powernv/pci-p5ioc2.c b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
index f8b4bd8..e3807d6 100644
--- a/arch/powerpc/platforms/powernv/pci-p5ioc2.c
+++ b/arch/powerpc/platforms/powernv/pci-p5ioc2.c
@@ -92,7 +92,7 @@ static void pnv_pci_p5ioc2_dma_dev_setup(struct pnv_phb *phb,
 				pci_domain_nr(phb->hose->bus), phb->opal_id);
 	}
 
-	set_iommu_table_base(&pdev->dev, &phb->p5ioc2.iommu_table);
+	set_iommu_table_base_and_group(&pdev->dev, &phb->p5ioc2.iommu_table);
 }
 
 static void __init pnv_pci_init_p5ioc2_phb(struct device_node *np, u64 hub_id,
diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c
index 4eb33a9..362b8ac 100644
--- a/arch/powerpc/platforms/powernv/pci.c
+++ b/arch/powerpc/platforms/powernv/pci.c
@@ -536,7 +536,7 @@ static void pnv_pci_dma_fallback_setup(struct pci_controller *hose,
 		pdn->iommu_table = pnv_pci_setup_bml_iommu(hose);
 	if (!pdn->iommu_table)
 		return;
-	set_iommu_table_base(&pdev->dev, pdn->iommu_table);
+	set_iommu_table_base_and_group(&pdev->dev, pdn->iommu_table);
 }
 
 static void pnv_pci_dma_dev_setup(struct pci_dev *pdev)
@@ -657,3 +657,32 @@ void __init pnv_pci_init(void)
 	ppc_md.teardown_msi_irqs = pnv_teardown_msi_irqs;
 #endif
 }
+
+static int pnv_iommu_bus_notifier(struct notifier_block *nb,
+		unsigned long action, void *data)
+{
+	struct device *dev = data;
+
+	switch (action) {
+	case BUS_NOTIFY_ADD_DEVICE:
+		return iommu_add_device(dev);
+	case BUS_NOTIFY_DEL_DEVICE:
+		if (dev->iommu_group)
+			iommu_del_device(dev);
+		return 0;
+	default:
+		return 0;
+	}
+}
+
+static struct notifier_block pnv_iommu_bus_nb = {
+	.notifier_call = pnv_iommu_bus_notifier,
+};
+
+static int __init pnv_iommu_bus_notifier_init(void)
+{
+	bus_register_notifier(&pci_bus_type, &pnv_iommu_bus_nb);
+	return 0;
+}
+
+subsys_initcall_sync(pnv_iommu_bus_notifier_init);
diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index f253361..a80af6c 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -687,7 +687,8 @@ static void pci_dma_dev_setup_pSeries(struct pci_dev *dev)
 		iommu_table_setparms(phb, dn, tbl);
 		PCI_DN(dn)->iommu_table = iommu_init_table(tbl, phb->node);
 		iommu_register_group(tbl, pci_domain_nr(phb->bus), 0);
-		set_iommu_table_base(&dev->dev, PCI_DN(dn)->iommu_table);
+		set_iommu_table_base_and_group(&dev->dev,
+					       PCI_DN(dn)->iommu_table);
 		return;
 	}
 
@@ -699,7 +700,8 @@ static void pci_dma_dev_setup_pSeries(struct pci_dev *dev)
 		dn = dn->parent;
 
 	if (dn && PCI_DN(dn))
-		set_iommu_table_base(&dev->dev, PCI_DN(dn)->iommu_table);
+		set_iommu_table_base_and_group(&dev->dev,
+					       PCI_DN(dn)->iommu_table);
 	else
 		printk(KERN_WARNING "iommu: Device %s has no iommu table\n",
 		       pci_name(dev));
@@ -1193,7 +1195,7 @@ static void pci_dma_dev_setup_pSeriesLP(struct pci_dev *dev)
 		pr_debug("  found DMA window, table: %p\n", pci->iommu_table);
 	}
 
-	set_iommu_table_base(&dev->dev, pci->iommu_table);
+	set_iommu_table_base_and_group(&dev->dev, pci->iommu_table);
 }
 
 static int dma_set_mask_pSeriesLP(struct device *dev, u64 dma_mask)
-- 
1.8.4.rc4

^ permalink raw reply related

* Re: [PATCH 8/8] powerpc: Fix endian issues in crash dump code
From: Michael Ellerman @ 2013-12-18  4:45 UTC (permalink / raw)
  To: Anton Blanchard; +Cc: paulus, linuxppc-dev, Ulrich.Weigand
In-Reply-To: <1386824381-14032-9-git-send-email-anton@samba.org>

On Thu, 2013-12-12 at 15:59 +1100, Anton Blanchard wrote:
> A couple more device tree properties that need byte swapping.
> 
> Signed-off-by: Anton Blanchard <anton@samba.org>
> ---
>  arch/powerpc/kernel/crash_dump.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/crash_dump.c b/arch/powerpc/kernel/crash_dump.c
> index 779a78c..11c1d06 100644
> --- a/arch/powerpc/kernel/crash_dump.c
> +++ b/arch/powerpc/kernel/crash_dump.c
> @@ -124,15 +124,15 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
>  void crash_free_reserved_phys_range(unsigned long begin, unsigned long end)
>  {
>  	unsigned long addr;
> -	const u32 *basep, *sizep;
> +	const __be32 *basep, *sizep;
>  	unsigned int rtas_start = 0, rtas_end = 0;
>  
>  	basep = of_get_property(rtas.dev, "linux,rtas-base", NULL);
>  	sizep = of_get_property(rtas.dev, "rtas-size", NULL);
>  
>  	if (basep && sizep) {
> -		rtas_start = *basep;
> -		rtas_end = *basep + *sizep;
> +		rtas_start = be32_to_cpup(basep);
> +		rtas_end = rtas_start + be32_to_cpup(sizep);
>  	}
>  
>  	for (addr = begin; addr < end; addr += PAGE_SIZE) {

Not my favourite colour :D  What about this instead?

We could also add of_property_read_u32(), with an implied index of zero?

I don't like the rc handling, but couldn't come up with anything I liked
better.

cheers


diff --git a/arch/powerpc/kernel/crash_dump.c b/arch/powerpc/kernel/crash_dump.c
index 11c1d06..60fd0aa 100644
--- a/arch/powerpc/kernel/crash_dump.c
+++ b/arch/powerpc/kernel/crash_dump.c
@@ -124,15 +124,16 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
 void crash_free_reserved_phys_range(unsigned long begin, unsigned long end)
 {
        unsigned long addr;
-       const __be32 *basep, *sizep;
+       u32 base, size;
        unsigned int rtas_start = 0, rtas_end = 0;
+       int rc;
 
-       basep = of_get_property(rtas.dev, "linux,rtas-base", NULL);
-       sizep = of_get_property(rtas.dev, "rtas-size", NULL);
+       rc  = of_property_read_u32_index(rtas.dev, "linux,rtas-base", 0, &base);
+       rc |= of_property_read_u32_index(rtas.dev, "rtas-size", 0, &size);
 
-       if (basep && sizep) {
-               rtas_start = be32_to_cpup(basep);
-               rtas_end = rtas_start + be32_to_cpup(sizep);
+       if (rc == 0) {
+               rtas_start = base;
+               rtas_end = rtas_start + size;
        }
 
        for (addr = begin; addr < end; addr += PAGE_SIZE) {

^ permalink raw reply related

* Re: [PATCH v2] powerpc/powernv: Framework to log critical errors on powernv.
From: Deepthi Dharwar @ 2013-12-18  5:18 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: PowerPC email list
In-Reply-To: <1387334600.19507.5.camel@concordia>

Hi Micheal,

Thanks for the review.

On 12/18/2013 08:13 AM, Michael Ellerman wrote:
> On Mon, 2013-12-16 at 18:00 +0530, Deepthi Dharwar wrote:
>> This patch provides error logging interfaces to report critical
>> powernv error to FSP.
>> All the required information to dump the error is collected
>> at POWERNV level through error log interfaces
>> and then pushed on to FSP.
>>
>> This also supports synchronous error logging in cases of
>> PANIC, by passing OPAL_ERROR_PANIC as severity in
>> elog_create() call.
> 
> Please make note of the fact that none of this code is currently used but will
> be in a subsequent patch. When can we expect those patches?

This patch only adds the framework to log errors. Coming days this
framework will be used to report all POWERNV errors in a phased manner.
We would ideally be targeting one sub-system at a time and use these
interfaces.

> 
>> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
>> index 0f01308..1c5440a 100644
>> --- a/arch/powerpc/include/asm/opal.h
>> +++ b/arch/powerpc/include/asm/opal.h
>> @@ -168,6 +168,7 @@ extern int opal_enter_rtas(struct rtas_args *args,
>>  #define OPAL_GET_MSG				85
>>  #define OPAL_CHECK_ASYNC_COMPLETION		86
>>  #define OPAL_SYNC_HOST_REBOOT			87
>> +#define OPAL_ELOG_SEND				88
>>  
>>  #ifndef __ASSEMBLY__
>>  
>> @@ -260,6 +261,122 @@ enum OpalMessageType {
>>  	OPAL_MSG_TYPE_MAX,
>>  };
>>  
>> +/* Classification of Error/Events reporting type classification
> 
> Standard comment style for block comments is:
> 
> /*
>  * Classification ...
>  */
> 
> That applies to almost all of your comments in here.
> 
> 
>> + * Platform Events/Errors: Report Machine Check Interrupt
> 
> I think these comments would be better inline with the values, eg:
> 
> 	/* Report Machine Check Interrupt */
> 	OPAL_PLATFORM,
> 
> 	/* Report all I/O related events/errors */
> 	OPAL_INPUT_OUTPUT,
> 
> 	etc.
> 
> 
> Again that applies to most of your comments.

Sure, I'll make it inline.

>> + * INPUT_OUTPUT: Report all I/O related events/errors
>> + * RESOURCE_DEALLOC: Hotplug events and errors
>> + * MISC: Miscellanous error
>> + * Field: error_events_type
> 
> What is this "Field:" thing about?

Field is just to add some readability that these options relate to
corresponding elog_create argument field.
Looks like the purpose is not getting solved.

>> + */
>> +enum error_events {
> 
> If you're going to define an enum you should actually use it in the API, I
> can't see anywhere you do?
> 
> If you do want to use an enum it should be "opal_error_events".

Agree.

>> +	OPAL_PLATFORM,
>> +	OPAL_INPUT_OUTPUT,
>> +	OPAL_RESOURCE_DEALLOC,
>> +	OPAL_MISC,
>> +};
>> +/* OPAL Subsystem IDs listed for reporting events/errors
>> + * Field: subsystem_id
>> + */
>> +
>> +#define OPAL_PROCESSOR_SUBSYSTEM        0x10
>> +#define OPAL_MEMORY_SUBSYSTEM           0x20
>> +#define OPAL_IO_SUBSYSTEM               0x30
>> +#define OPAL_IO_DEVICES                 0x40
>> +#define OPAL_CEC_HARDWARE               0x50
>> +#define OPAL_POWER_COOLING              0x60
>> +#define OPAL_MISC_SUBSYSTEM             0x70
>> +#define OPAL_SURVEILLANCE_ERR           0x7A
>> +#define OPAL_PLATFORM_FIRMWARE          0x80
>> +#define OPAL_SOFTWARE                   0x90
>> +#define OPAL_EXTERNAL_ENV               0xA0
>> +
>> +/* During reporting an event/error the following represents
>> + * how serious the logged event/error is. (Severity)
>> + * Field: event_sev
>> + */
>> +#define OPAL_INFO                                   0x00
>> +#define OPAL_RECOVERED_ERR_GENERAL                  0x10
>> +
>> +/* 0x2X series is to denote set of Predictive Error
>> + * 0x20 Generic predictive error
>> + * 0x21 Predictive error, degraded performance
>> + * 0x22 Predictive error, fault may be corrected after reboot
>> + * 0x23 Predictive error, fault may be corrected after reboot,
>> + * degraded performance
>> + * 0x24 Predictive error, loss of redundancy
>> + */
>> +#define OPAL_PREDICTIVE_ERR_GENERAL                         0x20
>> +#define OPAL_PREDICTIVE_ERR_DEGRADED_PERF                   0x21
>> +#define OPAL_PREDICTIVE_ERR_FAULT_RECTIFY_REBOOT            0x22
>> +#define OPAL_PREDICTIVE_ERR_FAULT_RECTIFY_BOOT_DEGRADE_PERF 0x23
>> +#define OPAL_PREDICTIVE_ERR_LOSS_OF_REDUNDANCY              0x24
>> +
>> +/* 0x4X series for Unrecoverable Error
>> + * 0x40 Generic Unrecoverable error
>> + * 0x41 Unrecoverable error bypassed with degraded performance
>> + * 0x44 Unrecoverable error bypassed with loss of redundancy
>> + * 0x45 Unrecoverable error bypassed with loss of redundancy and performance
>> + * 0x48 Unrecoverable error bypassed with loss of function
>> + */
>> +#define OPAL_UNRECOVERABLE_ERR_GENERAL                      0x40
>> +#define OPAL_UNRECOVERABLE_ERR_DEGRADE_PERF                 0x41
>> +#define OPAL_UNRECOVERABLE_ERR_LOSS_REDUNDANCY              0x44
>> +#define OPAL_UNRECOVERABLE_ERR_LOSS_REDUNDANCY_PERF         0x45
>> +#define OPAL_UNRECOVERABLE_ERR_LOSS_OF_FUNCTION             0x48
>> +#define OPAL_ERROR_PANIC                                    0x50
>> +
>> +/* Event Sub-type
>> + * This field provides additional information on the non-error
>> + * event type
>> + * Field: event_subtype
>> + */
>> +#define OPAL_NA                                         0x00
>> +#define OPAL_MISCELLANEOUS_INFO_ONLY                    0x01
>> +#define OPAL_PREV_REPORTED_ERR_RECTIFIED                0x10
>> +#define OPAL_SYS_RESOURCES_DECONFIG_BY_USER             0x20
>> +#define OPAL_SYS_RESOURCE_DECONFIG_PRIOR_ERR            0x21
>> +#define OPAL_RESOURCE_DEALLOC_EVENT_NOTIFY              0x22
>> +#define OPAL_CONCURRENT_MAINTENANCE_EVENT               0x40
>> +#define OPAL_CAPACITY_UPGRADE_EVENT                     0x60
>> +#define OPAL_RESOURCE_SPARING_EVENT                     0x70
>> +#define OPAL_DYNAMIC_RECONFIG_EVENT                     0x80
>> +#define OPAL_NORMAL_SYS_PLATFORM_SHUTDOWN               0xD0
>> +#define OPAL_ABNORMAL_POWER_OFF                         0xE0
>> +
>> +/* Max user dump size is 14K    */
>> +#define OPAL_LOG_MAX_DUMP       14336
>> +
>> +/* Multiple user data sections */
>> +struct opal_usr_data_scn {
> 
> Just spell it out? opal_user_data_section 

Sure.

>> +	uint32_t tag;
>> +	uint16_t size;
>> +	uint16_t component_id;
>> +	char data_dump[4];
>> +};
>> +
>> +/* All the information regarding an error/event to be reported
>> + * needs to populate this structure using pre-defined interfaces
>> + * only
>> + */
>> +struct opal_errorlog {
>> +
>> +	uint16_t component_id;
>> +	uint8_t error_events_type:3;
> 
> Bit field?
> 
>> +	uint8_t subsystem_id;
>> +
>> +	uint8_t event_sev;
>> +	uint8_t event_subtype;
>> +	uint8_t usr_scn_count; /* User section count */
> 
> user_section_count;
> 
>> +	uint8_t elog_origin;
>> +
>> +	uint32_t usr_scn_size; /* User section size */
> 
> user_section_size;
> 
>> +	uint32_t reason_code;
>> +	uint32_t additional_info[4];
>> +
>> +	char usr_data_dump[OPAL_LOG_MAX_DUMP];
>> +};
> 
> It looks like this goes straight to Opal, should we be using __packed ?

Yes, this goes straight into Opal. The structure is defined such that
it is packed by default, this will not require compiler to pack bytes.

> 
>> @@ -853,6 +970,14 @@ int64_t opal_dump_ack(uint32_t dump_id);
>>  int64_t opal_get_msg(uint64_t buffer, size_t size);
>>  int64_t opal_check_completion(uint64_t buffer, size_t size, uint64_t token);
>>  int64_t opal_sync_host_reboot(void);
>> +struct opal_errorlog *elog_create(uint8_t err_evt_type, uint16_t component_id,
>> +		uint8_t subsystem_id, uint8_t event_sev, uint8_t  event_subtype,
>> +		uint32_t reason_code, uint32_t info0, uint32_t info1,
>> +		uint32_t info2, uint32_t info3);
>> +int update_user_dump(struct opal_errorlog *buf, unsigned char *data,
>> +					uint32_t tag, uint16_t size);
>> +void commit_errorlog_to_fsp(struct opal_errorlog *buf);
>> +int opal_commit_log_to_fsp(void *buf);
> 
> Are we using "opal_" as a prefix or not?

Uniformity is better. Shall follow the signature here too.

>> diff --git a/arch/powerpc/platforms/powernv/opal-elog.c b/arch/powerpc/platforms/powernv/opal-elog.c
>> index 58849d0..ade1e58 100644
>> --- a/arch/powerpc/platforms/powernv/opal-elog.c
>> +++ b/arch/powerpc/platforms/powernv/opal-elog.c
>> @@ -15,6 +15,7 @@
>>  #include <linux/sysfs.h>
>>  #include <linux/fs.h>
>>  #include <linux/vmalloc.h>
>> +#include <linux/mm.h>
>>  #include <linux/fcntl.h>
>>  #include <asm/uaccess.h>
>>  #include <asm/opal.h>
>> @@ -22,7 +23,9 @@
>>  /* Maximum size of a single log on FSP is 16KB */
>>  #define OPAL_MAX_ERRLOG_SIZE	16384
>>  
>> -/* maximu number of records powernv can hold */
>> +#define USR_CHAR_ARRAY_FIXED_SIZE      4
> 
> What is this?

struct User data section is mapped to a buffer. As all the structures
are padded, we need to subtract the same to do data manipulation.
Make me need to re-word it or use __packed here.

> 
>> +/* Maximum number of records powernv can hold */
> 
> That's an unrelated typo fix AFAICS, please send it separately.

Sure.

> 
>>  #define MAX_NUM_RECORD	128
>>  
>>  struct opal_err_log {
>> @@ -272,6 +275,61 @@ static int init_err_log_buffer(void)
>>  	return 0;
>>  }
>>  
>> +/* Interface to be used by POWERNV to push the logs to FSP via Sapphire */
>> +struct opal_errorlog *elog_create(uint8_t err_evt_type, uint16_t component_id,
>> +		uint8_t subsystem_id, uint8_t event_sev, uint8_t  event_subtype,
>> +		uint32_t reason_code, uint32_t info0, uint32_t info1,
>> +		uint32_t info2, uint32_t info3)
> 
> 
> A call to this function is going to be just a giant list of integer values, it
> will not be easy to see at a glance which value goes in which field.
> 
> I think you'd be better off with an elog_alloc() routine, and then you just do
> the initialisation explicitly so that it's obvious which value goes where:
> 
> 	elog->error_events_type = FOO;
> 	elog->component_id = BAR;
> 	elog->subsystem_id = ETC;
> 

elog_create() will be called by all sub-systems on POWERNV platform to
log events and errors. I feel we are better off passing all the required
arguments to the interface than initialize explicitly.
This would have a cleaner interface to error logging by
1) Removing huge amount of code duplication ( Each and every error/event
to be reported needs to initialise fields of the opal_errorlog struct
done many many times on POWERNV, results in redundant code )
2) There are chances of missing out on initialising key fields if
done by the user. Having an interface mandates the fields that
needs to populated while logging error/events.


> elog_create(uint8_t err_evt_type, uint16_t component_id,
>> +		uint8_t subsystem_id, uint8_t event_sev, uint8_t  event_subtype,
>> +		uint32_t reason_code, uint32_t info0, uint32_t info1,
>> +		uint32_t info2, uint32_t info3)
>> +{
>> +	struct opal_errorlog *buf;
>> +
>> +	buf = kzalloc(sizeof(struct opal_errorlog), GFP_KERNEL);
>> +	if (!buf) {
>> +		printk(KERN_ERR "ELOG: failed to allocate memory.\n");
>> +		return NULL;
>> +	}
>> +
>> +	buf->error_events_type = err_evt_type;
>> +	buf->component_id = component_id;
>> +	buf->subsystem_id = subsystem_id;
>> +	buf->event_sev = event_sev;
>> +	buf->event_subtype = event_subtype;
>> +	buf->reason_code = reason_code;
>> +	buf->additional_info[0] = info0;
>> +	buf->additional_info[1] = info1;
>> +	buf->additional_info[2] = info2;
>> +	buf->additional_info[3] = info3;
>> +	return buf;
>> +}
>> +
>> +int update_user_dump(struct opal_errorlog *buf, unsigned char *data,
>> +				uint32_t tag, uint16_t size)
>> +{
>> +	char *buffer = (char *)buf->usr_data_dump + buf->usr_scn_size;
>> +	struct opal_usr_data_scn *tmp;
>> +
>> +	if ((buf->usr_scn_size + size) > OPAL_LOG_MAX_DUMP) {
>> +		printk(KERN_ERR "ELOG: Size of dump data overruns buffer");
> 
> Use pr_err() and set pr_fmt() to "opal-error-log" at the top of the file.

Sure. I'll do that.

>> +		return -1;
>> +	}
>> +
>> +	tmp = (struct opal_usr_data_scn *)buffer;
>> +	tmp->tag = tag;
>> +	tmp->size = size + sizeof(struct opal_usr_data_scn)
>> +				- USR_CHAR_ARRAY_FIXED_SIZE;
>> +	memcpy(tmp->data_dump, data, size);
>> +
>> +	buf->usr_scn_size += tmp->size;
>> +	buf->usr_scn_count++;
>> +	return 0;
>> +}
>> +
>> +void commit_errorlog_to_fsp(struct opal_errorlog *buf)
>> +{
>> +	opal_commit_log_to_fsp((void *)(vmalloc_to_pfn(buf) << PAGE_SHIFT));
> 
> Can't fail?

It is better to have a return here, atleast for the caller to know if
opal handling of the same is successful or not. I will make the required
change.

>> +	kfree(buf);
> 
> It's a bit rude to free buf when the caller still has a pointer to it.

Technically, after the error log has been committed, the user is not
supposed to re-use or do anything with that buffer. I need to add
checks in all my routines if(buf != NULL), to handle the case where
the user by mistake is trying to use the same buffer pointer.

Regards,
Deepthi

>> +}
> 
> 
> cheers
> 
> 
> 

^ permalink raw reply

* Re: [PATCH v2] powerpc/powernv: Framework to log critical errors on powernv.
From: Michael Ellerman @ 2013-12-18  5:27 UTC (permalink / raw)
  To: Deepthi Dharwar; +Cc: PowerPC email list
In-Reply-To: <52B13019.4010601@linux.vnet.ibm.com>

On Wed, 2013-12-18 at 10:48 +0530, Deepthi Dharwar wrote:
> Hi Micheal,
> 
> Thanks for the review.

No worries.
 
> On 12/18/2013 08:13 AM, Michael Ellerman wrote:
> > On Mon, 2013-12-16 at 18:00 +0530, Deepthi Dharwar wrote:
> >> +/* All the information regarding an error/event to be reported
> >> + * needs to populate this structure using pre-defined interfaces
> >> + * only
> >> + */
> >> +struct opal_errorlog {
> >> +
> >> +	uint16_t component_id;
> >> +	uint8_t error_events_type:3;
> > 
> > Bit field?
> > 
> >> +	uint8_t subsystem_id;
> >> +
> >> +	uint8_t event_sev;
> >> +	uint8_t event_subtype;
> >> +	uint8_t usr_scn_count; /* User section count */
> > 
> > user_section_count;
> > 
> >> +	uint8_t elog_origin;
> >> +
> >> +	uint32_t usr_scn_size; /* User section size */
> > 
> > user_section_size;
> > 
> >> +	uint32_t reason_code;
> >> +	uint32_t additional_info[4];
> >> +
> >> +	char usr_data_dump[OPAL_LOG_MAX_DUMP];
> >> +};
> > 
> > It looks like this goes straight to Opal, should we be using __packed ?
> 
> Yes, this goes straight into Opal. The structure is defined such that
> it is packed by default, this will not require compiler to pack bytes.

Sure, but the compiler might decide to lay it out differently for some reason.
You should use __packed.

Also bitfields are essentially a big "implementation defined behaviour" sign,
so I would avoid using the bitfield.

> >> diff --git a/arch/powerpc/platforms/powernv/opal-elog.c b/arch/powerpc/platforms/powernv/opal-elog.c
> >> index 58849d0..ade1e58 100644
> >> --- a/arch/powerpc/platforms/powernv/opal-elog.c
> >> +++ b/arch/powerpc/platforms/powernv/opal-elog.c
> >> @@ -22,7 +23,9 @@
> >>  /* Maximum size of a single log on FSP is 16KB */
> >>  #define OPAL_MAX_ERRLOG_SIZE	16384
> >>  
> >> -/* maximu number of records powernv can hold */
> >> +#define USR_CHAR_ARRAY_FIXED_SIZE      4
> > 
> > What is this?
> 
> struct User data section is mapped to a buffer. As all the structures
> are padded, we need to subtract the same to do data manipulation.
> Make me need to re-word it or use __packed here.

Yeah that's still not really clear to me, so if you can do something that is
more obvious that would be good.

> >> @@ -272,6 +275,61 @@ static int init_err_log_buffer(void)
> >>  	return 0;
> >>  }
> >>  
> >> +/* Interface to be used by POWERNV to push the logs to FSP via Sapphire */
> >> +struct opal_errorlog *elog_create(uint8_t err_evt_type, uint16_t component_id,
> >> +		uint8_t subsystem_id, uint8_t event_sev, uint8_t  event_subtype,
> >> +		uint32_t reason_code, uint32_t info0, uint32_t info1,
> >> +		uint32_t info2, uint32_t info3)
> > 
> > 
> > A call to this function is going to be just a giant list of integer values, it
> > will not be easy to see at a glance which value goes in which field.
> > 
> > I think you'd be better off with an elog_alloc() routine, and then you just do
> > the initialisation explicitly so that it's obvious which value goes where:
> > 
> > 	elog->error_events_type = FOO;
> > 	elog->component_id = BAR;
> > 	elog->subsystem_id = ETC;
> > 
> 
> elog_create() will be called by all sub-systems on POWERNV platform to
> log events and errors. I feel we are better off passing all the required
> arguments to the interface than initialize explicitly.
> This would have a cleaner interface to error logging by
> 1) Removing huge amount of code duplication ( Each and every error/event
> to be reported needs to initialise fields of the opal_errorlog struct
> done many many times on POWERNV, results in redundant code )

It will be more lines of code, but it might be more readable code.

> 2) There are chances of missing out on initialising key fields if
> done by the user. Having an interface mandates the fields that
> needs to populated while logging error/events.

I can always pass 0 :)

We will see how it looks once there are some callers.

> >> +void commit_errorlog_to_fsp(struct opal_errorlog *buf)
> >> +{
> >> +	opal_commit_log_to_fsp((void *)(vmalloc_to_pfn(buf) << PAGE_SHIFT));
> > 
> > Can't fail?
> 
> It is better to have a return here, atleast for the caller to know if
> opal handling of the same is successful or not. I will make the required
> change.
> 
> >> +	kfree(buf);
> > 
> > It's a bit rude to free buf when the caller still has a pointer to it.
> 
> Technically, after the error log has been committed, the user is not
> supposed to re-use or do anything with that buffer. I need to add
> checks in all my routines if(buf != NULL), to handle the case where
> the user by mistake is trying to use the same buffer pointer.

Why is the user not supposed to re-use it?

kfree()'ing the buffer doesn't prevent the caller from re-using it.

cheers

^ permalink raw reply

* Re: [PATCH v2] powerpc/powernv: Framework to log critical errors on powernv.
From: Deepthi Dharwar @ 2013-12-18  6:21 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: PowerPC email list
In-Reply-To: <1387344458.20735.2.camel@concordia>

On 12/18/2013 10:57 AM, Michael Ellerman wrote:
> On Wed, 2013-12-18 at 10:48 +0530, Deepthi Dharwar wrote:
>> Hi Micheal,
>>
>> Thanks for the review.
> 
> No worries.
> 
>> On 12/18/2013 08:13 AM, Michael Ellerman wrote:
>>> On Mon, 2013-12-16 at 18:00 +0530, Deepthi Dharwar wrote:
>>>> +/* All the information regarding an error/event to be reported
>>>> + * needs to populate this structure using pre-defined interfaces
>>>> + * only
>>>> + */
>>>> +struct opal_errorlog {
>>>> +
>>>> +	uint16_t component_id;
>>>> +	uint8_t error_events_type:3;
>>>
>>> Bit field?
>>>
>>>> +	uint8_t subsystem_id;
>>>> +
>>>> +	uint8_t event_sev;
>>>> +	uint8_t event_subtype;
>>>> +	uint8_t usr_scn_count; /* User section count */
>>>
>>> user_section_count;
>>>
>>>> +	uint8_t elog_origin;
>>>> +
>>>> +	uint32_t usr_scn_size; /* User section size */
>>>
>>> user_section_size;
>>>
>>>> +	uint32_t reason_code;
>>>> +	uint32_t additional_info[4];
>>>> +
>>>> +	char usr_data_dump[OPAL_LOG_MAX_DUMP];
>>>> +};
>>
>>> It looks like this goes straight to Opal, should we be using __packed ?
>>
>> Yes, this goes straight into Opal. The structure is defined such that
>> it is packed by default, this will not require compiler to pack bytes.
> 
> Sure, but the compiler might decide to lay it out differently for some reason.
> You should use __packed.

Ok.

> Also bitfields are essentially a big "implementation defined behaviour" sign,
> so I would avoid using the bitfield.

Yes, bitfields will be gone.

>>>> diff --git a/arch/powerpc/platforms/powernv/opal-elog.c b/arch/powerpc/platforms/powernv/opal-elog.c
>>>> index 58849d0..ade1e58 100644
>>>> --- a/arch/powerpc/platforms/powernv/opal-elog.c
>>>> +++ b/arch/powerpc/platforms/powernv/opal-elog.c
>>>> @@ -22,7 +23,9 @@
>>>>  /* Maximum size of a single log on FSP is 16KB */
>>>>  #define OPAL_MAX_ERRLOG_SIZE	16384
>>>>  
>>>> -/* maximu number of records powernv can hold */
>>>> +#define USR_CHAR_ARRAY_FIXED_SIZE      4
>>>
>>> What is this?
>>
>> struct User data section is mapped to a buffer. As all the structures
>> are padded, we need to subtract the same to do data manipulation.
>> Make me need to re-word it or use __packed here.
> 
> Yeah that's still not really clear to me, so if you can do something that is
> more obvious that would be good.

Sure.

>>>> @@ -272,6 +275,61 @@ static int init_err_log_buffer(void)
>>>>  	return 0;
>>>>  }
>>>>  
>>>> +/* Interface to be used by POWERNV to push the logs to FSP via Sapphire */
>>>> +struct opal_errorlog *elog_create(uint8_t err_evt_type, uint16_t component_id,
>>>> +		uint8_t subsystem_id, uint8_t event_sev, uint8_t  event_subtype,
>>>> +		uint32_t reason_code, uint32_t info0, uint32_t info1,
>>>> +		uint32_t info2, uint32_t info3)
>>>
>>>
>>> A call to this function is going to be just a giant list of integer values, it
>>> will not be easy to see at a glance which value goes in which field.
>>>
>>> I think you'd be better off with an elog_alloc() routine, and then you just do
>>> the initialisation explicitly so that it's obvious which value goes where:
>>>
>>> 	elog->error_events_type = FOO;
>>> 	elog->component_id = BAR;
>>> 	elog->subsystem_id = ETC;
>>>
>>
>> elog_create() will be called by all sub-systems on POWERNV platform to
>> log events and errors. I feel we are better off passing all the required
>> arguments to the interface than initialize explicitly.
>> This would have a cleaner interface to error logging by
>> 1) Removing huge amount of code duplication ( Each and every error/event
>> to be reported needs to initialise fields of the opal_errorlog struct
>> done many many times on POWERNV, results in redundant code )
> 
> It will be more lines of code, but it might be more readable code.
> 
>> 2) There are chances of missing out on initialising key fields if
>> done by the user. Having an interface mandates the fields that
>> needs to populated while logging error/events.
> 
> I can always pass 0 :)

I was referring to more on the lines of missing unintentionally :)

> We will see how it looks once there are some callers.

Sure. I will retain it for now. Going forward once we start adding
users exploiting this interface, we can then take a call.
> 
>>>> +void commit_errorlog_to_fsp(struct opal_errorlog *buf)
>>>> +{
>>>> +	opal_commit_log_to_fsp((void *)(vmalloc_to_pfn(buf) << PAGE_SHIFT));
>>>
>>> Can't fail?
>>
>> It is better to have a return here, atleast for the caller to know if
>> opal handling of the same is successful or not. I will make the required
>> change.
>>
>>>> +	kfree(buf);
>>>
>>> It's a bit rude to free buf when the caller still has a pointer to it.
>>
>> Technically, after the error log has been committed, the user is not
>> supposed to re-use or do anything with that buffer. I need to add
>> checks in all my routines if(buf != NULL), to handle the case where
>> the user by mistake is trying to use the same buffer pointer.
> 
> Why is the user not supposed to re-use it?
> 
> kfree()'ing the buffer doesn't prevent the caller from re-using it.

Release the memory. Assign pointer to NULL before returning.
All the error logging interfaces should have NULL check (to return).
User can't do much in that case.

Regards,
Deepthi

> cheers
> 
> 
> 
> 

^ permalink raw reply

* Re: [PATCH] powerpc: book3s: kvm: Don't abuse host r2 in exit path
From: Aneesh Kumar K.V @ 2013-12-18  7:35 UTC (permalink / raw)
  To: agraf, benh, paulus; +Cc: linuxppc-dev, kvm, kvm-ppc
In-Reply-To: <1384178387-22993-1-git-send-email-aneesh.kumar@linux.vnet.ibm.com>


Hi Alex,

Any update on this ? We need this to got into 3.13.

-aneesh 

"Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> writes:

> From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
>
> We don't use PACATOC for PR. Avoid updating HOST_R2 with PR
> KVM mode when both HV and PR are enabled in the kernel. Without this we
> get the below crash
>
> (qemu)
> Unable to handle kernel paging request for data at address 0xffffffffffff8310
> Faulting instruction address: 0xc00000000001d5a4
> cpu 0x2: Vector: 300 (Data Access) at [c0000001dc53aef0]
>     pc: c00000000001d5a4: .vtime_delta.isra.1+0x34/0x1d0
>     lr: c00000000001d760: .vtime_account_system+0x20/0x60
>     sp: c0000001dc53b170
>    msr: 8000000000009032
>    dar: ffffffffffff8310
>  dsisr: 40000000
>   current = 0xc0000001d76c62d0
>   paca    = 0xc00000000fef1100   softe: 0        irq_happened: 0x01
>     pid   = 4472, comm = qemu-system-ppc
> enter ? for help
> [c0000001dc53b200] c00000000001d760 .vtime_account_system+0x20/0x60
> [c0000001dc53b290] c00000000008d050 .kvmppc_handle_exit_pr+0x60/0xa50
> [c0000001dc53b340] c00000000008f51c kvm_start_lightweight+0xb4/0xc4
> [c0000001dc53b510] c00000000008cdf0 .kvmppc_vcpu_run_pr+0x150/0x2e0
> [c0000001dc53b9e0] c00000000008341c .kvmppc_vcpu_run+0x2c/0x40
> [c0000001dc53ba50] c000000000080af4 .kvm_arch_vcpu_ioctl_run+0x54/0x1b0
> [c0000001dc53bae0] c00000000007b4c8 .kvm_vcpu_ioctl+0x478/0x730
> [c0000001dc53bca0] c0000000002140cc .do_vfs_ioctl+0x4ac/0x770
> [c0000001dc53bd80] c0000000002143e8 .SyS_ioctl+0x58/0xb0
> [c0000001dc53be30] c000000000009e58 syscall_exit+0x0/0x98
> --- Exception: c00 (System Call) at 00001fffff960160
> SP (1ffffecbe3c0) is in userspace
>
> These changes were originally part of
> http://mid.gmane.org/20130806042205.GR19254@iris.ozlabs.ibm.com
>
> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/kvm_book3s_asm.h | 1 +
>  arch/powerpc/kernel/asm-offsets.c         | 1 +
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S   | 7 +++----
>  3 files changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/kvm_book3s_asm.h b/arch/powerpc/include/asm/kvm_book3s_asm.h
> index 0bd9348..69fe837 100644
> --- a/arch/powerpc/include/asm/kvm_book3s_asm.h
> +++ b/arch/powerpc/include/asm/kvm_book3s_asm.h
> @@ -79,6 +79,7 @@ struct kvmppc_host_state {
>  	ulong vmhandler;
>  	ulong scratch0;
>  	ulong scratch1;
> +	ulong scratch2;
>  	u8 in_guest;
>  	u8 restore_hid5;
>  	u8 napping;
> diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c
> index 8e6ede6..841a4c8 100644
> --- a/arch/powerpc/kernel/asm-offsets.c
> +++ b/arch/powerpc/kernel/asm-offsets.c
> @@ -583,6 +583,7 @@ int main(void)
>  	HSTATE_FIELD(HSTATE_VMHANDLER, vmhandler);
>  	HSTATE_FIELD(HSTATE_SCRATCH0, scratch0);
>  	HSTATE_FIELD(HSTATE_SCRATCH1, scratch1);
> +	HSTATE_FIELD(HSTATE_SCRATCH2, scratch2);
>  	HSTATE_FIELD(HSTATE_IN_GUEST, in_guest);
>  	HSTATE_FIELD(HSTATE_RESTORE_HID5, restore_hid5);
>  	HSTATE_FIELD(HSTATE_NAPPING, napping);
> diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> index 339aa5e..16f7654 100644
> --- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> +++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
> @@ -750,15 +750,14 @@ kvmppc_interrupt_hv:
>  	 * guest CR, R12 saved in shadow VCPU SCRATCH1/0
>  	 * guest R13 saved in SPRN_SCRATCH0
>  	 */
> -	/* abuse host_r2 as third scratch area; we get r2 from PACATOC(r13) */
> -	std	r9, HSTATE_HOST_R2(r13)
> +	std	r9, HSTATE_SCRATCH2(r13)
>  
>  	lbz	r9, HSTATE_IN_GUEST(r13)
>  	cmpwi	r9, KVM_GUEST_MODE_HOST_HV
>  	beq	kvmppc_bad_host_intr
>  #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
>  	cmpwi	r9, KVM_GUEST_MODE_GUEST
> -	ld	r9, HSTATE_HOST_R2(r13)
> +	ld	r9, HSTATE_SCRATCH2(r13)
>  	beq	kvmppc_interrupt_pr
>  #endif
>  	/* We're now back in the host but in guest MMU context */
> @@ -778,7 +777,7 @@ kvmppc_interrupt_hv:
>  	std	r6, VCPU_GPR(R6)(r9)
>  	std	r7, VCPU_GPR(R7)(r9)
>  	std	r8, VCPU_GPR(R8)(r9)
> -	ld	r0, HSTATE_HOST_R2(r13)
> +	ld	r0, HSTATE_SCRATCH2(r13)
>  	std	r0, VCPU_GPR(R9)(r9)
>  	std	r10, VCPU_GPR(R10)(r9)
>  	std	r11, VCPU_GPR(R11)(r9)
> -- 
> 1.8.3.2
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* [PATCH 1/2] powerpc/p1022ds: fix rtc compatible string
From: Dongsheng Wang @ 2013-12-18  8:39 UTC (permalink / raw)
  To: scottwood, Chang-Ming.Huang, roy.zang; +Cc: linuxppc-dev, Wang Dongsheng

From: Wang Dongsheng <dongsheng.wang@freescale.com>

RTC Hardware(ds3232) and rtc compatible string does not match.
Change "dallas,ds1339" to "dallas,ds3232".

Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>

diff --git a/arch/powerpc/boot/dts/p1022ds.dtsi b/arch/powerpc/boot/dts/p1022ds.dtsi
index 873da35..5725058 100644
--- a/arch/powerpc/boot/dts/p1022ds.dtsi
+++ b/arch/powerpc/boot/dts/p1022ds.dtsi
@@ -146,7 +146,7 @@
 			 */
 		};
 		rtc@68 {
-			compatible = "dallas,ds1339";
+			compatible = "dallas,ds3232";
 			reg = <0x68>;
 		};
 		adt7461@4c {
-- 
1.8.5

^ permalink raw reply related

* [PATCH 2/2] powerpc/p1022ds: add a interrupt for rtc node
From: Dongsheng Wang @ 2013-12-18  8:39 UTC (permalink / raw)
  To: scottwood, Chang-Ming.Huang, roy.zang; +Cc: linuxppc-dev, Wang Dongsheng
In-Reply-To: <1387355964-39486-1-git-send-email-dongsheng.wang@freescale.com>

From: Wang Dongsheng <dongsheng.wang@freescale.com>

Add an external interrupt for rtc node.

Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>

diff --git a/arch/powerpc/boot/dts/p1022ds.dtsi b/arch/powerpc/boot/dts/p1022ds.dtsi
index 5725058..957e0dc 100644
--- a/arch/powerpc/boot/dts/p1022ds.dtsi
+++ b/arch/powerpc/boot/dts/p1022ds.dtsi
@@ -148,6 +148,7 @@
 		rtc@68 {
 			compatible = "dallas,ds3232";
 			reg = <0x68>;
+			interrupts = <0x1 0x1 0 0>;
 		};
 		adt7461@4c {
 			compatible = "adi,adt7461";
-- 
1.8.5

^ permalink raw reply related

* Re: [PATCH] powerpc: Make 64-bit non-VMX __copy_tofrom_user bi-endian
From: Anton Blanchard @ 2013-12-18 10:15 UTC (permalink / raw)
  To: benh, paulus, paulmck; +Cc: linuxppc-dev
In-Reply-To: <20131218092957.4a12cbcf@kryten>


Hi,

> [ This is a rare but nasty LE issue. Most of the time we use the
> POWER7 optimised __copy_tofrom_user_power7 loop, but when it hits an
> exception we fall back to the base __copy_tofrom_user loop. - Anton ]

To try and catch any screw ups in our ppc64 memcpy and copy_tofrom_user
loops, I wrote a quick test:

http://ozlabs.org/~anton/junkcode/validate_kernel_copyloops.tar.gz

"make check" runs through all source and destination alignments for a
range of sizes. It verifies the data was copied correctly and the
redzone before and after were untouched.

It tests:

copyuser_64
copyuser_power7
memcpy_64
memcpy_power7

memcpy_64 is currently unused on LE, but I added Paul McKenney's LE
fixes regardless. copyuser_64 has the same LE fix (posted yesterday).
All loops pass the test on both LE and BE.

Anton

^ permalink raw reply

* Re: [PATCH] powerpc: book3s: kvm: Don't abuse host r2 in exit path
From: Alexander Graf @ 2013-12-18 10:30 UTC (permalink / raw)
  To: Aneesh Kumar K.V
  Cc: Paul Mackerras, linuxppc-dev, kvm-ppc,
	kvm@vger.kernel.org mailing list
In-Reply-To: <87lhzilign.fsf@linux.vnet.ibm.com>


On 18.12.2013, at 08:35, Aneesh Kumar K.V =
<aneesh.kumar@linux.vnet.ibm.com> wrote:

>=20
> Hi Alex,
>=20
> Any update on this ? We need this to got into 3.13.

Thanks, applied to for-3.13.


Alex

^ permalink raw reply

* Re: [PATCH v2 2/6] PCI/MSI: Factor out pci_get_msi_cap() interface
From: Bjorn Helgaas @ 2013-12-18 18:26 UTC (permalink / raw)
  To: Mark Lord
  Cc: Joerg Roedel, x86@kernel.org, linux-kernel@vger.kernel.org,
	Michael Ellerman, linux-ide@vger.kernel.org, Alexander Gordeev,
	Jan Beulich, linux-pci@vger.kernel.org, Tejun Heo, linuxppc-dev,
	Ingo Molnar
In-Reply-To: <52442975.9000603@start.ca>

On Thu, Sep 26, 2013 at 08:32:53AM -0400, Mark Lord wrote:
> On 13-09-18 05:48 AM, Alexander Gordeev wrote:
> >
> > The last pattern makes most of sense to me and could be updated with a more
> > clear sequence - a call to (bit modified) pci_msix_table_size() followed
> > by a call to pci_enable_msix(). I think this pattern can effectively
> > supersede the currently recommended "loop" practice.
> 
> The loop is still necessary, because there's a race between those two calls,
> so that pci_enable_msix() can still fail due to lack of MSIX slots.

Hi Mark,

Can you elaborate on this race?  My understanding is that
pci_msix_table_size() depends only on the "Table Size" field in the MSI-X
Message Control register.

So if there's a concurrency problem here, it would have to be something
like "pci_enable_msix() may not be able to configure the requested number
of vectors because it has to allocate from a shared pool."

If that's the case, pci_msix_table_size() wouldn't be involved at all, and
the only question is how to coordinate between several drivers that each
call pci_enable_msix().  I think that would have to be resolved in some
arch hook used by the PCI core.

Maybe this is already taken care of; I just want to make sure we don't
overlook an issue here.

Bjorn

^ permalink raw reply

* Re: commit e38c0a1f breaks powerpc boards with uli1575 chip
From: Rob Herring @ 2013-12-18 18:40 UTC (permalink / raw)
  To: Nikita Yushchenko, Arnd Bergmann, Thierry Reding, Grant Likely,
	devicetree@vger.kernel.org, Kumar Gala
  Cc: Alexey Lugovskoy, linuxppc-dev, linux-kernel, Dmitry Krivoschekov
In-Reply-To: <201312171135.38576@blacky.localdomain>

[fixed DT maillist address]

On 12/17/2013 01:35 AM, Nikita Yushchenko wrote:
> Hi
> 
> While trying to make freescale p2020ds and  mpc8572ds boards working with mainline kernel, I faced that commit 
> e38c0a1f (Handle #address-cells > 2 specially) breaks things with these boards.

Good to see this broke in v3.7 and is just now found...

> 
> Both these boards have uli1575 chip.
> Corresponding part in device tree is something like
> 
>                 uli1575@0 {
>                         reg = <0x0 0x0 0x0 0x0 0x0>;
>                         #size-cells = <2>;
>                         #address-cells = <3>;
>                         ranges = <0x2000000 0x0 0x80000000
>                                   0x2000000 0x0 0x80000000
>                                   0x0 0x20000000
> 
>                                   0x1000000 0x0 0x0
>                                   0x1000000 0x0 0x0
>                                   0x0 0x10000>;
>                         isa@1e {
> ...
> 
> I.e. it has #address-cells = <3>
> 
> 
> With commit e38c0a1f reverted, devices under uli1575 are registered correctly, e.g. for rtc
> 
> OF: ** translation for device /pcie@ffe09000/pcie@0/uli1575@0/isa@1e/rtc@70 **
> OF: bus is isa (na=2, ns=1) on /pcie@ffe09000/pcie@0/uli1575@0/isa@1e
> OF: translating address: 00000001 00000070
> OF: parent bus is default (na=3, ns=2) on /pcie@ffe09000/pcie@0/uli1575@0
> OF: walking ranges...
> OF: ISA map, cp=0, s=1000, da=70
> OF: parent translation for: 01000000 00000000 00000000
> OF: with offset: 70
> OF: one level translation: 00000000 00000000 00000070
> OF: parent bus is pci (na=3, ns=2) on /pcie@ffe09000/pcie@0
> OF: walking ranges...
> OF: default map, cp=a0000000, s=20000000, da=70
> OF: default map, cp=0, s=10000, da=70
> OF: parent translation for: 01000000 00000000 00000000
> OF: with offset: 70
> OF: one level translation: 01000000 00000000 00000070
> OF: parent bus is pci (na=3, ns=2) on /pcie@ffe09000
> OF: walking ranges...
> OF: PCI map, cp=0, s=10000, da=70
> OF: parent translation for: 01000000 00000000 00000000
> OF: with offset: 70
> OF: one level translation: 01000000 00000000 00000070
> OF: parent bus is default (na=2, ns=2) on /
> OF: walking ranges...
> OF: PCI map, cp=0, s=10000, da=70
> OF: parent translation for: 00000000 ffc10000
> OF: with offset: 70
> OF: one level translation: 00000000 ffc10070
> OF: reached root node
> 
> With commit e38c0a1f in place, address translation fails:
> 
> OF: ** translation for device /pcie@ffe09000/pcie@0/uli1575@0/isa@1e/rtc@70 **
> OF: bus is isa (na=2, ns=1) on /pcie@ffe09000/pcie@0/uli1575@0/isa@1e
> OF: translating address: 00000001 00000070
> OF: parent bus is default (na=3, ns=2) on /pcie@ffe09000/pcie@0/uli1575@0
> OF: walking ranges...
> OF: ISA map, cp=0, s=1000, da=70
> OF: parent translation for: 01000000 00000000 00000000
> OF: with offset: 70
> OF: one level translation: 00000000 00000000 00000070
> OF: parent bus is pci (na=3, ns=2) on /pcie@ffe09000/pcie@0
> OF: walking ranges...
> OF: default map, cp=a0000000, s=20000000, da=70
> OF: default map, cp=0, s=10000, da=70
> OF: not found !
> 
> Either e38c0a1f should be reverted, or uli1575 (and perhaps other similar devices) have to be described in device 
> trees differently.

Reverting would break Tegra PCIe, but you should not have to change the
DT either. So we need a solution.

Is this something like this sufficient to fix it?

diff --git a/drivers/of/address.c b/drivers/of/address.c
index 4b9317b..378aebd 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -74,7 +74,7 @@ static u64 of_bus_default_map(__be32 *addr, const
__be32 *range,
         * mapping doesn't specify a physical address. Rather, the address
         * specifies an identifier that must match exactly.
         */
-       if (na > 2 && memcmp(range, addr, na * 4) != 0)
+       if (na > 2 && memcmp(range, addr, (na - 2) * 4) != 0)
                return OF_BAD_ADDR;

        if (da < cp || da >= (cp + s))

^ permalink raw reply related

* Re: [RFC][PATCH v1] ASoC: fsl_ssi: Add DAI master mode support for SSI on i.MX series
From: Mark Brown @ 2013-12-18 18:59 UTC (permalink / raw)
  To: Nicolin Chen; +Cc: alsa-devel, lgirdwood, tiwai, timur, perex, linuxppc-dev
In-Reply-To: <1386845085-21682-1-git-send-email-Guangyu.Chen@freescale.com>

[-- Attachment #1: Type: text/plain, Size: 681 bytes --]

On Thu, Dec 12, 2013 at 06:44:45PM +0800, Nicolin Chen wrote:

> +/**
> + * fsl_ssi_set_dai_tdm_slot - set TDM slot number
> + *
> + * Note: This function can be only called when using SSI as DAI master
> + */
> +static int fsl_ssi_set_dai_tdm_slot(struct snd_soc_dai *cpu_dai, u32 tx_mask,
> +				u32 rx_mask, int slots, int slot_width)
> +{
> +	struct fsl_ssi_private *ssi_private = snd_soc_dai_get_drvdata(cpu_dai);
> +	struct ccsr_ssi __iomem *ssi = ssi_private->ssi;
> +	u32 val;

I'm a bit concernred about what this is for and why it's required - is
it something that machine drivers have to call and if it is shouldn't
the driver be defaulting to a sensible configuration?

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply

* [PATCH] rapidio: add modular rapidio core build into powerpc and mips branches
From: Alexandre Bounine @ 2013-12-18 18:57 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linuxppc-dev
  Cc: linux-mips, Ralf Baechle, Alexandre Bounine, Jean Delvare

Allow modular build option for RapidIO subsystem core in MIPS and PowerPC
architectural branches.

At this moment modular RapidIO subsystem build is enabled only for platforms
that use PCI/PCIe based RapidIO controllers (e.g. Tsi721).

Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Jean Delvare <jdelvare@suse.de>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Li Yang <leoli@freescale.com>
Cc: linux-mips@linux-mips.org
---
 arch/mips/Kconfig    |    2 +-
 arch/powerpc/Kconfig |    4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 650de39..e6a8a7a 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -2442,7 +2442,7 @@ source "drivers/pcmcia/Kconfig"
 source "drivers/pci/hotplug/Kconfig"
 
 config RAPIDIO
-	bool "RapidIO support"
+	tristate "RapidIO support"
 	depends on PCI
 	default n
 	help
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index b44b52c..992531f 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -790,7 +790,7 @@ config HAS_RAPIDIO
 	default n
 
 config RAPIDIO
-	bool "RapidIO support"
+	tristate "RapidIO support"
 	depends on HAS_RAPIDIO || PCI
 	help
 	  If you say Y here, the kernel will include drivers and
@@ -798,7 +798,7 @@ config RAPIDIO
 
 config FSL_RIO
 	bool "Freescale Embedded SRIO Controller support"
-	depends on RAPIDIO && HAS_RAPIDIO
+	depends on RAPIDIO = y && HAS_RAPIDIO
 	default "n"
 	---help---
 	  Include support for RapidIO controller on Freescale embedded
-- 
1.7.8.4

^ permalink raw reply related

* Re: [PATCH v1 0/4] powerpc/512x: update COMMON_CLK support for MPC5125
From: Anatolij Gustschin @ 2013-12-18 19:53 UTC (permalink / raw)
  To: Gerhard Sittig
  Cc: Mike Turquette, Detlev Zundel, Matteo Facchinetti, Scott Wood,
	linuxppc-dev, linux-arm-kernel
In-Reply-To: <1386681097-14126-1-git-send-email-gsi@denx.de>

On Tue, 10 Dec 2013 14:11:33 +0100
Gerhard Sittig <gsi@denx.de> wrote:
...
> Gerhard Sittig (4):
>   powerpc/512x: clk: minor comment updates
>   powerpc/512x: clk: enforce even SDHC divider values
>   powerpc/512x: clk: support MPC5121/5123/5125 SoC variants
>   powerpc/512x: dts: add MPC5125 clock specs
> 
>  arch/powerpc/boot/dts/mpc5125twr.dts          |   53 +++-
>  arch/powerpc/include/asm/mpc5121.h            |    7 +-
>  arch/powerpc/platforms/512x/clock-commonclk.c |  369 +++++++++++++++++++++----
>  include/dt-bindings/clock/mpc512x-clock.h     |    9 +-
>  4 files changed, 386 insertions(+), 52 deletions(-)

Applied this series to mpc5xxx next. Thanks!

Anatolij

^ permalink raw reply

* Re: [PATCH v1 1/1] powerpc/512x: dts: remove misplaced IRQ spec from 'soc' node (5125)
From: Anatolij Gustschin @ 2013-12-18 19:54 UTC (permalink / raw)
  To: Gerhard Sittig; +Cc: devicetree, linuxppc-dev
In-Reply-To: <1386669068-2477-1-git-send-email-gsi@denx.de>

On Tue, 10 Dec 2013 10:51:08 +0100
Gerhard Sittig <gsi@denx.de> wrote:

> the 'soc' node in the MPC5125 "tower" board .dts has an '#interrupt-cells'
> property although this node is not an interrupt controller
> 
> remove this erroneously placed property because starting with v3.13-rc1
> lookup and resolution of 'interrupts' specs for peripherals gets misled
> (tries to use the 'soc' as the interrupt parent which fails), emits
> 'no irq domain found' WARN() messages and breaks the boot process
> 
> [ best viewed with 'git diff -U5' to have DT node names in the context ]
> 
> Cc: Anatolij Gustschin <agust@denx.de>
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: devicetree@vger.kernel.org
> Signed-off-by: Gerhard Sittig <gsi@denx.de>
> 
> ---
> 
> note that this is not a resend of the previous MPC5121 fix, but instead
> is a fix for MPC5125 along the same lines of the MPC5121 fix
> ---
>  arch/powerpc/boot/dts/mpc5125twr.dts |    1 -
>  1 file changed, 1 deletion(-)

Applied. Thanks!

Anatolij

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox