LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] powerpc: ftrace: bugfix for test_24bit_addr
From: Liu ping fan @ 2014-02-26  6:06 UTC (permalink / raw)
  To: ananth; +Cc: Paul Mackerras, linuxppc-dev, Anton Blanchard
In-Reply-To: <20140226043552.GA14672@in.ibm.com>

On Wed, Feb 26, 2014 at 12:35 PM, Ananth N Mavinakayanahalli
<ananth@in.ibm.com> wrote:
> On Wed, Feb 26, 2014 at 10:23:01AM +0800, Liu Ping Fan wrote:
>> The branch target should be the func addr, not the addr of func_descr_t.
>> So using ppc_function_entry() to generate the right target addr.
>>
>> Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
>> ---
>> This bug will make ftrace fail to work. It can be triggered when the kernel
>> size grows up.
>> ---
>>  arch/powerpc/kernel/ftrace.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/arch/powerpc/kernel/ftrace.c b/arch/powerpc/kernel/ftrace.c
>> index 9b27b29..b0ded97 100644
>> --- a/arch/powerpc/kernel/ftrace.c
>> +++ b/arch/powerpc/kernel/ftrace.c
>> @@ -74,6 +74,7 @@ ftrace_modify_code(unsigned long ip, unsigned int old, unsigned int new)
>>   */
>>  static int test_24bit_addr(unsigned long ip, unsigned long addr)
>>  {
>> +     addr = ppc_function_entry((void *)addr);
>
> Won't this break on LE?
>
How? I can not figure out it. Anyway, ppc_function_entry() is already
used in other places with LE.

Thx,
Fan

^ permalink raw reply

* [PATCH] powerpc: Increase stack redzone for 64-bit userspace to 512 bytes
From: Paul Mackerras @ 2014-02-26  6:07 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Anton Blanchard

The new ELFv2 little-endian ABI increases the stack redzone -- the
area below the stack pointer that can be used for storing data --
from 288 bytes to 512 bytes.  This means that we need to allow more
space on the user stack when delivering a signal to a 64-bit process.

To make the code a bit clearer, we define new USER_REDZONE_SIZE and
KERNEL_REDZONE_SIZE symbols in ptrace.h.  For now, we leave the
kernel redzone size at 288 bytes, since increasing it to 512 bytes
would increase the size of interrupt stack frames correspondingly.
Gcc currently only makes use of 288 bytes of redzone, even when
compiling for little-endian.  In future, hopefully gcc will provide
an option to control the amount of redzone used, and then we could
reduce it even more.

This also changes the code in arch_compat_alloc_user_space() to
preserve the expanded redzone.  It is not clear why this function would
ever be used on a 64-bit process, though.

Signed-off-by: Paul Mackerras <paulus@samba.org>
---
 arch/powerpc/include/asm/compat.h |  5 +++--
 arch/powerpc/include/asm/ptrace.h | 16 +++++++++++++++-
 arch/powerpc/kernel/signal_64.c   |  4 ++--
 3 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/compat.h b/arch/powerpc/include/asm/compat.h
index 84fdf68..a613d2c 100644
--- a/arch/powerpc/include/asm/compat.h
+++ b/arch/powerpc/include/asm/compat.h
@@ -200,10 +200,11 @@ static inline void __user *arch_compat_alloc_user_space(long len)
 
 	/*
 	 * We can't access below the stack pointer in the 32bit ABI and
-	 * can access 288 bytes in the 64bit ABI
+	 * can access 288 bytes in the 64bit big-endian ABI,
+	 * or 512 bytes with the new ELFv2 little-endian ABI.
 	 */
 	if (!is_32bit_task())
-		usp -= 288;
+		usp -= USER_REDZONE_SIZE;
 
 	return (void __user *) (usp - len);
 }
diff --git a/arch/powerpc/include/asm/ptrace.h b/arch/powerpc/include/asm/ptrace.h
index becc08e..279b80f 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -28,11 +28,23 @@
 
 #ifdef __powerpc64__
 
+/*
+ * Size of redzone that userspace is allowed to use below the stack
+ * pointer.  This is 288 in the 64-bit big-endian ELF ABI, and 512 in
+ * the new ELFv2 little-endian ABI, so we allow the larger amount.
+ *
+ * For kernel code we allow a 288-byte redzone, in order to conserve
+ * kernel stack space; gcc currently only uses 288 bytes, and will
+ * hopefully allow explicit control of the redzone size in future.
+ */
+#define USER_REDZONE_SIZE	512
+#define KERNEL_REDZONE_SIZE	288
+
 #define STACK_FRAME_OVERHEAD	112	/* size of minimum stack frame */
 #define STACK_FRAME_LR_SAVE	2	/* Location of LR in stack frame */
 #define STACK_FRAME_REGS_MARKER	ASM_CONST(0x7265677368657265)
 #define STACK_INT_FRAME_SIZE	(sizeof(struct pt_regs) + \
-					STACK_FRAME_OVERHEAD + 288)
+				 STACK_FRAME_OVERHEAD + KERNEL_REDZONE_SIZE)
 #define STACK_FRAME_MARKER	12
 
 /* Size of dummy stack frame allocated when calling signal handler. */
@@ -41,6 +53,8 @@
 
 #else /* __powerpc64__ */
 
+#define USER_REDZONE_SIZE	0
+#define KERNEL_REDZONE_SIZE	0
 #define STACK_FRAME_OVERHEAD	16	/* size of minimum stack frame */
 #define STACK_FRAME_LR_SAVE	1	/* Location of LR in stack frame */
 #define STACK_FRAME_REGS_MARKER	ASM_CONST(0x72656773)
diff --git a/arch/powerpc/kernel/signal_64.c b/arch/powerpc/kernel/signal_64.c
index e35bf77..8d253c2 100644
--- a/arch/powerpc/kernel/signal_64.c
+++ b/arch/powerpc/kernel/signal_64.c
@@ -65,8 +65,8 @@ struct rt_sigframe {
 	struct siginfo __user *pinfo;
 	void __user *puc;
 	struct siginfo info;
-	/* 64 bit ABI allows for 288 bytes below sp before decrementing it. */
-	char abigap[288];
+	/* New 64 bit little-endian ABI allows redzone of 512 bytes below sp */
+	char abigap[USER_REDZONE_SIZE];
 } __attribute__ ((aligned (16)));
 
 static const char fmt32[] = KERN_INFO \
-- 
1.9.rc1

^ permalink raw reply related

* Re: [PATCH 0/7] DMA: Freescale: driver cleanups and enhancements
From: Hongbo Zhang @ 2014-02-26  7:38 UTC (permalink / raw)
  To: vinod.koul, dan.j.williams, dmaengine
  Cc: scottwood, linuxppc-dev, linux-kernel
In-Reply-To: <1389851246-8564-1-git-send-email-hongbo.zhang@freescale.com>

Hi Vinod,
How about these patches?
Thanks.


On 01/16/2014 01:47 PM, hongbo.zhang@freescale.com wrote:
> From: Hongbo Zhang <hongbo.zhang@freescale.com>
>
> Hi Vinod Koul and Dan Williams,
> Please have a look at these patches.
>
> Note that patch 2~6 had beed sent out for upstream before, but were together
> with other storage patches at that time, that was not easy for being reviewed
> and merged, so I send them separately this time.
>
> Thanks.
>
> Hongbo Zhang (7):
>    DMA: Freescale: unify register access methods
>    DMA: Freescale: remove attribute DMA_INTERRUPT of dmaengine
>    DMA: Freescale: add fsl_dma_free_descriptor() to reduce code
>      duplication
>    DMA: Freescale: move functions to avoid forward declarations
>    DMA: Freescale: change descriptor release process for supporting
>      async_tx
>    DMA: Freescale: use spin_lock_bh instead of spin_lock_irqsave
>    DMA: Freescale: add suspend resume functions for DMA driver
>
>   drivers/dma/fsldma.c |  592 ++++++++++++++++++++++++++++++++------------------
>   drivers/dma/fsldma.h |   33 ++-
>   2 files changed, 412 insertions(+), 213 deletions(-)
>

^ permalink raw reply

* Re: [PATCH] mm: numa: bugfix for LAST_CPUPID_NOT_IN_PAGE_FLAGS
From: Aneesh Kumar K.V @ 2014-02-26  7:52 UTC (permalink / raw)
  To: Andrew Morton, Liu Ping Fan
  Cc: Peter Zijlstra, linux-mm, Paul Mackerras, linuxppc-dev
In-Reply-To: <20140213152009.b16a30d2a5b5c5706fc8952a@linux-foundation.org>

Andrew Morton <akpm@linux-foundation.org> writes:

> On Wed,  5 Feb 2014 09:25:46 +0800 Liu Ping Fan <qemulist@gmail.com> wrote:
>
>> When doing some numa tests on powerpc, I triggered an oops bug. I find
>> it is caused by using page->_last_cpupid.  It should be initialized as
>> "-1 & LAST_CPUPID_MASK", but not "-1". Otherwise, in task_numa_fault(),
>> we will miss the checking (last_cpupid == (-1 & LAST_CPUPID_MASK)).
>> And finally cause an oops bug in task_numa_group(), since the online cpu is
>> less than possible cpu.
>
> I grabbed this.  I added this to the changelog:
>
> : PPC needs the LAST_CPUPID_NOT_IN_PAGE_FLAGS case because ppc needs to
> : support a large physical address region, up to 2^46 but small section size
> : (2^24).  So when NR_CPUS grows up, it is easily to cause
> : not-in-page-flags.
>
> to hopefully address Peter's observation.
>
> How should we proceed with this?  I'm getting the impression that numa
> balancing on ppc is a dead duck in 3.14, so perhaps this and 
>
> powerpc-mm-add-new-set-flag-argument-to-pte-pmd-update-function.patch
> mm-dirty-accountable-change-only-apply-to-non-prot-numa-case.patch
> mm-use-ptep-pmdp_set_numa-for-updating-_page_numa-bit.patch
>

All these are already in 3.14  ?

> are 3.15-rc1 material?
>

We should push the first hunk to 3.14. I will wait for Liu to redo the
patch. BTW this should happen only when SPARSE_VMEMMAP is not
specified. Srikar had reported the issue here

http://mid.gmane.org/20140219180200.GA29257@linux.vnet.ibm.com

#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
#define SECTIONS_WIDTH		SECTIONS_SHIFT
#else
#define SECTIONS_WIDTH		0
#endif

-aneesh

^ permalink raw reply

* Re: [PATCH v2 02/11] perf core: export swevent hrtimer helpers
From: Peter Zijlstra @ 2014-02-26  8:29 UTC (permalink / raw)
  To: Cody P Schafer
  Cc: LKML, Ingo Molnar, Paul Mackerras, Arnaldo Carvalho de Melo,
	Linux PPC
In-Reply-To: <530D0D57.4030704@linux.vnet.ibm.com>

On Tue, Feb 25, 2014 at 01:38:31PM -0800, Cody P Schafer wrote:
> On 02/25/2014 02:20 AM, Peter Zijlstra wrote:
> >On Tue, Feb 25, 2014 at 02:33:26PM +1100, Michael Ellerman wrote:
> >>On Fri, 2014-14-02 at 22:02:06 UTC, Cody P Schafer wrote:
> >>>Export the swevent hrtimer helpers currently only used in events/core.c
> >>>to allow the addition of architecture specific sw-like pmus.
> >>
> >>Peter, Ingo, can we get your ACK on this please?
> >
> >How are they used? I saw some usage in patch 9 or so; but its not
> >explained anywhere. All patches have non-existent Changelogs and the few
> >comments that are there are pretty hardware specific.
> >
> >So please do tell; what do you need this for?
> 
> From this patch's change log:
> 
> >Export the swevent hrtimer helpers currently only used in events/core.c to allow the addition of architecture specific sw-like pmus.
> 
> The key part here is "architecture specific sw-like pmus", where the
> announcement explains why these pmus are sw-like:

I don't read announcements for crucial patch details; announcements are
lost and therefore unimportant.

> >The counters supplied by these interfaces are continually counting and never
> >need to be (and cannot be) disabled or enabled. They additionally do not
> >generate any interrupts. This makes them in some regards similar to software
> >counters, and as a result their implimentation shares some common code (which
> >an initial patch exposes) with the sw counters.
> 
> Essentially, these pmus just provide access to a big array of counters which
> don't generate interrupts, and are all 64bit (and assumed to never
> overflow). Rather than duplicate the code that we already have for managing
> timing when reading from counters that don't have interrupts (the functions
> that are exposed by this patch), I've reused it.

So note that all the software counters generate interrupts in their own
measuring domain. The hrtimer ones measure time and generate time based
interrupts, the event based ones generate 'interrupts' on their events.

What you have here is a hw pmu without interrupt capability. That's
fine, they don't get to generate interrupt. We have plenty of those
already.

But what you propose to do is add interrupt in another domain entirely.
That's not fine. Don't do that.

You also try and conceal this information; so you suck.

^ permalink raw reply

* Re: Build regressions/improvements in v3.14-rc4
From: Geert Uytterhoeven @ 2014-02-26  9:54 UTC (permalink / raw)
  To: linux-kernel@vger.kernel.org; +Cc: linuxppc-dev@lists.ozlabs.org, Linux-sh list
In-Reply-To: <1393408336-1367-1-git-send-email-geert@linux-m68k.org>

On Wed, Feb 26, 2014 at 10:52 AM, Geert Uytterhoeven
<geert@linux-m68k.org> wrote:
> JFYI, when comparing v3.14-rc4[1]  to v3.14-rc3[3], the summaries are:
>   - build errors: +3/-15

  + /scratch/kisskb/src/arch/sh/mm/cache-sh4.c: error:
'cached_to_uncached' undeclared (first use in this function):  =>
99:17
  + /scratch/kisskb/src/arch/sh/mm/cache-sh4.c: error: implicit
declaration of function 'cpu_context'
[-Werror=implicit-function-declaration]:  => 192:2

sh-randconfig

  + error: No rule to make target include/config/auto.conf:  => N/A

powerpc-randconfig

> [1] http://kisskb.ellerman.id.au/kisskb/head/7212/ (all 119 configs)
> [3] http://kisskb.ellerman.id.au/kisskb/head/7190/ (all 119 configs)

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply

* Re: [PATCH] powerpc: ftrace: bugfix for test_24bit_addr
From: Michael Ellerman @ 2014-02-26 12:22 UTC (permalink / raw)
  To: ananth; +Cc: linuxppc-dev, Paul Mackerras, Anton Blanchard, Liu Ping Fan
In-Reply-To: <20140226043552.GA14672@in.ibm.com>

On Wed, 2014-02-26 at 10:05 +0530, Ananth N Mavinakayanahalli wrote:
> On Wed, Feb 26, 2014 at 10:23:01AM +0800, Liu Ping Fan wrote:
> > The branch target should be the func addr, not the addr of func_descr_t.
> > So using ppc_function_entry() to generate the right target addr.
> > 
> > Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
> > ---
> > This bug will make ftrace fail to work. It can be triggered when the kernel
> > size grows up.
> > ---
> >  arch/powerpc/kernel/ftrace.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/arch/powerpc/kernel/ftrace.c b/arch/powerpc/kernel/ftrace.c
> > index 9b27b29..b0ded97 100644
> > --- a/arch/powerpc/kernel/ftrace.c
> > +++ b/arch/powerpc/kernel/ftrace.c
> > @@ -74,6 +74,7 @@ ftrace_modify_code(unsigned long ip, unsigned int old, unsigned int new)
> >   */
> >  static int test_24bit_addr(unsigned long ip, unsigned long addr)
> >  {
> > +	addr = ppc_function_entry((void *)addr);
> 
> Won't this break on LE?

ppc_function_entry() just needs to be a nop on LE, as it already is on 32-bit.

cheers

^ permalink raw reply

* Re: [PATCH v2 1/1] audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL
From: Michael Ellerman @ 2014-02-26 12:31 UTC (permalink / raw)
  To: AKASHI Takahiro
  Cc: linux-s390, linaro-kernel, linux-ia64, linux-parisc,
	user-mode-linux-devel, linux-sh, rgb, catalin.marinas, x86,
	will.deacon, arndb, eparis, linux-kernel, dsaxena, viro,
	user-mode-linux-user, linux-alpha, sparclinux, linux-audit,
	linuxppc-dev, linux-arm-kernel
In-Reply-To: <1393319784-2758-2-git-send-email-takahiro.akashi@linaro.org>

On Tue, 2014-02-25 at 18:16 +0900, AKASHI Takahiro wrote:
> Currently AUDITSYSCALL has a long list of architecture depencency:
>        depends on AUDIT && (X86 || PARISC || PPC || S390 || IA64 || UML ||
> 		SPARC64 || SUPERH || (ARM && AEABI && !OABI_COMPAT) || ALPHA)
> The purpose of this patch is to replace it with HAVE_ARCH_AUDITSYSCALL
> for simplicity.
> 
> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
>
> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 957bf34..7b3b8fe 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -141,6 +141,7 @@ config PPC
>  	select HAVE_DEBUG_STACKOVERFLOW
>  	select HAVE_IRQ_EXIT_ON_IRQ_STACK
>  	select ARCH_USE_CMPXCHG_LOCKREF if PPC64
> +	select HAVE_ARCH_AUDITSYSCALL
>  
>  config GENERIC_CSUM
>  	def_bool CPU_LITTLE_ENDIAN

Looks good for powerpc.

Acked-by: Michael Ellerman <mpe@ellerman.id.au>

cheers

^ permalink raw reply

* Re: [PATCH] powerpc/crashdump : fix page frame number check in copy_oldmem_page
From: Laurent Dufour @ 2014-02-26 14:04 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: Paul Mackerras, linuxppc-dev
In-Reply-To: <1393292822.13244.5.camel@concordia>

On 25/02/2014 02:47, Michael Ellerman wrote:
> On Mon, 2014-02-24 at 17:30 +0100, Laurent Dufour wrote:
>> In copy_oldmem_page, the current check using max_pfn and min_low_pfn to
>> decide if the page is backed or not, is not valid when the memory layout is
>> not continuous.
>>
>> This happens when running as a QEMU/KVM guest, where RTAS is mapped higher
>> in the memory. In that case max_pfn points to the end of RTAS, and a hole
>> between the end of the kdump kernel and RTAS is not backed by PTEs. As a
>> consequence, the kdump kernel is crashing in copy_oldmem_page when accessing
>> in a direct way the pages in that hole.
>>
>> This fix relies on the memblock's service memblock_is_region_memory to
>> check if the read page is part or not of the directly accessible memory.
> 
> Hi Laurent,
> 
> This looks good to me, assuming you've tested it on a PowerVM system as well as
> under KVM.

Hi Michael,

Yes I tested it on PowerVM (BE), KVM (BE) and Qemu TCG (BE).

Cheers,
Laurent.

^ permalink raw reply

* Re: [PATCH] powerpc/crashdump : fix page frame number check in copy_oldmem_page
From: Mahesh J Salgaonkar @ 2014-02-26 18:38 UTC (permalink / raw)
  To: Laurent Dufour; +Cc: Paul Mackerras, linuxppc-dev
In-Reply-To: <20140224163055.7263.86979.stgit@nimbus>

On 2014-02-24 17:30:55 Mon, Laurent Dufour wrote:
> In copy_oldmem_page, the current check using max_pfn and min_low_pfn to
> decide if the page is backed or not, is not valid when the memory layout is
> not continuous.
> 
> This happens when running as a QEMU/KVM guest, where RTAS is mapped higher
> in the memory. In that case max_pfn points to the end of RTAS, and a hole
> between the end of the kdump kernel and RTAS is not backed by PTEs. As a
> consequence, the kdump kernel is crashing in copy_oldmem_page when accessing
> in a direct way the pages in that hole.
> 
> This fix relies on the memblock's service memblock_is_region_memory to
> check if the read page is part or not of the directly accessible memory.
> 
> Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>

Tested on PowerNV (BE), where without this patch we see cp and 
makedumpfile fails with "Bad address" while reading /proc/vmcore.
With this patch makedumpfile and cp succeeds.

Tested-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>

> ---
>  arch/powerpc/kernel/crash_dump.c |    8 +++++---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/crash_dump.c b/arch/powerpc/kernel/crash_dump.c
> index 11c1d06..7a13f37 100644
> --- a/arch/powerpc/kernel/crash_dump.c
> +++ b/arch/powerpc/kernel/crash_dump.c
> @@ -98,17 +98,19 @@ ssize_t copy_oldmem_page(unsigned long pfn, char *buf,
>  			size_t csize, unsigned long offset, int userbuf)
>  {
>  	void  *vaddr;
> +	phys_addr_t paddr;
>  
>  	if (!csize)
>  		return 0;
>  
>  	csize = min_t(size_t, csize, PAGE_SIZE);
> +	paddr = pfn << PAGE_SHIFT;
>  
> -	if ((min_low_pfn < pfn) && (pfn < max_pfn)) {
> -		vaddr = __va(pfn << PAGE_SHIFT);
> +	if (memblock_is_region_memory(paddr, csize)) {
> +		vaddr = __va(paddr);
>  		csize = copy_oldmem_vaddr(vaddr, buf, csize, offset, userbuf);
>  	} else {
> -		vaddr = __ioremap(pfn << PAGE_SHIFT, PAGE_SIZE, 0);
> +		vaddr = __ioremap(paddr, PAGE_SIZE, 0);
>  		csize = copy_oldmem_vaddr(vaddr, buf, csize, offset, userbuf);
>  		iounmap(vaddr);
>  	}
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev

-- 
Mahesh J Salgaonkar

^ permalink raw reply

* Re: [PATCH] powerpc: ftrace: bugfix for test_24bit_addr
From: Tony Breeds @ 2014-02-26 23:19 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Paul Mackerras, linuxppc-dev, Liu Ping Fan, Anton Blanchard
In-Reply-To: <1393417336.17341.0.camel@concordia>

[-- Attachment #1: Type: text/plain, Size: 195 bytes --]

On Wed, Feb 26, 2014 at 11:22:16PM +1100, Michael Ellerman wrote:
 
> ppc_function_entry() just needs to be a nop on LE, as it already is on 32-bit.

Well on LE ABI2, but yes.

Yours Tony

[-- Attachment #2: Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply

* [RFC PATCH] powerpc: allow allyesconfig to build more
From: Stephen Rothwell @ 2014-02-27  6:17 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Mahesh Salgaonkar, Michael Neuling, ppc-dev, paulus

[-- Attachment #1: Type: text/plain, Size: 2215 bytes --]

Fixes this build error:

arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
arch/powerpc/kernel/exceptions-64s.S:1312: Error: attempt to move .org backwards

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
---
 arch/powerpc/kernel/exceptions-64s.S | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

This builds allyesconfig better (we still have RELOC failures in the
link) and hopefully fixes the allmodconfig build, but I don't know if
it is semantically OK.

diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
index 38d507306a11..b87859ffc8e7 100644
--- a/arch/powerpc/kernel/exceptions-64s.S
+++ b/arch/powerpc/kernel/exceptions-64s.S
@@ -1294,16 +1294,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_VSX)
 	.globl	__end_handlers
 __end_handlers:
 
-	/* Equivalents to the above handlers for relocation-on interrupt vectors */
-	STD_RELON_EXCEPTION_HV_OOL(0xe40, emulation_assist)
-	MASKABLE_RELON_EXCEPTION_HV_OOL(0xe80, h_doorbell)
-
-	STD_RELON_EXCEPTION_PSERIES_OOL(0xf00, performance_monitor)
-	STD_RELON_EXCEPTION_PSERIES_OOL(0xf20, altivec_unavailable)
-	STD_RELON_EXCEPTION_PSERIES_OOL(0xf40, vsx_unavailable)
-	STD_RELON_EXCEPTION_PSERIES_OOL(0xf60, facility_unavailable)
-	STD_RELON_EXCEPTION_HV_OOL(0xf80, hv_facility_unavailable)
-
 #if defined(CONFIG_PPC_PSERIES) || defined(CONFIG_PPC_POWERNV)
 /*
  * Data area reserved for FWNMI option.
@@ -1325,6 +1315,16 @@ fwnmi_data_area:
 initial_stab:
 	.space	4096
 
+	/* Equivalents to the above handlers for relocation-on interrupt vectors */
+	STD_RELON_EXCEPTION_HV_OOL(0xe40, emulation_assist)
+	MASKABLE_RELON_EXCEPTION_HV_OOL(0xe80, h_doorbell)
+
+	STD_RELON_EXCEPTION_PSERIES_OOL(0xf00, performance_monitor)
+	STD_RELON_EXCEPTION_PSERIES_OOL(0xf20, altivec_unavailable)
+	STD_RELON_EXCEPTION_PSERIES_OOL(0xf40, vsx_unavailable)
+	STD_RELON_EXCEPTION_PSERIES_OOL(0xf60, facility_unavailable)
+	STD_RELON_EXCEPTION_HV_OOL(0xf80, hv_facility_unavailable)
+
 #ifdef CONFIG_PPC_POWERNV
 _GLOBAL(opal_mc_secondary_handler)
 	HMT_MEDIUM_PPR_DISCARD
-- 
1.9.0

-- 
Cheers,
Stephen Rothwell                    sfr@canb.auug.org.au

[-- Attachment #2: Type: application/pgp-signature, Size: 836 bytes --]

^ permalink raw reply related

* [PATCH] Fix physical address range check comparision and force address to be virtual address
From: Benjamin Krill @ 2014-02-27 14:49 UTC (permalink / raw)
  To: benh, linuxppc-dev; +Cc: ralphbel

The previous code added wrong TLBs and causes machine check errors.

Signed-off-by: Ralph E. Bellofatto <ralphbel@us.ibm.com>
Signed-off-by: Benjamin Krill <ben@codiert.org>
---
 arch/powerpc/mm/tlb_low_64e.S | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/mm/tlb_low_64e.S b/arch/powerpc/mm/tlb_low_64e.S
index c95eb32..6bf5050 100644
--- a/arch/powerpc/mm/tlb_low_64e.S
+++ b/arch/powerpc/mm/tlb_low_64e.S
@@ -1091,7 +1091,8 @@ tlb_load_linear:
 	ld	r11,PACATOC(r13)
 	ld	r11,linear_map_top@got(r11)
 	ld	r10,0(r11)
-	cmpld	cr0,r10,r16
+	tovirt(10,10)
+	cmpld	cr0,r16,r10
 	bge	tlb_load_linear_fault
 
 	/* MAS1 need whole new setup. */
-- 
1.8.5.3

^ permalink raw reply related

* Re: [PATCH v2 02/11] perf core: export swevent hrtimer helpers
From: Cody P Schafer @ 2014-02-27 19:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Ingo Molnar, Paul Mackerras, Arnaldo Carvalho de Melo,
	Linux PPC
In-Reply-To: <20140226082943.GC18404@twins.programming.kicks-ass.net>

On 02/26/2014 12:29 AM, Peter Zijlstra wrote:
> On Tue, Feb 25, 2014 at 01:38:31PM -0800, Cody P Schafer wrote:
>> On 02/25/2014 02:20 AM, Peter Zijlstra wrote:
>>> On Tue, Feb 25, 2014 at 02:33:26PM +1100, Michael Ellerman wrote:
>>>> On Fri, 2014-14-02 at 22:02:06 UTC, Cody P Schafer wrote:
>>>>> Export the swevent hrtimer helpers currently only used in events/core.c
>>>>> to allow the addition of architecture specific sw-like pmus.
>>>>
>>>> Peter, Ingo, can we get your ACK on this please?
>>>
>>> How are they used? I saw some usage in patch 9 or so; but its not
>>> explained anywhere. All patches have non-existent Changelogs and the few
>>> comments that are there are pretty hardware specific.
>>>
>>> So please do tell; what do you need this for?
>>
>>  From this patch's change log:
>>
>>> Export the swevent hrtimer helpers currently only used in events/core.c to allow the addition of architecture specific sw-like pmus.
>>
>> The key part here is "architecture specific sw-like pmus", where the
>> announcement explains why these pmus are sw-like:
>
> I don't read announcements for crucial patch details; announcements are
> lost and therefore unimportant.

And I'll be sure to elaborate further in the changelog next time (if I 
don't drop this change entirely).
This is the first comment I've got on this particular patch.

>>> The counters supplied by these interfaces are continually counting and never
>>> need to be (and cannot be) disabled or enabled. They additionally do not
>>> generate any interrupts. This makes them in some regards similar to software
>>> counters, and as a result their implimentation shares some common code (which
>>> an initial patch exposes) with the sw counters.
>>
>> Essentially, these pmus just provide access to a big array of counters which
>> don't generate interrupts, and are all 64bit (and assumed to never
>> overflow). Rather than duplicate the code that we already have for managing
>> timing when reading from counters that don't have interrupts (the functions
>> that are exposed by this patch), I've reused it.
>
> So note that all the software counters generate interrupts in their own
> measuring domain. The hrtimer ones measure time and generate time based
> interrupts, the event based ones generate 'interrupts' on their events.
>
> What you have here is a hw pmu without interrupt capability. That's
> fine, they don't get to generate interrupt. We have plenty of those
> already.
>
> But what you propose to do is add interrupt in another domain entirely.
> That's not fine. Don't do that.

Ok, so it looks like I misunderstood the need for an interrupt. The 
intention in using the swevent_hrtimer code was to enable setting up the 
events as frequency sampled. After taking another look at the gpci and 
24x7 pmus, I'm forbidding sampling events anyhow in event init, so the 
timer code isn't even taken advantage of. I'll drop this patch in the 
next set.

>
> You also try and conceal this information; so you suck.
>

^ permalink raw reply

* [PATCH v3 01/11] sysfs: create bin_attributes under the requested group
From: Cody P Schafer @ 2014-02-27 21:04 UTC (permalink / raw)
  To: Linux PPC, Greg Kroah-Hartman
  Cc: Peter Zijlstra, LKML, Michael Ellerman, Ingo Molnar,
	Paul Mackerras, Arnaldo Carvalho de Melo, scottwood,
	Cody P Schafer
In-Reply-To: <1393535105-7528-1-git-send-email-cody@linux.vnet.ibm.com>

bin_attributes created/updated in create_files() (such as those listed
via (struct device).attribute_groups) were not placed under the
specified group, and instead appeared in the base kobj directory.

Fix this by making bin_attributes use creating code similar to normal
attributes.

A quick grep shows that no one is using bin_attrs in a named attribute
group yet, so we can do this without breaking anything in usespace.

Note that I do not add is_visible() support to
bin_attributes, though that could be done as well.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---

No need to merge, already in driver-core-next as
aabaf4c2050d21d39fe11eec889c508e84d6a328, included for
reference/testing/verification only.

---

 fs/sysfs/group.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/fs/sysfs/group.c b/fs/sysfs/group.c
index 6b57938..aa04068 100644
--- a/fs/sysfs/group.c
+++ b/fs/sysfs/group.c
@@ -70,8 +70,11 @@ static int create_files(struct kernfs_node *parent, struct kobject *kobj,
 	if (grp->bin_attrs) {
 		for (bin_attr = grp->bin_attrs; *bin_attr; bin_attr++) {
 			if (update)
-				sysfs_remove_bin_file(kobj, *bin_attr);
-			error = sysfs_create_bin_file(kobj, *bin_attr);
+				kernfs_remove_by_name(parent,
+						(*bin_attr)->attr.name);
+			error = sysfs_add_file_mode_ns(parent,
+					&(*bin_attr)->attr, true,
+					(*bin_attr)->attr.mode, NULL);
 			if (error)
 				break;
 		}
-- 
1.9.0

^ permalink raw reply related

* [PATCH v3 00/11] powerpc: Add support for Power Hypervisor supplied performance counters
From: Cody P Schafer @ 2014-02-27 21:04 UTC (permalink / raw)
  To: Linux PPC
  Cc: Peter Zijlstra, LKML, Michael Ellerman, Ingo Molnar,
	Paul Mackerras, Arnaldo Carvalho de Melo, scottwood,
	Cody P Schafer

These patches add basic pmus for 2 powerpc hypervisor interfaces to obtain
performance counters: gpci ("get performance counter info") and 24x7.

The counters supplied by these interfaces are continually counting and never
need to be (and cannot be) disabled or enabled. They additionally do not
generate any interrupts. This makes them in some regards similar to software
counters, and as a result their implimentation shares some common code (which
an initial patch exposes) with the sw counters.

These 2 PMUs end up providing access to some cpu, core, and chip level counters
not exposed via other interfaces, and additionally allow monitoring the
performance of other lpars (guests) on the same host system. Because it
provides access to core and chip level counters, this pair of PMUs could be
thought of as powerpc's counterpart to x86's uncore events.

GPCI is an interface that already exists on some power6 and power7 machines
(depending on the fw version), but is rather in-flexible and code intensive to
add additional counters to.  The 24x7 interfaces currently are designed to
co-exist with the gpci interface while replacing most of gpci's functionality
on newer systems. Right now, the 24x7 code I've submitted uses the gpci calls
to check if it has permission to access certain classes of counters.

--

Since v2:
 - "sysfs: create bin_attributes under the requested group" is now in
	git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core.git driver-core-next
	with commit-id: aabaf4c2050d21d39fe11eec889c508e84d6a328

 - Split hv-24x7.h catalog definition into hv-24x7-catalog.h
 - Remove unused 24x7 and gpci interface structures and enums (Michael Ellerman)
 - Update docs to point to an external source for the full catalog docs
 - Extend some of the patch changelogs (Peter Z)
 - Remove hrtimer usage and just extern the event_idx helper (now renamed) (Peter Z)
 - s/PMU_RANGE_ATTR/PMU_FORMAT_RANGE/ (and similar RESERVED rename) (Michael
   Ellerman)
 - hv_24x7: small clarifications in read_offset_data()'s comment
 - hv_gpci: remove h_gpci_event_read() and h_gpci_event_del(), call _stop and
   _update() directly (Michael Ellerman)
 - Kconfig relocation, dependency changes, and rewording (Scott Wood and
   Michael Ellerman)

Since v1:
 - add a few attributes to hv_gpci and hv_24x7 that expose some info about the interfaces
 - so the attributes show up in the right place, fix bin_attr creation in sysfs groups.
 - move hv_gpci.h and hv_24x7.h interface headers into arch/powerpc/perf
 - fix bit ordering in hv_gpci.h
 - split out hv_perf_caps_get() and use it to probe for the interface before registering
 - ensure proper alignment of hypervisor args
 - add a few missing counter requests to hv_gpci.h
 - s/CIR_xxx/CIR_XXX/ in hv_gpci.h
 - s/modules_init/device_initcall/
 - Don't set event->cpu, use the user provided one
 - remove the union of gpci events, just give the user 1024 bytes to play with
 - clarify some comments (the list of fw versions is now labeled)
 - provide and event_24x7_request() that wraps single_24x7_request()
 - probably some other small fixes I'm forgetting.


Cody P Schafer (11):
  sysfs: create bin_attributes under the requested group
  perf: add PMU_FORMAT_RANGE() helper for use by sw-like pmus
  perf: provide a common perf_event_nop_0() for use with .event_idx
  powerpc: add hvcalls for 24x7 and gpci (get performance counter info)
  powerpc/perf: add hv_gpci interface header
  powerpc/perf: add 24x7 interface headers
  powerpc/perf: add a shared interface to get gpci version and
    capabilities
  powerpc/perf: add support for the hv gpci (get performance counter
    info) interface
  powerpc/perf: add support for the hv 24x7 interface
  powerpc/perf: add kconfig option for hypervisor provided counters
  powerpc/perf/hv_{gpci,24x7}: add documentation of device attributes

 .../testing/sysfs-bus-event_source-devices-hv_24x7 |  23 +
 .../testing/sysfs-bus-event_source-devices-hv_gpci |  43 ++
 arch/powerpc/include/asm/hvcall.h                  |   5 +
 arch/powerpc/perf/Makefile                         |   2 +
 arch/powerpc/perf/hv-24x7-catalog.h                |  33 ++
 arch/powerpc/perf/hv-24x7.c                        | 492 +++++++++++++++++++++
 arch/powerpc/perf/hv-24x7.h                        | 109 +++++
 arch/powerpc/perf/hv-common.c                      |  39 ++
 arch/powerpc/perf/hv-common.h                      |  17 +
 arch/powerpc/perf/hv-gpci.c                        | 277 ++++++++++++
 arch/powerpc/perf/hv-gpci.h                        |  73 +++
 arch/powerpc/platforms/pseries/Kconfig             |  12 +
 fs/sysfs/group.c                                   |   7 +-
 include/linux/perf_event.h                         |  18 +
 kernel/events/core.c                               |  10 +-
 15 files changed, 1153 insertions(+), 7 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
 create mode 100644 Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci
 create mode 100644 arch/powerpc/perf/hv-24x7-catalog.h
 create mode 100644 arch/powerpc/perf/hv-24x7.c
 create mode 100644 arch/powerpc/perf/hv-24x7.h
 create mode 100644 arch/powerpc/perf/hv-common.c
 create mode 100644 arch/powerpc/perf/hv-common.h
 create mode 100644 arch/powerpc/perf/hv-gpci.c
 create mode 100644 arch/powerpc/perf/hv-gpci.h

-- 
1.9.0

^ permalink raw reply

* [PATCH v3 04/11] powerpc: add hvcalls for 24x7 and gpci (get performance counter info)
From: Cody P Schafer @ 2014-02-27 21:04 UTC (permalink / raw)
  To: Linux PPC, Alexander Graf, Anton Blanchard,
	Benjamin Herrenschmidt, Cody P Schafer, Michael Ellerman,
	Paul Mackerras
  Cc: scottwood, Peter Zijlstra, Ingo Molnar, LKML,
	Arnaldo Carvalho de Melo
In-Reply-To: <1393535105-7528-1-git-send-email-cody@linux.vnet.ibm.com>

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/hvcall.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index d8b600b..5dbbb29 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -274,6 +274,11 @@
 /* Platform specific hcalls, used by KVM */
 #define H_RTAS			0xf000
 
+/* "Platform specific hcalls", provided by PHYP */
+#define H_GET_24X7_CATALOG_PAGE	0xF078
+#define H_GET_24X7_DATA		0xF07C
+#define H_GET_PERF_COUNTER_INFO	0xF080
+
 #ifndef __ASSEMBLY__
 
 /**
-- 
1.9.0

^ permalink raw reply related

* [PATCH v3 03/11] perf: provide a common perf_event_nop_0() for use with .event_idx
From: Cody P Schafer @ 2014-02-27 21:04 UTC (permalink / raw)
  To: Linux PPC, Arnaldo Carvalho de Melo, Ingo Molnar, Paul Mackerras,
	Peter Zijlstra
  Cc: Peter Zijlstra, LKML, Michael Ellerman, scottwood, Cody P Schafer
In-Reply-To: <1393535105-7528-1-git-send-email-cody@linux.vnet.ibm.com>

Rather an having every pmu that needs a function that just returns 0 for
.event_idx define their own copy, reuse the one in kernel/events/core.c.

Rename from perf_swevent_event_idx() because we're no longer using it
for just software events. Naming is based on the perf_pmu_nop_*()
functions.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 include/linux/perf_event.h |  1 +
 kernel/events/core.c       | 10 +++++-----
 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 3da5081..24a7b45 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -560,6 +560,7 @@ extern void perf_pmu_migrate_context(struct pmu *pmu,
 extern u64 perf_event_read_value(struct perf_event *event,
 				 u64 *enabled, u64 *running);
 
+extern int perf_event_nop_0(struct perf_event *event);
 
 struct perf_sample_data {
 	u64				type;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 56003c6..2938a77 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5816,7 +5816,7 @@ static int perf_swevent_init(struct perf_event *event)
 	return 0;
 }
 
-static int perf_swevent_event_idx(struct perf_event *event)
+int perf_event_nop_0(struct perf_event *event)
 {
 	return 0;
 }
@@ -5831,7 +5831,7 @@ static struct pmu perf_swevent = {
 	.stop		= perf_swevent_stop,
 	.read		= perf_swevent_read,
 
-	.event_idx	= perf_swevent_event_idx,
+	.event_idx	= perf_event_nop_0,
 };
 
 #ifdef CONFIG_EVENT_TRACING
@@ -5950,7 +5950,7 @@ static struct pmu perf_tracepoint = {
 	.stop		= perf_swevent_stop,
 	.read		= perf_swevent_read,
 
-	.event_idx	= perf_swevent_event_idx,
+	.event_idx	= perf_event_nop_0,
 };
 
 static inline void perf_tp_register(void)
@@ -6177,7 +6177,7 @@ static struct pmu perf_cpu_clock = {
 	.stop		= cpu_clock_event_stop,
 	.read		= cpu_clock_event_read,
 
-	.event_idx	= perf_swevent_event_idx,
+	.event_idx	= perf_event_nop_0,
 };
 
 /*
@@ -6257,7 +6257,7 @@ static struct pmu perf_task_clock = {
 	.stop		= task_clock_event_stop,
 	.read		= task_clock_event_read,
 
-	.event_idx	= perf_swevent_event_idx,
+	.event_idx	= perf_event_nop_0,
 };
 
 static void perf_pmu_nop_void(struct pmu *pmu)
-- 
1.9.0

^ permalink raw reply related

* [PATCH v3 02/11] perf: add PMU_FORMAT_RANGE() helper for use by sw-like pmus
From: Cody P Schafer @ 2014-02-27 21:04 UTC (permalink / raw)
  To: Linux PPC, Arnaldo Carvalho de Melo, Ingo Molnar, Paul Mackerras,
	Peter Zijlstra
  Cc: Peter Zijlstra, LKML, Michael Ellerman, scottwood, Cody P Schafer
In-Reply-To: <1393535105-7528-1-git-send-email-cody@linux.vnet.ibm.com>

Add PMU_FORMAT_RANGE() and PMU_FORMAT_RANGE_RESERVED() (for reserved
areas) which generate functions to extract the relevent bits from
event->attr.config{,1,2} for use by sw-like pmus where the
'config{,1,2}' values don't map directly to hardware registers.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 include/linux/perf_event.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e56b07f..3da5081 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -871,4 +871,21 @@ _name##_show(struct device *dev,					\
 									\
 static struct device_attribute format_attr_##_name = __ATTR_RO(_name)
 
+#define PMU_FORMAT_RANGE(name, attr_var, bit_start, bit_end)		\
+PMU_FORMAT_ATTR(name, #attr_var ":" #bit_start "-" #bit_end);		\
+PMU_FORMAT_RANGE_RESERVED(name, attr_var, bit_start, bit_end)
+
+#define PMU_FORMAT_RANGE_RESERVED(name, attr_var, bit_start, bit_end)	\
+static u64 event_get_##name##_max(void)					\
+{									\
+	int bits = (bit_end) - (bit_start) + 1;				\
+	return ((0x1ULL << (bits - 1ULL)) - 1ULL) |			\
+		(0xFULL << (bits - 4ULL));				\
+}									\
+static u64 event_get_##name(struct perf_event *event)			\
+{									\
+	return (event->attr.attr_var >> (bit_start)) &			\
+		event_get_##name##_max();				\
+}
+
 #endif /* _LINUX_PERF_EVENT_H */
-- 
1.9.0

^ permalink raw reply related

* [PATCH v3 05/11] powerpc/perf: add hv_gpci interface header
From: Cody P Schafer @ 2014-02-27 21:04 UTC (permalink / raw)
  To: Linux PPC, Cody P Schafer
  Cc: Peter Zijlstra, LKML, Michael Ellerman, Ingo Molnar,
	Paul Mackerras, Arnaldo Carvalho de Melo, scottwood
In-Reply-To: <1393535105-7528-1-git-send-email-cody@linux.vnet.ibm.com>

"H_GetPerformanceCounterInfo" (refered to as hv_gpci or just gpci from
here on) is an interface to retrieve specific performance counters and
other data from the hypervisor. All outputs have a fixed format. This
header only describes the portions of the interface that we plan on
using in linux at this time.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 arch/powerpc/perf/hv-gpci.h | 73 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 73 insertions(+)
 create mode 100644 arch/powerpc/perf/hv-gpci.h

diff --git a/arch/powerpc/perf/hv-gpci.h b/arch/powerpc/perf/hv-gpci.h
new file mode 100644
index 0000000..b25f460
--- /dev/null
+++ b/arch/powerpc/perf/hv-gpci.h
@@ -0,0 +1,73 @@
+#ifndef LINUX_POWERPC_PERF_HV_GPCI_H_
+#define LINUX_POWERPC_PERF_HV_GPCI_H_
+
+#include <linux/types.h>
+
+/* From the document "H_GetPerformanceCounterInfo Interface" v1.07 */
+
+/* H_GET_PERF_COUNTER_INFO argument */
+struct hv_get_perf_counter_info_params {
+	__be32 counter_request; /* I */
+	__be32 starting_index;  /* IO */
+	__be16 secondary_index; /* IO */
+	__be16 returned_values; /* O */
+	__be32 detail_rc; /* O, only needed when called via *_norets() */
+
+	/*
+	 * O, size each of counter_value element in bytes, only set for version
+	 * >= 0x3
+	 */
+	__be16 cv_element_size;
+
+	/* I, 0 (zero) for versions < 0x3 */
+	__u8 counter_info_version_in;
+
+	/* O, 0 (zero) if version < 0x3. Must be set to 0 when making hcall */
+	__u8 counter_info_version_out;
+	__u8 reserved[0xC];
+	__u8 counter_value[];
+} __packed;
+
+/*
+ * counter info version => fw version/reference (spec version)
+ *
+ * 8 => power8 (1.07)
+ * [7 is skipped by spec 1.07]
+ * 6 => TLBIE (1.07)
+ * 5 => v7r7m0.phyp (1.05)
+ * [4 skipped]
+ * 3 => v7r6m0.phyp (?)
+ * [1,2 skipped]
+ * 0 => v7r{2,3,4}m0.phyp (?)
+ */
+#define COUNTER_INFO_VERSION_CURRENT 0x8
+
+/*
+ * These determine the counter_value[] layout and the meaning of starting_index
+ * and secondary_index.
+ *
+ * Unless otherwise noted, @secondary_index is unused and ignored.
+ */
+enum counter_info_requests {
+
+	/* GENERAL */
+
+	/* @starting_index: must be -1 (to refer to the current partition)
+	 */
+	CIR_SYSTEM_PERFORMANCE_CAPABILITIES = 0X40,
+};
+
+struct cv_system_performance_capabilities {
+	/* If != 0, allowed to collect data from other partitions */
+	__u8 perf_collect_privileged;
+
+	/* These following are only valid if counter_info_version >= 0x3 */
+#define CV_CM_GA       (1 << 7)
+#define CV_CM_EXPANDED (1 << 6)
+#define CV_CM_LAB      (1 << 5)
+	/* remaining bits are reserved */
+	__u8 capability_mask;
+	__u8 reserved[0xE];
+} __packed;
+
+#endif
-- 
1.9.0

^ permalink raw reply related

* [PATCH v3 06/11] powerpc/perf: add 24x7 interface headers
From: Cody P Schafer @ 2014-02-27 21:04 UTC (permalink / raw)
  To: Linux PPC, Cody P Schafer
  Cc: Peter Zijlstra, LKML, Michael Ellerman, Ingo Molnar,
	Paul Mackerras, Arnaldo Carvalho de Melo, scottwood
In-Reply-To: <1393535105-7528-1-git-send-email-cody@linux.vnet.ibm.com>

24x7 (also called hv_24x7 or H_24X7) is an interface to obtain
performance counters from the hypervisor. These counters do not have a
fixed format/possition and are instead documented in a "24x7 Catalog",
which is provided by the hypervisor (that interface is also documented
paritialy in the included hv-24x7-catalog.h and fully in at
https://raw.githubusercontent.com/jmesmon/catalog-24x7/master/hv-24x7-catalog.h ).

The 24x7 data access is simply a copy operation into a 4 dimentional
array of 64bit counters (from hypervisor to kernel memory). There is no
interupt triggered on overflow, these are completely disjoint from the
typical power pmu.

This method of obtaining performance counters from the hypervisor is
intended to paritialy replace the gpci interface.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 arch/powerpc/perf/hv-24x7-catalog.h |  33 +++++++++++
 arch/powerpc/perf/hv-24x7.h         | 109 ++++++++++++++++++++++++++++++++++++
 2 files changed, 142 insertions(+)
 create mode 100644 arch/powerpc/perf/hv-24x7-catalog.h
 create mode 100644 arch/powerpc/perf/hv-24x7.h

diff --git a/arch/powerpc/perf/hv-24x7-catalog.h b/arch/powerpc/perf/hv-24x7-catalog.h
new file mode 100644
index 0000000..21b19dd
--- /dev/null
+++ b/arch/powerpc/perf/hv-24x7-catalog.h
@@ -0,0 +1,33 @@
+#ifndef LINUX_POWERPC_PERF_HV_24X7_CATALOG_H_
+#define LINUX_POWERPC_PERF_HV_24X7_CATALOG_H_
+
+#include <linux/types.h>
+
+/* From document "24x7 Event and Group Catalog Formats Proposal" v0.15 */
+
+struct hv_24x7_catalog_page_0 {
+#define HV_24X7_CATALOG_MAGIC 0x32347837 /* "24x7" in ASCII */
+	__be32 magic;
+	__be32 length; /* In 4096 byte pages */
+	__be64 version; /* XXX: arbitrary? what's the meaning/useage/purpose? */
+	__u8 build_time_stamp[16]; /* "YYYYMMDDHHMMSS\0\0" */
+	__u8 reserved2[32];
+	__be16 schema_data_offs; /* in 4096 byte pages */
+	__be16 schema_data_len;  /* in 4096 byte pages */
+	__be16 schema_entry_count;
+	__u8 reserved3[2];
+	__be16 event_data_offs;
+	__be16 event_data_len;
+	__be16 event_entry_count;
+	__u8 reserved4[2];
+	__be16 group_data_offs; /* in 4096 byte pages */
+	__be16 group_data_len;  /* in 4096 byte pages */
+	__be16 group_entry_count;
+	__u8 reserved5[2];
+	__be16 formula_data_offs; /* in 4096 byte pages */
+	__be16 formula_data_len;  /* in 4096 byte pages */
+	__be16 formula_entry_count;
+	__u8 reserved6[2];
+} __packed;
+
+#endif
diff --git a/arch/powerpc/perf/hv-24x7.h b/arch/powerpc/perf/hv-24x7.h
new file mode 100644
index 0000000..720ebce
--- /dev/null
+++ b/arch/powerpc/perf/hv-24x7.h
@@ -0,0 +1,109 @@
+#ifndef LINUX_POWERPC_PERF_HV_24X7_H_
+#define LINUX_POWERPC_PERF_HV_24X7_H_
+
+#include <linux/types.h>
+
+struct hv_24x7_request {
+	/* PHYSICAL domains require enabling via phyp/hmc. */
+#define HV_24X7_PERF_DOMAIN_PHYSICAL_CHIP 0x01
+#define HV_24X7_PERF_DOMAIN_PHYSICAL_CORE 0x02
+#define HV_24X7_PERF_DOMAIN_VIRTUAL_PROCESSOR_HOME_CORE   0x03
+#define HV_24X7_PERF_DOMAIN_VIRTUAL_PROCESSOR_HOME_CHIP   0x04
+#define HV_24X7_PERF_DOMAIN_VIRTUAL_PROCESSOR_HOME_NODE   0x05
+#define HV_24X7_PERF_DOMAIN_VIRTUAL_PROCESSOR_REMOTE_NODE 0x06
+	__u8 performance_domain;
+	__u8 reserved[0x1];
+
+	/* bytes to read starting at @data_offset. must be a multiple of 8 */
+	__be16 data_size;
+
+	/*
+	 * byte offset within the perf domain to read from. must be 8 byte
+	 * aligned
+	 */
+	__be32 data_offset;
+
+	/*
+	 * only valid for VIRTUAL_PROCESSOR domains, ignored for others.
+	 * -1 means "current partition only"
+	 *  Enabling via phyp/hmc required for non-"-1" values. 0 forbidden
+	 *  unless requestor is 0.
+	 */
+	__be16 starting_lpar_ix;
+
+	/*
+	 * Ignored when @starting_lpar_ix == -1
+	 * Ignored when @performance_domain is not VIRTUAL_PROCESSOR_*
+	 * -1 means "infinite" or all
+	 */
+	__be16 max_num_lpars;
+
+	/* chip, core, or virtual processor based on @performance_domain */
+	__be16 starting_ix;
+	__be16 max_ix;
+} __packed;
+
+struct hv_24x7_request_buffer {
+	/* 0 - ? */
+	/* 1 - ? */
+#define HV_24X7_IF_VERSION_CURRENT 0x01
+	__u8 interface_version;
+	__u8 num_requests;
+	__u8 reserved[0xE];
+	struct hv_24x7_request requests[];
+} __packed;
+
+struct hv_24x7_result_element {
+	__be16 lpar_ix;
+
+	/*
+	 * represents the core, chip, or virtual processor based on the
+	 * request's @performance_domain
+	 */
+	__be16 domain_ix;
+
+	/* -1 if @performance_domain does not refer to a virtual processor */
+	__be32 lpar_cfg_instance_id;
+
+	/* size = @result_element_data_size of cointaining result. */
+	__u8 element_data[];
+} __packed;
+
+struct hv_24x7_result {
+	__u8 result_ix;
+
+	/*
+	 * 0 = not all result elements fit into the buffer, additional requests
+	 *     required
+	 * 1 = all result elements were returned
+	 */
+	__u8 results_complete;
+	__be16 num_elements_returned;
+
+	/* This is a copy of @data_size from the coresponding hv_24x7_request */
+	__be16 result_element_data_size;
+	__u8 reserved[0x2];
+
+	/* WARNING: only valid for first result element due to variable sizes
+	 *          of result elements */
+	/* struct hv_24x7_result_element[@num_elements_returned] */
+	struct hv_24x7_result_element elements[];
+} __packed;
+
+struct hv_24x7_data_result_buffer {
+	/* See versioning for request buffer */
+	__u8 interface_version;
+
+	__u8 num_results;
+	__u8 reserved[0x1];
+	__u8 failing_request_ix;
+	__be32 detailed_rc;
+	__be64 cec_cfg_instance_id;
+	__be64 catalog_version_num;
+	__u8 reserved2[0x8];
+	/* WARNING: only valid for the first result due to variable sizes of
+	 *	    results */
+	struct hv_24x7_result results[]; /* [@num_results] */
+} __packed;
+
+#endif
-- 
1.9.0

^ permalink raw reply related

* [PATCH v3 07/11] powerpc/perf: add a shared interface to get gpci version and capabilities
From: Cody P Schafer @ 2014-02-27 21:05 UTC (permalink / raw)
  To: Linux PPC, Cody P Schafer
  Cc: Peter Zijlstra, LKML, Michael Ellerman, Ingo Molnar,
	Paul Mackerras, Arnaldo Carvalho de Melo, scottwood
In-Reply-To: <1393535105-7528-1-git-send-email-cody@linux.vnet.ibm.com>

This exposes a simple way to grab the firmware provided
collect_priveliged, ga, expanded, and lab capability bits. All of these
bits come in from the same gpci request, so we've exposed all of them.

Only the collect_priveliged bit is really used by the hv-gpci/hv-24x7
code, the other bits are simply exposed in sysfs to inform the user.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 arch/powerpc/perf/hv-common.c | 39 +++++++++++++++++++++++++++++++++++++++
 arch/powerpc/perf/hv-common.h | 17 +++++++++++++++++
 2 files changed, 56 insertions(+)
 create mode 100644 arch/powerpc/perf/hv-common.c
 create mode 100644 arch/powerpc/perf/hv-common.h

diff --git a/arch/powerpc/perf/hv-common.c b/arch/powerpc/perf/hv-common.c
new file mode 100644
index 0000000..47e02b3
--- /dev/null
+++ b/arch/powerpc/perf/hv-common.c
@@ -0,0 +1,39 @@
+#include <asm/io.h>
+#include <asm/hvcall.h>
+
+#include "hv-gpci.h"
+#include "hv-common.h"
+
+unsigned long hv_perf_caps_get(struct hv_perf_caps *caps)
+{
+	unsigned long r;
+	struct p {
+		struct hv_get_perf_counter_info_params params;
+		struct cv_system_performance_capabilities caps;
+	} __packed __aligned(sizeof(uint64_t));
+
+	struct p arg = {
+		.params = {
+			.counter_request = cpu_to_be32(
+					CIR_SYSTEM_PERFORMANCE_CAPABILITIES),
+			.starting_index = cpu_to_be32(-1),
+			.counter_info_version_in = 0,
+		}
+	};
+
+	r = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO,
+			       virt_to_phys(&arg), sizeof(arg));
+
+	if (r)
+		return r;
+
+	pr_devel("capability_mask: 0x%x\n", arg.caps.capability_mask);
+
+	caps->version = arg.params.counter_info_version_out;
+	caps->collect_privileged = !!arg.caps.perf_collect_privileged;
+	caps->ga = !!(arg.caps.capability_mask & CV_CM_GA);
+	caps->expanded = !!(arg.caps.capability_mask & CV_CM_EXPANDED);
+	caps->lab = !!(arg.caps.capability_mask & CV_CM_LAB);
+
+	return r;
+}
diff --git a/arch/powerpc/perf/hv-common.h b/arch/powerpc/perf/hv-common.h
new file mode 100644
index 0000000..7e615bd
--- /dev/null
+++ b/arch/powerpc/perf/hv-common.h
@@ -0,0 +1,17 @@
+#ifndef LINUX_POWERPC_PERF_HV_COMMON_H_
+#define LINUX_POWERPC_PERF_HV_COMMON_H_
+
+#include <linux/types.h>
+
+struct hv_perf_caps {
+	u16 version;
+	u16 collect_privileged:1,
+	    ga:1,
+	    expanded:1,
+	    lab:1,
+	    unused:12;
+};
+
+unsigned long hv_perf_caps_get(struct hv_perf_caps *caps);
+
+#endif
-- 
1.9.0

^ permalink raw reply related

* [PATCH v3 08/11] powerpc/perf: add support for the hv gpci (get performance counter info) interface
From: Cody P Schafer @ 2014-02-27 21:05 UTC (permalink / raw)
  To: Linux PPC, Cody P Schafer
  Cc: Peter Zijlstra, LKML, Michael Ellerman, Ingo Molnar,
	Paul Mackerras, Arnaldo Carvalho de Melo, scottwood
In-Reply-To: <1393535105-7528-1-git-send-email-cody@linux.vnet.ibm.com>

This provides a basic link between perf and hv_gpci. Notably, it does
not yet support transactions and does not list any events (they can
still be manually composed).

Example usage via perf tool:

	perf stat -e 'hv_gpci/counter_info_version=3,offset=0,length=8,secondary_index=0,starting_index=0xffffffff,request=0x10/' -r 0 -C 0 -x ' ' sleep 0.1

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 arch/powerpc/perf/hv-gpci.c | 277 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 277 insertions(+)
 create mode 100644 arch/powerpc/perf/hv-gpci.c

diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
new file mode 100644
index 0000000..2f64732
--- /dev/null
+++ b/arch/powerpc/perf/hv-gpci.c
@@ -0,0 +1,277 @@
+/*
+ * Hypervisor supplied "gpci" ("get performance counter info") performance
+ * counter support
+ *
+ * Author: Cody P Schafer <cody@linux.vnet.ibm.com>
+ * Copyright 2014 IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#define pr_fmt(fmt) "hv-gpci: " fmt
+
+#include <linux/init.h>
+#include <linux/perf_event.h>
+#include <asm/firmware.h>
+#include <asm/hvcall.h>
+#include <asm/io.h>
+
+#include "hv-gpci.h"
+#include "hv-common.h"
+
+PMU_FORMAT_RANGE(request, config, 0, 31); /* u32 */
+PMU_FORMAT_RANGE(starting_index, config, 32, 63); /* u32 */
+PMU_FORMAT_RANGE(secondary_index, config1, 0, 15); /* u16 */
+PMU_FORMAT_RANGE(counter_info_version, config1, 16, 23); /* u8 */
+PMU_FORMAT_RANGE(length, config1, 24, 31); /* u8, bytes of data (1-8) */
+PMU_FORMAT_RANGE(offset, config1, 32, 63); /* u32, byte offset */
+
+static struct attribute *format_attrs[] = {
+	&format_attr_request.attr,
+	&format_attr_starting_index.attr,
+	&format_attr_secondary_index.attr,
+	&format_attr_counter_info_version.attr,
+
+	&format_attr_offset.attr,
+	&format_attr_length.attr,
+	NULL,
+};
+
+static struct attribute_group format_group = {
+	.name = "format",
+	.attrs = format_attrs,
+};
+
+#define HV_CAPS_ATTR(_name, _format)				\
+static ssize_t _name##_show(struct device *dev,			\
+			    struct device_attribute *attr,	\
+			    char *page)				\
+{								\
+	struct hv_perf_caps caps;				\
+	unsigned long hret = hv_perf_caps_get(&caps);		\
+	if (hret)						\
+		return -EIO;					\
+								\
+	return sprintf(page, _format, caps._name);		\
+}								\
+static struct device_attribute hv_caps_attr_##_name = __ATTR_RO(_name)
+
+static ssize_t kernel_version_show(struct device *dev,
+				   struct device_attribute *attr,
+				   char *page)
+{
+	return sprintf(page, "0x%x\n", COUNTER_INFO_VERSION_CURRENT);
+}
+
+DEVICE_ATTR_RO(kernel_version);
+HV_CAPS_ATTR(version, "0x%x\n");
+HV_CAPS_ATTR(ga, "%d\n");
+HV_CAPS_ATTR(expanded, "%d\n");
+HV_CAPS_ATTR(lab, "%d\n");
+HV_CAPS_ATTR(collect_privileged, "%d\n");
+
+static struct attribute *interface_attrs[] = {
+	&dev_attr_kernel_version.attr,
+	&hv_caps_attr_version.attr,
+	&hv_caps_attr_ga.attr,
+	&hv_caps_attr_expanded.attr,
+	&hv_caps_attr_lab.attr,
+	&hv_caps_attr_collect_privileged.attr,
+	NULL,
+};
+
+static struct attribute_group interface_group = {
+	.name = "interface",
+	.attrs = interface_attrs,
+};
+
+static const struct attribute_group *attr_groups[] = {
+	&format_group,
+	&interface_group,
+	NULL,
+};
+
+#define GPCI_MAX_DATA_BYTES \
+	(1024 - sizeof(struct hv_get_perf_counter_info_params))
+
+static unsigned long single_gpci_request(u32 req, u32 starting_index,
+		u16 secondary_index, u8 version_in, u32 offset, u8 length,
+		u64 *value)
+{
+	unsigned long ret;
+	size_t i;
+	u64 count;
+
+	struct {
+		struct hv_get_perf_counter_info_params params;
+		uint8_t bytes[GPCI_MAX_DATA_BYTES];
+	} __packed __aligned(sizeof(uint64_t)) arg = {
+		.params = {
+			.counter_request = cpu_to_be32(req),
+			.starting_index = cpu_to_be32(starting_index),
+			.secondary_index = cpu_to_be16(secondary_index),
+			.counter_info_version_in = version_in,
+		}
+	};
+
+	ret = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO,
+			virt_to_phys(&arg), sizeof(arg));
+	if (ret) {
+		pr_devel("hcall failed: 0x%lx\n", ret);
+		return ret;
+	}
+
+	/*
+	 * we verify offset and length are within the zeroed buffer at event
+	 * init.
+	 */
+	count = 0;
+	for (i = offset; i < offset + length; i++)
+		count |= arg.bytes[i] << (i - offset);
+
+	*value = count;
+	return ret;
+}
+
+static u64 h_gpci_get_value(struct perf_event *event)
+{
+	u64 count;
+	unsigned long ret = single_gpci_request(event_get_request(event),
+					event_get_starting_index(event),
+					event_get_secondary_index(event),
+					event_get_counter_info_version(event),
+					event_get_offset(event),
+					event_get_length(event),
+					&count);
+	if (ret)
+		return 0;
+	return count;
+}
+
+static void h_gpci_event_update(struct perf_event *event)
+{
+	s64 prev;
+	u64 now = h_gpci_get_value(event);
+	prev = local64_xchg(&event->hw.prev_count, now);
+	local64_add(now - prev, &event->count);
+}
+
+static void h_gpci_event_start(struct perf_event *event, int flags)
+{
+	local64_set(&event->hw.prev_count, h_gpci_get_value(event));
+}
+
+static void h_gpci_event_stop(struct perf_event *event, int flags)
+{
+	h_gpci_event_update(event);
+}
+
+static int h_gpci_event_add(struct perf_event *event, int flags)
+{
+	if (flags & PERF_EF_START)
+		h_gpci_event_start(event, flags);
+
+	return 0;
+}
+
+static int h_gpci_event_init(struct perf_event *event)
+{
+	u64 count;
+	u8 length;
+
+	/* Not our event */
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
+	/* config2 is unused */
+	if (event->attr.config2) {
+		pr_devel("config2 set when reserved\n");
+		return -EINVAL;
+	}
+
+	/* unsupported modes and filters */
+	if (event->attr.exclude_user   ||
+	    event->attr.exclude_kernel ||
+	    event->attr.exclude_hv     ||
+	    event->attr.exclude_idle   ||
+	    event->attr.exclude_host   ||
+	    event->attr.exclude_guest  ||
+	    is_sampling_event(event)) /* no sampling */
+		return -EINVAL;
+
+	/* no branch sampling */
+	if (has_branch_stack(event))
+		return -EOPNOTSUPP;
+
+	length = event_get_length(event);
+	if (length < 1 || length > 8) {
+		pr_devel("length invalid\n");
+		return -EINVAL;
+	}
+
+	/* last byte within the buffer? */
+	if ((event_get_offset(event) + length) > GPCI_MAX_DATA_BYTES) {
+		pr_devel("request outside of buffer: %zu > %zu\n",
+				(size_t)event_get_offset(event) + length,
+				GPCI_MAX_DATA_BYTES);
+		return -EINVAL;
+	}
+
+	/* check if the request works... */
+	if (single_gpci_request(event_get_request(event),
+				event_get_starting_index(event),
+				event_get_secondary_index(event),
+				event_get_counter_info_version(event),
+				event_get_offset(event),
+				length,
+				&count)) {
+		pr_devel("gpci hcall failed\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static struct pmu h_gpci_pmu = {
+	.task_ctx_nr = perf_invalid_context,
+
+	.name = "hv_gpci",
+	.attr_groups = attr_groups,
+	.event_init  = h_gpci_event_init,
+	.add         = h_gpci_event_add,
+	.del         = h_gpci_event_stop,
+	.start       = h_gpci_event_start,
+	.stop        = h_gpci_event_stop,
+	.read        = h_gpci_event_update,
+
+	.event_idx = perf_event_nop_0,
+};
+
+static int hv_gpci_init(void)
+{
+	int r;
+	unsigned long hret;
+	struct hv_perf_caps caps;
+
+	if (!firmware_has_feature(FW_FEATURE_LPAR)) {
+		pr_info("not a virtualized system, not enabling\n");
+		return -ENODEV;
+	}
+
+	hret = hv_perf_caps_get(&caps);
+	if (hret) {
+		pr_info("could not obtain capabilities, error 0x%80lx, not enabling\n",
+				hret);
+		return -ENODEV;
+	}
+
+	r = perf_pmu_register(&h_gpci_pmu, h_gpci_pmu.name, -1);
+	if (r)
+		return r;
+
+	return 0;
+}
+
+device_initcall(hv_gpci_init);
-- 
1.9.0

^ permalink raw reply related

* [PATCH v3 09/11] powerpc/perf: add support for the hv 24x7 interface
From: Cody P Schafer @ 2014-02-27 21:05 UTC (permalink / raw)
  To: Linux PPC, Cody P Schafer
  Cc: Peter Zijlstra, LKML, Michael Ellerman, Ingo Molnar,
	Paul Mackerras, Arnaldo Carvalho de Melo, scottwood
In-Reply-To: <1393535105-7528-1-git-send-email-cody@linux.vnet.ibm.com>

This provides a basic interface between hv_24x7 and perf. Similar to
the one provided for gpci, it lacks transaction support and does not
list any events.

Example usage via perf tool:

	perf stat -e 'hv_24x7/domain=2,offset=8,starting_index=0,lpar=0xffffffff/' -r 0 -C 0 -x ' ' sleep 0.1

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 arch/powerpc/perf/hv-24x7.c | 492 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 492 insertions(+)
 create mode 100644 arch/powerpc/perf/hv-24x7.c

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
new file mode 100644
index 0000000..c1847a3
--- /dev/null
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -0,0 +1,492 @@
+/*
+ * Hypervisor supplied "24x7" performance counter support
+ *
+ * Author: Cody P Schafer <cody@linux.vnet.ibm.com>
+ * Copyright 2014 IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) "hv-24x7: " fmt
+
+#include <linux/perf_event.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <asm/firmware.h>
+#include <asm/hvcall.h>
+#include <asm/io.h>
+
+#include "hv-24x7.h"
+#include "hv-24x7-catalog.h"
+#include "hv-common.h"
+
+/*
+ * TODO: Merging events:
+ * - Think of the hcall as an interface to a 4d array of counters:
+ *   - x = domains
+ *   - y = indexes in the domain (core, chip, vcpu, node, etc)
+ *   - z = offset into the counter space
+ *   - w = lpars (guest vms, "logical partitions")
+ * - A single request is: x,y,y_last,z,z_last,w,w_last
+ *   - this means we can retrieve a rectangle of counters in y,z for a single x.
+ *
+ * - Things to consider (ignoring w):
+ *   - input  cost_per_request = 16
+ *   - output cost_per_result(ys,zs)  = 8 + 8 * ys + ys * zs
+ *   - limited number of requests per hcall (must fit into 4K bytes)
+ *     - 4k = 16 [buffer header] - 16 [request size] * request_count
+ *     - 255 requests per hcall
+ *   - sometimes it will be more efficient to read extra data and discard
+ */
+
+PMU_FORMAT_RANGE(domain, config, 0, 3); /* u3 0-6, one of HV_24X7_PERF_DOMAIN */
+PMU_FORMAT_RANGE(starting_index, config, 16, 31); /* u16 */
+PMU_FORMAT_RANGE(offset, config, 32, 63); /* u32, see "data_offset" */
+PMU_FORMAT_RANGE(lpar, config1, 0, 15); /* u16 */
+
+PMU_FORMAT_RANGE_RESERVED(reserved1, config,   4, 15);
+PMU_FORMAT_RANGE_RESERVED(reserved2, config1, 16, 63);
+PMU_FORMAT_RANGE_RESERVED(reserved3, config2,  0, 63);
+
+static struct attribute *format_attrs[] = {
+	&format_attr_domain.attr,
+	&format_attr_offset.attr,
+	&format_attr_starting_index.attr,
+	&format_attr_lpar.attr,
+	NULL,
+};
+
+static struct attribute_group format_group = {
+	.name = "format",
+	.attrs = format_attrs,
+};
+
+/*
+ * read_offset_data - copy data from one buffer to another while treating the
+ *                    source buffer as a small view on the total avaliable
+ *                    source data.
+ *
+ * @dest: buffer to copy into
+ * @dest_len: length of @dest in bytes
+ * @requested_offset: the offset within the source data we want. Must be > 0
+ * @src: buffer to copy data from
+ * @src_len: length of @src in bytes
+ * @source_offset: the offset in the sorce data that (src,src_len) refers to.
+ *                 Must be > 0
+ *
+ * returns the number of bytes copied.
+ *
+ * The following ascii art shows the various buffer possitioning we need to
+ * handle, assigns some arbitrary varibles to points on the buffer, and then
+ * shows how we fiddle with those values to get things we care about (copy
+ * start in src and copy len)
+ *
+ * s = @src buffer
+ * d = @dest buffer
+ * '.' areas in d are written to.
+ *
+ *                       u
+ *   x         w	 v  z
+ * d           |.........|
+ * s |----------------------|
+ *
+ *                      u
+ *   x         w	z     v
+ * d           |........------|
+ * s |------------------|
+ *
+ *   x         w        u,z,v
+ * d           |........|
+ * s |------------------|
+ *
+ *   x,w                u,v,z
+ * d |..................|
+ * s |------------------|
+ *
+ *   x        u
+ *   w        v		z
+ * d |........|
+ * s |------------------|
+ *
+ *   x      z   w      v
+ * d            |------|
+ * s |------|
+ *
+ * x = source_offset
+ * w = requested_offset
+ * z = source_offset + src_len
+ * v = requested_offset + dest_len
+ *
+ * w_offset_in_s = w - x = requested_offset - source_offset
+ * z_offset_in_s = z - x = src_len
+ * v_offset_in_s = v - x = request_offset + dest_len - src_len
+ */
+static ssize_t read_offset_data(void *dest, size_t dest_len,
+				loff_t requested_offset, void *src,
+				size_t src_len, loff_t source_offset)
+{
+	size_t w_offset_in_s = requested_offset - source_offset;
+	size_t z_offset_in_s = src_len;
+	size_t v_offset_in_s = requested_offset + dest_len - src_len;
+	size_t u_offset_in_s = min(z_offset_in_s, v_offset_in_s);
+	size_t copy_len = u_offset_in_s - w_offset_in_s;
+
+	if (requested_offset < 0 || source_offset < 0)
+		return -EINVAL;
+
+	if (z_offset_in_s <= w_offset_in_s)
+		return 0;
+
+	memcpy(dest, src + w_offset_in_s, copy_len);
+	return copy_len;
+}
+
+static unsigned long h_get_24x7_catalog_page(char page[static 4096],
+					     u32 version, u32 index)
+{
+	WARN_ON(!IS_ALIGNED((unsigned long)page, 4096));
+	return plpar_hcall_norets(H_GET_24X7_CATALOG_PAGE,
+			virt_to_phys(page),
+			version,
+			index);
+}
+
+static ssize_t catalog_read(struct file *filp, struct kobject *kobj,
+			    struct bin_attribute *bin_attr, char *buf,
+			    loff_t offset, size_t count)
+{
+	unsigned long hret;
+	ssize_t ret = 0;
+	size_t catalog_len = 0, catalog_page_len = 0, page_count = 0;
+	loff_t page_offset = 0;
+	uint32_t catalog_version_num = 0;
+	void *page = kmalloc(4096, GFP_USER);
+	struct hv_24x7_catalog_page_0 *page_0 = page;
+	if (!page)
+		return -ENOMEM;
+
+
+	hret = h_get_24x7_catalog_page(page, 0, 0);
+	if (hret) {
+		ret = -EIO;
+		goto e_free;
+	}
+
+	catalog_version_num = be32_to_cpu(page_0->version);
+	catalog_page_len = be32_to_cpu(page_0->length);
+	catalog_len = catalog_page_len * 4096;
+
+	page_offset = offset / 4096;
+	page_count  = count  / 4096;
+
+	if (page_offset >= catalog_page_len)
+		goto e_free;
+
+	if (page_offset != 0) {
+		hret = h_get_24x7_catalog_page(page, catalog_version_num,
+					       page_offset);
+		if (hret) {
+			ret = -EIO;
+			goto e_free;
+		}
+	}
+
+	ret = read_offset_data(buf, count, offset,
+				page, 4096, page_offset * 4096);
+e_free:
+	if (hret)
+		pr_err("h_get_24x7_catalog_page(ver=%d, page=%lld) failed: rc=%ld\n",
+				catalog_version_num, page_offset, hret);
+	kfree(page);
+
+	pr_devel("catalog_read: offset=%lld(%lld) count=%zu(%zu) catalog_len=%zu(%zu) => %zd\n",
+			offset, page_offset, count, page_count, catalog_len,
+			catalog_page_len, ret);
+
+	return ret;
+}
+
+#define PAGE_0_ATTR(_name, _fmt, _expr)				\
+static ssize_t _name##_show(struct device *dev,			\
+			    struct device_attribute *dev_attr,	\
+			    char *buf)				\
+{								\
+	unsigned long hret;					\
+	ssize_t ret = 0;					\
+	void *page = kmalloc(4096, GFP_USER);			\
+	struct hv_24x7_catalog_page_0 *page_0 = page;		\
+	if (!page)						\
+		return -ENOMEM;					\
+	hret = h_get_24x7_catalog_page(page, 0, 0);		\
+	if (hret) {						\
+		ret = -EIO;					\
+		goto e_free;					\
+	}							\
+	ret = sprintf(buf, _fmt, _expr);			\
+e_free:								\
+	kfree(page);						\
+	return ret;						\
+}								\
+static DEVICE_ATTR_RO(_name)
+
+PAGE_0_ATTR(catalog_version, "%lld\n",
+		(unsigned long long)be32_to_cpu(page_0->version));
+PAGE_0_ATTR(catalog_len, "%lld\n",
+		(unsigned long long)be32_to_cpu(page_0->length) * 4096);
+static BIN_ATTR_RO(catalog, 0/* real length varies */);
+
+static struct bin_attribute *if_bin_attrs[] = {
+	&bin_attr_catalog,
+	NULL,
+};
+
+static struct attribute *if_attrs[] = {
+	&dev_attr_catalog_len.attr,
+	&dev_attr_catalog_version.attr,
+	NULL,
+};
+
+static struct attribute_group if_group = {
+	.name = "interface",
+	.bin_attrs = if_bin_attrs,
+	.attrs = if_attrs,
+};
+
+static const struct attribute_group *attr_groups[] = {
+	&format_group,
+	&if_group,
+	NULL,
+};
+
+static bool is_physical_domain(int domain)
+{
+	return  domain == HV_24X7_PERF_DOMAIN_PHYSICAL_CHIP ||
+		domain == HV_24X7_PERF_DOMAIN_PHYSICAL_CORE;
+}
+
+static unsigned long single_24x7_request(u8 domain, u32 offset, u16 ix,
+					 u16 lpar, u64 *res,
+					 bool success_expected)
+{
+	unsigned long ret;
+
+	/*
+	 * request_buffer and result_buffer are not required to be 4k aligned,
+	 * but are not allowed to cross any 4k boundary. Aligning them to 4k is
+	 * the simplest way to ensure that.
+	 */
+	struct reqb {
+		struct hv_24x7_request_buffer buf;
+		struct hv_24x7_request req;
+	} __packed __aligned(4096) request_buffer = {
+		.buf = {
+			.interface_version = HV_24X7_IF_VERSION_CURRENT,
+			.num_requests = 1,
+		},
+		.req = {
+			.performance_domain = domain,
+			.data_size = cpu_to_be16(8),
+			.data_offset = cpu_to_be32(offset),
+			.starting_lpar_ix = cpu_to_be16(lpar),
+			.max_num_lpars = cpu_to_be16(1),
+			.starting_ix = cpu_to_be16(ix),
+			.max_ix = cpu_to_be16(1),
+		}
+	};
+
+	struct resb {
+		struct hv_24x7_data_result_buffer buf;
+		struct hv_24x7_result res;
+		struct hv_24x7_result_element elem;
+		__be64 result;
+	} __packed __aligned(4096) result_buffer = {};
+
+	ret = plpar_hcall_norets(H_GET_24X7_DATA,
+			virt_to_phys(&request_buffer), sizeof(request_buffer),
+			virt_to_phys(&result_buffer),  sizeof(result_buffer));
+
+	if (ret) {
+		if (success_expected)
+			pr_err_ratelimited("hcall failed: %d %#x %#x %d => 0x%lx (%ld) detail=0x%x failing ix=%x\n",
+					domain, offset, ix, lpar,
+					ret, ret,
+					result_buffer.buf.detailed_rc,
+					result_buffer.buf.failing_request_ix);
+		return ret;
+	}
+
+	*res = be64_to_cpu(result_buffer.result);
+	return ret;
+}
+
+static unsigned long event_24x7_request(struct perf_event *event, u64 *res,
+		bool success_expected)
+{
+	return single_24x7_request(event_get_domain(event),
+				event_get_offset(event),
+				event_get_starting_index(event),
+				event_get_lpar(event),
+				res,
+				success_expected);
+}
+
+static int h_24x7_event_init(struct perf_event *event)
+{
+	struct hv_perf_caps caps;
+	unsigned domain;
+	unsigned long hret;
+	u64 ct;
+
+	/* Not our event */
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
+	/* Unused areas must be 0 */
+	if (event_get_reserved1(event) ||
+	    event_get_reserved2(event) ||
+	    event_get_reserved3(event)) {
+		pr_devel("reserved set when forbidden 0x%llx(0x%llx) 0x%llx(0x%llx) 0x%llx(0x%llx)\n",
+				event->attr.config,
+				event_get_reserved1(event),
+				event->attr.config1,
+				event_get_reserved2(event),
+				event->attr.config2,
+				event_get_reserved3(event));
+		return -EINVAL;
+	}
+
+	/* unsupported modes and filters */
+	if (event->attr.exclude_user   ||
+	    event->attr.exclude_kernel ||
+	    event->attr.exclude_hv     ||
+	    event->attr.exclude_idle   ||
+	    event->attr.exclude_host   ||
+	    event->attr.exclude_guest  ||
+	    is_sampling_event(event)) /* no sampling */
+		return -EINVAL;
+
+	/* no branch sampling */
+	if (has_branch_stack(event))
+		return -EOPNOTSUPP;
+
+	/* offset must be 8 byte aligned */
+	if (event_get_offset(event) % 8) {
+		pr_devel("bad alignment\n");
+		return -EINVAL;
+	}
+
+	/* Domains above 6 are invalid */
+	domain = event_get_domain(event);
+	if (domain > 6) {
+		pr_devel("invalid domain %d\n", domain);
+		return -EINVAL;
+	}
+
+	hret = hv_perf_caps_get(&caps);
+	if (hret) {
+		pr_devel("could not get capabilities: rc=%ld\n", hret);
+		return -EIO;
+	}
+
+	/* PHYSICAL domains & other lpars require extra capabilities */
+	if (!caps.collect_privileged && (is_physical_domain(domain) ||
+		(event_get_lpar(event) != event_get_lpar_max()))) {
+		pr_devel("hv permisions disallow: is_physical_domain:%d, lpar=0x%llx\n",
+				is_physical_domain(domain),
+				event_get_lpar(event));
+		return -EACCES;
+	}
+
+	/* see if the event complains */
+	if (event_24x7_request(event, &ct, false)) {
+		pr_devel("test hcall failed\n");
+		return -EIO;
+	}
+
+	return 0;
+}
+
+static u64 h_24x7_get_value(struct perf_event *event)
+{
+	unsigned long ret;
+	u64 ct;
+	ret = event_24x7_request(event, &ct, true);
+	if (ret)
+		/* We checked this in event init, shouldn't fail here... */
+		return 0;
+
+	return ct;
+}
+
+static void h_24x7_event_update(struct perf_event *event)
+{
+	s64 prev;
+	u64 now;
+	now = h_24x7_get_value(event);
+	prev = local64_xchg(&event->hw.prev_count, now);
+	local64_add(now - prev, &event->count);
+}
+
+static void h_24x7_event_start(struct perf_event *event, int flags)
+{
+	if (flags & PERF_EF_RELOAD)
+		local64_set(&event->hw.prev_count, h_24x7_get_value(event));
+}
+
+static void h_24x7_event_stop(struct perf_event *event, int flags)
+{
+	h_24x7_event_update(event);
+}
+
+static int h_24x7_event_add(struct perf_event *event, int flags)
+{
+	if (flags & PERF_EF_START)
+		h_24x7_event_start(event, flags);
+
+	return 0;
+}
+
+static struct pmu h_24x7_pmu = {
+	.task_ctx_nr = perf_invalid_context,
+
+	.name = "hv_24x7",
+	.attr_groups = attr_groups,
+	.event_init  = h_24x7_event_init,
+	.add         = h_24x7_event_add,
+	.del         = h_24x7_event_stop,
+	.start       = h_24x7_event_start,
+	.stop        = h_24x7_event_stop,
+	.read        = h_24x7_event_update,
+
+	.event_idx = perf_event_nop_0,
+};
+
+static int hv_24x7_init(void)
+{
+	int r;
+	unsigned long hret;
+	struct hv_perf_caps caps;
+
+	if (!firmware_has_feature(FW_FEATURE_LPAR)) {
+		pr_info("not a virtualized system, not enabling\n");
+		return -ENODEV;
+	}
+
+	hret = hv_perf_caps_get(&caps);
+	if (hret) {
+		pr_info("could not obtain capabilities, error 0x%80lx, not enabling\n",
+				hret);
+		return -ENODEV;
+	}
+
+	r = perf_pmu_register(&h_24x7_pmu, h_24x7_pmu.name, -1);
+	if (r)
+		return r;
+
+	return 0;
+}
+
+device_initcall(hv_24x7_init);
-- 
1.9.0

^ permalink raw reply related

* [PATCH v3 10/11] powerpc/perf: add kconfig option for hypervisor provided counters
From: Cody P Schafer @ 2014-02-27 21:05 UTC (permalink / raw)
  To: Linux PPC, Anshuman Khandual, Benjamin Herrenschmidt,
	Cody P Schafer, Deepthi Dharwar, Gavin Shan, Lijun Pan, Li Zhong,
	Michael Ellerman, Paul Bolle, Priyanka Jain, Srivatsa S. Bhat
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo, scottwood
In-Reply-To: <1393535105-7528-1-git-send-email-cody@linux.vnet.ibm.com>

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 arch/powerpc/perf/Makefile             |  2 ++
 arch/powerpc/platforms/pseries/Kconfig | 12 ++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index 60d71ee..f9c083a 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -11,5 +11,7 @@ obj32-$(CONFIG_PPC_PERF_CTRS)	+= mpc7450-pmu.o
 obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
 obj-$(CONFIG_FSL_EMB_PERF_EVENT_E500) += e500-pmu.o e6500-pmu.o
 
+obj-$(CONFIG_HV_PERF_CTRS) += hv-24x7.o hv-gpci.o hv-common.o
+
 obj-$(CONFIG_PPC64)		+= $(obj64-y)
 obj-$(CONFIG_PPC32)		+= $(obj32-y)
diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index 80b1d57..2cb8b77 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -111,6 +111,18 @@ config CMM
 	  will be reused for other LPARs. The interface allows firmware to
 	  balance memory across many LPARs.
 
+config HV_PERF_CTRS
+       bool "Hypervisor supplied PMU events (24x7 & GPCI)"
+       default y
+       depends on PERF_EVENTS && PPC_PSERIES
+       help
+	  Enable access to hypervisor supplied counters in perf. Currently,
+	  this enables code that uses the hcall GetPerfCounterInfo and 24x7
+	  interfaces to retrieve counters. GPCI exists on Power 6 and later
+	  systems. 24x7 is available on Power 8 systems.
+
+          If unsure, select Y.
+
 config DTL
 	bool "Dispatch Trace Log"
 	depends on PPC_SPLPAR && DEBUG_FS
-- 
1.9.0

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox