LinuxPPC-Dev Archive on lore.kernel.org

LinuxPPC-Dev Archive on lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] powerpc/defconfigs: Enable THP in pseries defconfig
From: Aneesh Kumar K.V @ 2014-03-15 11:29 UTC (permalink / raw)
  To: benh, paulus; +Cc: linuxppc-dev, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

We also set it to be enabled always. This helps in wider testing

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/configs/pseries_defconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/powerpc/configs/pseries_defconfig b/arch/powerpc/configs/pseries_defconfig
index e9a8b4e0a0f6..82cf3f8721b9 100644
--- a/arch/powerpc/configs/pseries_defconfig
+++ b/arch/powerpc/configs/pseries_defconfig
@@ -353,3 +353,5 @@ CONFIG_CRYPTO_DEV_NX_ENCRYPT=m
 CONFIG_VIRTUALIZATION=y
 CONFIG_KVM_BOOK3S_64=m
 CONFIG_KVM_BOOK3S_64_HV=y
+CONFIG_TRANSPARENT_HUGEPAGE=y
+CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
-- 
1.8.3.2

^ permalink raw reply related

* [PATCH] powerpc/mm: Make sure a local_irq_disable prevent a parallel THP split
From: Aneesh Kumar K.V @ 2014-03-15 10:47 UTC (permalink / raw)
  To: benh, paulus, Rik van Riel
  Cc: linux-mm, linuxppc-dev, linux-kernel, Aneesh Kumar K.V

From: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>

We have generic code like the one in get_futex_key that assume that
a local_irq_disable prevents a parallel THP split. Support that by
adding a dummy smp call function after setting _PAGE_SPLITTING. Code
paths like get_user_pages_fast still need to check for _PAGE_SPLITTING
after disabling IRQ which indicate that a parallel THP splitting is
ongoing. Now if they don't find _PAGE_SPLITTING set, then we can be
sure that parallel split will now block in pmdp_splitting flush
until we enables IRQ

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
---
 arch/powerpc/mm/pgtable_64.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/powerpc/mm/pgtable_64.c b/arch/powerpc/mm/pgtable_64.c
index 62bf5e8e78da..f6ce1f111f5b 100644
--- a/arch/powerpc/mm/pgtable_64.c
+++ b/arch/powerpc/mm/pgtable_64.c
@@ -647,6 +647,11 @@ void pmdp_splitting_flush(struct vm_area_struct *vma,
 		if (old & _PAGE_HASHPTE)
 			hpte_do_hugepage_flush(vma->vm_mm, address, pmdp);
 	}
+	/*
+	 * This ensures that generic code that rely on IRQ disabling
+	 * to prevent a parallel THP split work as expected.
+	 */
+	kick_all_cpus_sync();
 }

 /*
-- 
1.8.3.2

^ permalink raw reply related

* Re: [PATCH 9/9] powerpc/pm: support deep sleep feature on T1040
From: Scott Wood @ 2014-03-14 23:18 UTC (permalink / raw)
  To: Chenhui Zhao; +Cc: linuxppc-dev, linux-kernel, Jason.Jin
In-Reply-To: <20140312104005.GH4706@localhost.localdomain>

On Wed, 2014-03-12 at 18:40 +0800, Chenhui Zhao wrote:
> On Tue, Mar 11, 2014 at 08:10:24PM -0500, Scott Wood wrote:
> > On Fri, 2014-03-07 at 12:58 +0800, Chenhui Zhao wrote:
> > > From: Zhao Chenhui <chenhui.zhao@freescale.com>
> > > 
> > > T1040 supports deep sleep feature, which can switch off most parts of
> > > the SoC when it is in deep sleep mode. This way, it becomes more
> > > energy-efficient.
> > > 
> > > The DDR controller will also be powered off in deep sleep. Therefore,
> > > the last stage (the latter part of fsl_dp_enter_low) will run without DDR
> > > access. This piece of code and related TLBs will be prefetched.
> > > 
> > > Due to the different initialization code between 32-bit and 64-bit, they
> > > have seperate resume entry and precedure.
> > > 
> > > The feature supports 32-bit and 64-bit kernel mode.
> > > 
> > > Signed-off-by: Zhao Chenhui <chenhui.zhao@freescale.com>
> > > ---
> > >  arch/powerpc/include/asm/booke_save_regs.h |    3 +
> > >  arch/powerpc/kernel/cpu_setup_fsl_booke.S  |   17 ++
> > >  arch/powerpc/kernel/head_fsl_booke.S       |   30 +++
> > >  arch/powerpc/platforms/85xx/Makefile       |    2 +-
> > >  arch/powerpc/platforms/85xx/deepsleep.c    |  201 +++++++++++++++++++
> > >  arch/powerpc/platforms/85xx/qoriq_pm.c     |   38 ++++
> > >  arch/powerpc/platforms/85xx/sleep.S        |  295 ++++++++++++++++++++++++++++
> > >  arch/powerpc/sysdev/fsl_soc.h              |    7 +
> > >  8 files changed, 592 insertions(+), 1 deletions(-)
> > >  create mode 100644 arch/powerpc/platforms/85xx/deepsleep.c
> > >  create mode 100644 arch/powerpc/platforms/85xx/sleep.S
> > > 
> > > diff --git a/arch/powerpc/include/asm/booke_save_regs.h b/arch/powerpc/include/asm/booke_save_regs.h
> > > index 87c357a..37c1f6c 100644
> > > --- a/arch/powerpc/include/asm/booke_save_regs.h
> > > +++ b/arch/powerpc/include/asm/booke_save_regs.h
> > > @@ -88,6 +88,9 @@
> > >  #define HIBERNATION_FLAG	1
> > >  #define DEEPSLEEP_FLAG		2
> > >  
> > > +#define CPLD_FLAG	1
> > > +#define FPGA_FLAG	2
> > 
> > What is this?
> 
> We have two kind of boards, QDS and RDB.
> They have different register map. Use the flag to indicate the current board is using which kind
> of register map.

CPLD versus FPGA is not a meaningful difference.  We don't care what
technology is used to implement programmable logic -- we care what
programming interface is exposed.  Customers will have their own boards
that will likely not imitate either of these programming interfaces, but
what they do have will still probably be implemented in a CPLD or FPGA.
Likewise, Freescale may have future reference boards whose CPLD/FPGA is
not compatible.

So use better naming, and structure the code so it's easy to plug in
implementations for new or custom boards.
 
> > > diff --git a/arch/powerpc/kernel/head_fsl_booke.S b/arch/powerpc/kernel/head_fsl_booke.S
> > > index 20204fe..3285752 100644
> > > --- a/arch/powerpc/kernel/head_fsl_booke.S
> > > +++ b/arch/powerpc/kernel/head_fsl_booke.S
> > > @@ -162,6 +162,19 @@ _ENTRY(__early_start)
> > >  #include "fsl_booke_entry_mapping.S"
> > >  #undef ENTRY_MAPPING_BOOT_SETUP
> > >  
> > > +#if defined(CONFIG_SUSPEND) && defined(CONFIG_FSL_CORENET_RCPM)
> > > +	/* if deep_sleep_flag != 0, jump to the deep sleep resume entry */
> > > +	LOAD_REG_ADDR(r4, deep_sleep_flag)
> > > +	lwz	r3, 0(r4)
> > > +	cmpwi	r3, 0
> > > +	beq	11f
> > > +	/* clear deep_sleep_flag */
> > > +	li	r3, 0
> > > +	stw	r3, 0(r4)
> > > +	b	fsl_deepsleep_resume
> > > +11:
> > > +#endif
> > 
> > Why do you come in via the normal kernel entry, versus specifying a
> > direct entry point for deep sleep resume?  How does U-Boot even know
> > what the normal entry is when resuming?
> 
> I wish to return to a specified point (like 64-bit mode), but the code in
> fsl_booke_entry_mapping.S only can run in the first page. Because it
> only setups a temp mapping of 4KB.

Why do you need the entry mapping on 32-bit but not 64-bit?
> 
> > > +#if defined(CONFIG_SUSPEND) && defined(CONFIG_FSL_CORENET_RCPM)
> > > +_ENTRY(__entry_deep_sleep)
> > > +/*
> > > + * Bootloader will jump to here when resuming from deep sleep.
> > > + * After executing the init code in fsl_booke_entry_mapping.S,
> > > + * will jump to the real resume entry.
> > > + */
> > > +	li	r8, 1
> > > +	bl	12f
> > > +12:	mflr	r9
> > > +	addi	r9, r9, (deep_sleep_flag - 12b)
> > > +	stw	r8, 0(r9)
> > > +	b __early_start
> > > +deep_sleep_flag:
> > > +	.long	0
> > > +#endif
> > 
> > It's a bit ambiguous to say "entry_deep_sleep" when it's resuming rather
> > than entering...
> 
> How about __fsl_entry_resume?

fsl_booke_deep_sleep_resume

> > > +#define FSLDELAY(count)		\
> > > +	li	r3, (count)@l;	\
> > > +	slwi	r3, r3, 10;	\
> > > +	mtctr	r3;		\
> > > +101:	nop;			\
> > > +	bdnz	101b;
> > 
> > You don't need a namespace prefix on local macros in a non-header file.
> > 
> > Is the timebase stopped where you're calling this from?
> 
> No. My purpose is to avoid jump in the last stage of entering deep sleep.
> Jump may cause problem at that time.

"bdnz" is a jump.

What problems do you think a jump will cause?

> > You also probably want to do a "sync, readback, data dependency, isync"
> > sequence to make sure that the store has hit CCSR before you begin your
> > delay (or is a delay required at all if you do that?).
> 
> Yes. It is safer with a sync sequence.
> 
> The DDR controller need some time to signal the external DDR modules to
> enter self refresh mode.

Is it documented how much time it requires?

-Scott

^ permalink raw reply

* Re: [PATCH 8/9] powerpc/85xx: add save/restore functions for core registers
From: Scott Wood @ 2014-03-14 23:01 UTC (permalink / raw)
  To: Chenhui Zhao; +Cc: linuxppc-dev, linux-kernel, Jason.Jin, Wang Dongsheng-B40534
In-Reply-To: <20140312094237.GG4706@localhost.localdomain>

On Wed, 2014-03-12 at 17:42 +0800, Chenhui Zhao wrote:
> On Tue, Mar 11, 2014 at 07:45:14PM -0500, Scott Wood wrote:
> > On Fri, 2014-03-07 at 12:58 +0800, Chenhui Zhao wrote:
> > > From: Wang Dongsheng <dongsheng.wang@freescale.com>
> > > 
> > > Add booke_cpu_state_save() and booke_cpu_state_restore() functions which can be
> > > used to save/restore CPU's registers in the case of deep sleep and hibernation.
> > > 
> > > Supported processors: E6500, E5500, E500MC, E500v2 and E500v1.
> > > 
> > > Signed-off-by: Wang Dongsheng <dongsheng.wang@freescale.com>
> > > Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com>
> > > ---
> > >  arch/powerpc/include/asm/booke_save_regs.h |   96 ++++++++
> > >  arch/powerpc/kernel/Makefile               |    1 +
> > >  arch/powerpc/kernel/booke_save_regs.S      |  361 ++++++++++++++++++++++++++++
> > >  3 files changed, 458 insertions(+), 0 deletions(-)
> > >  create mode 100644 arch/powerpc/include/asm/booke_save_regs.h
> > >  create mode 100644 arch/powerpc/kernel/booke_save_regs.S
> > > 
> > > diff --git a/arch/powerpc/include/asm/booke_save_regs.h b/arch/powerpc/include/asm/booke_save_regs.h
> > > new file mode 100644
> > > index 0000000..87c357a
> > > --- /dev/null
> > > +++ b/arch/powerpc/include/asm/booke_save_regs.h
> > > @@ -0,0 +1,96 @@
> > > +/*
> > > + *  Save/restore e500 series core registers
> > 
> > Filename says booke, comment says e500.
> > 
> > Filename and comment also fail to point out that this is specifically
> > for standby/suspend, not for hibernate which is implemented in
> > swsusp_booke.S/swsusp_asm64.S.
> 
> Sorry for inconsistency. Will changes e500 to booke.
> Hibernation and suspend can share the code.

Maybe they could, but AFAICT this patchset doesn't make that happen --
and I'm not convinced that the churn would be worthwhile.  Note that
swsusp_asm64.S is not just for booke, so most of that file would not be
going away if you did make such a change.

I also don't like the way it looks like booke_save_regs.S is a booke
version of ppc_save_regs.S, even though they serve different purposes
and ppc_save_regs.S is still relevant to booke.

> > > + * Software-Use Registers
> > > + *	SPRG1			0x260		(dw * 76), 64-bit need to save.
> > > + *	SPRG3			0x268		(dw * 77), 32-bit need to save.
> > 
> > What about "CPU and NUMA node for VDSO getcpu" on 64-bit?  Currently
> > SPRG3, but it will need to change for critical interrupt support.
> > 
> > > + * MMU Registers
> > > + *	PID0 - PID2		0x270 ~ 0x280	(dw * 78 ~ dw * 80)
> > 
> > PID1/PID2 are e500v1/v2 only -- and Linux doesn't use them outside of
> > KVM (and you're not in KVM when you're running this code).
> > 
> > Are we ever going to have a non-zero PID at this point?
> 
> I incline to the view that saving all registers regardless of used or
> unused. The good point is that it can be compliant to the future
> changes of the usage of registers.
> 
> What do you think?

I agree to a certain extent, but balance it with the complexity of
dealing with registers that don't exist on all booke chips.  If they
don't really need to be saved, why go through the hassle of conditional
code?

> > > + * Debug Registers
> > > + *	DBCR0 - DBCR2		0x288 ~ 0x298	(dw * 81 ~ dw * 83)
> > > + *	IAC1 - IAC4		0x2a0 ~ 0x2b8	(dw * 84 ~ dw * 87)
> > > + *	DAC1 - DAC2		0x2c0 ~ 0x2c8	(dw * 88 ~ dw * 89)
> > > + *
> > > + */
> > 
> > IAC3-4 are not implemented on e500.
> > 
> > Do we really need to save the debug registers?  We're not going to be in
> > a debugged process when we do suspend.  If the concern is kgdb, it
> > probably needs to be told to get out of the way during suspend for other
> > reasons.
> 
> I think in the ideal case the suspend would not break any context. We
> should try to save/restore all cpu state. Of cause, trade-off is
> unavoidable in practice.
> 
> > 
> > > +#define SR_GPR1			0x000
> > > +#define SR_GPR2			0x008
> > > +#define SR_GPR13		0x010
> > > +#define SR_FPR14		0x0a8
> > > +#define SR_CR			0x138
> > > +#define SR_LR			0x140
> > > +#define SR_MSR			0x148
> > 
> > These are very vague names to be putting in a header file.
> 
> How about BOOKE_xx_OFFSET?

Better, but does it need to be in a public header file at all?
 
> > > +/*
> > > + * hibernation and deepsleep save/restore different number of registers,
> > > + * use these flags to indicate.
> > > + */
> > > +#define HIBERNATION_FLAG	1
> > > +#define DEEPSLEEP_FLAG		2
> > 
> > Again, namespacing -- but why is hibernation using this at all?  What's
> > wrong with the existing hibernation support?
> 
> How about BOOKE_HIBERNATION_FLAG?
> 
> Just wish to share code between hibernation and suspend.

No need until and unless you actually implement the change for
hibernation to use this.  As is, this is dead and untestable code.
 
> > > +booke_cpu_append_save:
> > > +	mfspr	r0, SPRN_PVR
> > > +	rlwinm	r11, r0, 16, 16, 31
> > > +
> > > +	lis	r12, 0
> > > +	ori	r12, r12, PVR_VER_E6500@l
> > > +	cmpw	r11, r12
> > > +	beq	e6500_append_save
> > > +
> > > +	lis	r12, 0
> > > +	ori	r12, r12, PVR_VER_E5500@l
> > > +	cmpw	r11, r12
> > > +	beq	e5500_append_save
> > > +
> > > +	lis	r12, 0
> > > +	ori	r12, r12, PVR_VER_E500MC@l
> > > +	cmpw	r11, r12
> > > +	beq	e500mc_append_save
> > > +
> > > +	lis	r12, 0
> > > +	ori	r12, r12, PVR_VER_E500V2@l
> > > +	cmpw	r11, r12
> > > +	beq	e500v2_append_save
> > > +
> > > +	lis	r12, 0
> > > +	ori	r12, r12, PVR_VER_E500V1@l
> > > +	cmpw	r11, r12
> > > +	beq	e500v1_append_save
> > > +	b	1f
> > > +
> > > +e6500_append_save:
> > > +	do_e6500_sr_interrupt_regs(save)
> > > +	do_e6500_sr_debug_regs(save)
> > > +	b	1f
> > > +
> > > +e5500_append_save:
> > > +	do_e5500_sr_interrupt_regs(save)
> > > +	b	1f
> > > +
> > > +e500mc_append_save:
> > > +	do_e500mc_sr_interrupt_regs(save)
> > > +	b	1f
> > > +
> > > +e500v2_append_save:
> > > +e500v1_append_save:
> > > +	do_e500_sr_interrupt_regs(save)
> > > +1:
> > > +	do_sr_interrupt_regs(save)
> > > +	do_sr_debug_regs(save)
> > > +
> > > +	blr
> > 
> > What is meant by "append" here?
> 
> Means extra registers for specific e500 derivative core.

How is that related to skipping it in the hibernation case?
 
-Scott

^ permalink raw reply

* [PATCH 1/2] lib/scatterlist: Make ARCH_HAS_SG_CHAIN an actual Kconfig
From: Laura Abbott @ 2014-03-14 22:58 UTC (permalink / raw)
  To: Russell King, David S. Miller, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, x86, James E.J. Bottomley, Fenghua Yu, Tony Luck,
	Benjamin Herrenschmidt, Paul Mackerras, Martin Schwidefsky,
	Heiko Carstens, Andrew Morton
  Cc: linux-s390, Laura Abbott, linux-scsi, linux-kernel, linux390,
	sparclinux, linux-ia64, linuxppc-dev, linux-arm-kernel

Rather than have architectures #define ARCH_HAS_SG_CHAIN in an architecture
specific scatterlist.h, make it a proper Kconfig option and use that
instead. At same time, remove the header files are are now mostly
useless and just include asm-generic/scatterlist.h.

Cc: Russell King <linux@arm.linux.org.uk>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Benjamin Herrenschmidt <benh@kernel.crashing.org>
Paul Mackerras <paulus@samba.org>
Signed-off-by: Laura Abbott <lauraa@codeaurora.org>
---
 arch/arm/Kconfig                       |  1 +
 arch/arm/include/asm/Kbuild            |  1 +
 arch/arm/include/asm/scatterlist.h     | 12 ------------
 arch/arm64/Kconfig                     |  1 +
 arch/ia64/Kconfig                      |  1 +
 arch/ia64/include/asm/Kbuild           |  1 +
 arch/ia64/include/asm/scatterlist.h    |  7 -------
 arch/powerpc/Kconfig                   |  1 +
 arch/powerpc/include/asm/Kbuild        |  1 +
 arch/powerpc/include/asm/scatterlist.h | 17 -----------------
 arch/s390/Kconfig                      |  1 +
 arch/s390/include/asm/Kbuild           |  1 +
 arch/s390/include/asm/scatterlist.h    |  3 ---
 arch/sparc/Kconfig                     |  1 +
 arch/sparc/include/asm/Kbuild          |  1 +
 arch/sparc/include/asm/scatterlist.h   |  8 --------
 arch/x86/Kconfig                       |  1 +
 arch/x86/include/asm/Kbuild            |  1 +
 arch/x86/include/asm/scatterlist.h     |  8 --------
 include/linux/scatterlist.h            |  2 +-
 include/scsi/scsi.h                    |  2 +-
 lib/Kconfig                            |  7 +++++++
 lib/scatterlist.c                      |  4 ++--
 23 files changed, 24 insertions(+), 59 deletions(-)
 delete mode 100644 arch/arm/include/asm/scatterlist.h
 delete mode 100644 arch/ia64/include/asm/scatterlist.h
 delete mode 100644 arch/powerpc/include/asm/scatterlist.h
 delete mode 100644 arch/s390/include/asm/scatterlist.h
 delete mode 100644 arch/sparc/include/asm/scatterlist.h
 delete mode 100644 arch/x86/include/asm/scatterlist.h

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 1594945..8122294 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -82,6 +82,7 @@ config ARM
 	  <http://www.arm.linux.org.uk/>.
 
 config ARM_HAS_SG_CHAIN
+	select ARCH_HAS_SG_CHAIN
 	bool
 
 config NEED_SG_DMA_LENGTH
diff --git a/arch/arm/include/asm/Kbuild b/arch/arm/include/asm/Kbuild
index 3278afe..2357ed6 100644
--- a/arch/arm/include/asm/Kbuild
+++ b/arch/arm/include/asm/Kbuild
@@ -18,6 +18,7 @@ generic-y += param.h
 generic-y += parport.h
 generic-y += poll.h
 generic-y += resource.h
+generic-y += scatterlist.h
 generic-y += sections.h
 generic-y += segment.h
 generic-y += sembuf.h
diff --git a/arch/arm/include/asm/scatterlist.h b/arch/arm/include/asm/scatterlist.h
deleted file mode 100644
index cefdb8f..0000000
--- a/arch/arm/include/asm/scatterlist.h
+++ /dev/null
@@ -1,12 +0,0 @@
-#ifndef _ASMARM_SCATTERLIST_H
-#define _ASMARM_SCATTERLIST_H
-
-#ifdef CONFIG_ARM_HAS_SG_CHAIN
-#define ARCH_HAS_SG_CHAIN
-#endif
-
-#include <asm/memory.h>
-#include <asm/types.h>
-#include <asm-generic/scatterlist.h>
-
-#endif /* _ASMARM_SCATTERLIST_H */
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 27bbcfc..f2f95f4 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -2,6 +2,7 @@ config ARM64
 	def_bool y
 	select ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
 	select ARCH_USE_CMPXCHG_LOCKREF
+	select ARCH_HAS_SG_CHAIN
 	select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST
 	select ARCH_WANT_OPTIONAL_GPIOLIB
 	select ARCH_WANT_COMPAT_IPC_PARSE_VERSION
diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig
index 0c8e553..13e2e8b 100644
--- a/arch/ia64/Kconfig
+++ b/arch/ia64/Kconfig
@@ -44,6 +44,7 @@ config IA64
 	select HAVE_MOD_ARCH_SPECIFIC
 	select MODULES_USE_ELF_RELA
 	select ARCH_USE_CMPXCHG_LOCKREF
+	select ARCH_HAS_SG_CHAIN
 	default y
 	help
 	  The Itanium Processor Family is Intel's 64-bit successor to
diff --git a/arch/ia64/include/asm/Kbuild b/arch/ia64/include/asm/Kbuild
index 283a831..3906865 100644
--- a/arch/ia64/include/asm/Kbuild
+++ b/arch/ia64/include/asm/Kbuild
@@ -2,6 +2,7 @@
 generic-y += clkdev.h
 generic-y += exec.h
 generic-y += kvm_para.h
+generic-y += scatterlist.h
 generic-y += trace_clock.h
 generic-y += preempt.h
 generic-y += vtime.h
diff --git a/arch/ia64/include/asm/scatterlist.h b/arch/ia64/include/asm/scatterlist.h
deleted file mode 100644
index 08fd93b..0000000
--- a/arch/ia64/include/asm/scatterlist.h
+++ /dev/null
@@ -1,7 +0,0 @@
-#ifndef _ASM_IA64_SCATTERLIST_H
-#define _ASM_IA64_SCATTERLIST_H
-
-#include <asm-generic/scatterlist.h>
-#define ARCH_HAS_SG_CHAIN
-
-#endif /* _ASM_IA64_SCATTERLIST_H */
diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 957bf34..659aee2 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -141,6 +141,7 @@ config PPC
 	select HAVE_DEBUG_STACKOVERFLOW
 	select HAVE_IRQ_EXIT_ON_IRQ_STACK
 	select ARCH_USE_CMPXCHG_LOCKREF if PPC64
+	select ARCH_HAS_SG_CHAIN
 
 config GENERIC_CSUM
 	def_bool CPU_LITTLE_ENDIAN
diff --git a/arch/powerpc/include/asm/Kbuild b/arch/powerpc/include/asm/Kbuild
index 6c0a955..ca596ec 100644
--- a/arch/powerpc/include/asm/Kbuild
+++ b/arch/powerpc/include/asm/Kbuild
@@ -1,6 +1,7 @@
 
 generic-y += clkdev.h
 generic-y += rwsem.h
+generic-y += scatterlist.h
 generic-y += trace_clock.h
 generic-y += preempt.h
 generic-y += vtime.h
diff --git a/arch/powerpc/include/asm/scatterlist.h b/arch/powerpc/include/asm/scatterlist.h
deleted file mode 100644
index de1f620..0000000
--- a/arch/powerpc/include/asm/scatterlist.h
+++ /dev/null
@@ -1,17 +0,0 @@
-#ifndef _ASM_POWERPC_SCATTERLIST_H
-#define _ASM_POWERPC_SCATTERLIST_H
-/*
- * Copyright (C) 2001 PPC64 Team, IBM Corp
- *
- * This program is free software; you can redistribute it and/or
- * modify it under the terms of the GNU General Public License
- * as published by the Free Software Foundation; either version
- * 2 of the License, or (at your option) any later version.
- */
-
-#include <asm/dma.h>
-#include <asm-generic/scatterlist.h>
-
-#define ARCH_HAS_SG_CHAIN
-
-#endif /* _ASM_POWERPC_SCATTERLIST_H */
diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig
index 65a0775..d6c2059 100644
--- a/arch/s390/Kconfig
+++ b/arch/s390/Kconfig
@@ -142,6 +142,7 @@ config S390
 	select SYSCTL_EXCEPTION_TRACE
 	select VIRT_CPU_ACCOUNTING
 	select VIRT_TO_BUS
+	select ARCH_HAS_SG_CHAIN
 
 config SCHED_OMIT_FRAME_POINTER
 	def_bool y
diff --git a/arch/s390/include/asm/Kbuild b/arch/s390/include/asm/Kbuild
index 8386a4a..14be6d0 100644
--- a/arch/s390/include/asm/Kbuild
+++ b/arch/s390/include/asm/Kbuild
@@ -4,3 +4,4 @@ generic-y += clkdev.h
 generic-y += trace_clock.h
 generic-y += preempt.h
 generic-y += hash.h
+generic-y += scatterlist.h
diff --git a/arch/s390/include/asm/scatterlist.h b/arch/s390/include/asm/scatterlist.h
deleted file mode 100644
index 6d45ef6..0000000
--- a/arch/s390/include/asm/scatterlist.h
+++ /dev/null
@@ -1,3 +0,0 @@
-#include <asm-generic/scatterlist.h>
-
-#define ARCH_HAS_SG_CHAIN
diff --git a/arch/sparc/Kconfig b/arch/sparc/Kconfig
index 7d8b7e9..7a179fe 100644
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -42,6 +42,7 @@ config SPARC
 	select MODULES_USE_ELF_RELA
 	select ODD_RT_SIGACTION
 	select OLD_SIGSUSPEND
+	select ARCH_HAS_SG_CHAIN
 
 config SPARC32
 	def_bool !64BIT
diff --git a/arch/sparc/include/asm/Kbuild b/arch/sparc/include/asm/Kbuild
index 4b60a0c..437f0c6 100644
--- a/arch/sparc/include/asm/Kbuild
+++ b/arch/sparc/include/asm/Kbuild
@@ -18,3 +18,4 @@ generic-y += types.h
 generic-y += word-at-a-time.h
 generic-y += preempt.h
 generic-y += hash.h
+generic-y += scatterlist.h
diff --git a/arch/sparc/include/asm/scatterlist.h b/arch/sparc/include/asm/scatterlist.h
deleted file mode 100644
index 92bb638..0000000
--- a/arch/sparc/include/asm/scatterlist.h
+++ /dev/null
@@ -1,8 +0,0 @@
-#ifndef _SPARC_SCATTERLIST_H
-#define _SPARC_SCATTERLIST_H
-
-#include <asm-generic/scatterlist.h>
-
-#define ARCH_HAS_SG_CHAIN
-
-#endif /* !(_SPARC_SCATTERLIST_H) */
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 0af5250..76997dc 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -127,6 +127,7 @@ config X86
 	select HAVE_DEBUG_STACKOVERFLOW
 	select HAVE_IRQ_EXIT_ON_IRQ_STACK if X86_64
 	select HAVE_CC_STACKPROTECTOR
+	select ARCH_HAS_SG_CHAIN
 
 config INSTRUCTION_DECODER
 	def_bool y
diff --git a/arch/x86/include/asm/Kbuild b/arch/x86/include/asm/Kbuild
index 7f66985..9b3d749 100644
--- a/arch/x86/include/asm/Kbuild
+++ b/arch/x86/include/asm/Kbuild
@@ -5,3 +5,4 @@ genhdr-y += unistd_64.h
 genhdr-y += unistd_x32.h
 
 generic-y += clkdev.h
+generic-y += scatterlist.h
diff --git a/arch/x86/include/asm/scatterlist.h b/arch/x86/include/asm/scatterlist.h
deleted file mode 100644
index 4240878..0000000
--- a/arch/x86/include/asm/scatterlist.h
+++ /dev/null
@@ -1,8 +0,0 @@
-#ifndef _ASM_X86_SCATTERLIST_H
-#define _ASM_X86_SCATTERLIST_H
-
-#include <asm-generic/scatterlist.h>
-
-#define ARCH_HAS_SG_CHAIN
-
-#endif /* _ASM_X86_SCATTERLIST_H */
diff --git a/include/linux/scatterlist.h b/include/linux/scatterlist.h
index a964f72..4b152c8 100644
--- a/include/linux/scatterlist.h
+++ b/include/linux/scatterlist.h
@@ -136,7 +136,7 @@ static inline void sg_set_buf(struct scatterlist *sg, const void *buf,
 static inline void sg_chain(struct scatterlist *prv, unsigned int prv_nents,
 			    struct scatterlist *sgl)
 {
-#ifndef ARCH_HAS_SG_CHAIN
+#ifndef CONFIG_ARCH_HAS_SG_CHAIN
 	BUG();
 #endif
 
diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
index 0a4edfe..d34cf2d 100644
--- a/include/scsi/scsi.h
+++ b/include/scsi/scsi.h
@@ -31,7 +31,7 @@ enum scsi_timeouts {
  * Like SCSI_MAX_SG_SEGMENTS, but for archs that have sg chaining. This limit
  * is totally arbitrary, a setting of 2048 will get you at least 8mb ios.
  */
-#ifdef ARCH_HAS_SG_CHAIN
+#ifdef CONFIG_ARCH_HAS_SG_CHAIN
 #define SCSI_MAX_SG_CHAIN_SEGMENTS	2048
 #else
 #define SCSI_MAX_SG_CHAIN_SEGMENTS	SCSI_MAX_SG_SEGMENTS
diff --git a/lib/Kconfig b/lib/Kconfig
index 991c98b..32c68d3 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -451,4 +451,11 @@ config UCS2_STRING
 
 source "lib/fonts/Kconfig"
 
+#
+# sg chaining option
+#
+
+config ARCH_HAS_SG_CHAIN
+	def_bool n
+
 endmenu
diff --git a/lib/scatterlist.c b/lib/scatterlist.c
index 3a8e8e8..4251cbd 100644
--- a/lib/scatterlist.c
+++ b/lib/scatterlist.c
@@ -73,7 +73,7 @@ EXPORT_SYMBOL(sg_nents);
  **/
 struct scatterlist *sg_last(struct scatterlist *sgl, unsigned int nents)
 {
-#ifndef ARCH_HAS_SG_CHAIN
+#ifndef CONFIG_ARCH_HAS_SG_CHAIN
 	struct scatterlist *ret = &sgl[nents - 1];
 #else
 	struct scatterlist *sg, *ret = NULL;
@@ -251,7 +251,7 @@ int __sg_alloc_table(struct sg_table *table, unsigned int nents,
 
 	if (nents == 0)
 		return -EINVAL;
-#ifndef ARCH_HAS_SG_CHAIN
+#ifndef CONFIG_ARCH_HAS_SG_CHAIN
 	if (WARN_ON_ONCE(nents > max_ents))
 		return -EINVAL;
 #endif
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
hosted by The Linux Foundation

^ permalink raw reply related

* Re: [PATCH 7/9] fsl: add EPU FSM configuration for deep sleep
From: Scott Wood @ 2014-03-14 22:51 UTC (permalink / raw)
  To: Chenhui Zhao; +Cc: linuxppc-dev, linux-kernel, Jason.Jin
In-Reply-To: <20140312083410.GF4706@localhost.localdomain>

On Wed, 2014-03-12 at 16:34 +0800, Chenhui Zhao wrote:
> On Tue, Mar 11, 2014 at 07:08:43PM -0500, Scott Wood wrote:
> > On Fri, 2014-03-07 at 12:58 +0800, Chenhui Zhao wrote:
> > > From: Hongbo Zhang <hongbo.zhang@freescale.com>
> > > 
> > > In the last stage of deep sleep, software will trigger a Finite
> > > State Machine (FSM) to control the hardware precedure, such as
> > > board isolation, killing PLLs, removing power, and so on.
> > > 
> > > When the system is waked up by an interrupt, the FSM controls the
> > > hardware to complete the early resume precedure.
> > > 
> > > This patch configure the EPU FSM preparing for deep sleep.
> > > 
> > > Signed-off-by: Hongbo Zhang <hongbo.zhang@freescale.com>
> > > Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com>
> > 
> > Couldn't this be part of qoriq_pm.c?
> 
> Put the code in drivers/platform/fsl/ so that LS1 can share these code.

How can LS1 share it if it's got hardcoded T1040 values?

> > > diff --git a/drivers/platform/Kconfig b/drivers/platform/Kconfig
> > > index 09fde58..6539e6d 100644
> > > --- a/drivers/platform/Kconfig
> > > +++ b/drivers/platform/Kconfig
> > > @@ -6,3 +6,7 @@ source "drivers/platform/goldfish/Kconfig"
> > >  endif
> > >  
> > >  source "drivers/platform/chrome/Kconfig"
> > > +
> > > +if FSL_SOC
> > > +source "drivers/platform/fsl/Kconfig"
> > > +endif
> > 
> > Chrome doesn't need an ifdef -- why does this?
> 
> Don't wish other platform see these options, and the X86 and GOLDFISH have
> ifdefs.

The point is you can implement the dependency inside
drivers/platform/fsl/Kconfig.

> > > diff --git a/drivers/platform/fsl/Makefile b/drivers/platform/fsl/Makefile
> > > new file mode 100644
> > > index 0000000..d99ca0e
> > > --- /dev/null
> > > +++ b/drivers/platform/fsl/Makefile
> > > @@ -0,0 +1,5 @@
> > > +#
> > > +# Makefile for linux/drivers/platform/fsl
> > > +# Freescale Specific Power Management Drivers
> > > +#
> > > +obj-$(CONFIG_FSL_SLEEP_FSM)	+= sleep_fsm.o
> > 
> > Why is this here while the other stuff is in arch/powerpc/sysdev?
> > 
> > > +/* Block offsets */
> > > +#define	RCPM_BLOCK_OFFSET	0x00022000
> > > +#define	EPU_BLOCK_OFFSET	0x00000000
> > > +#define	NPC_BLOCK_OFFSET	0x00001000
> > 
> > Why don't these block offsets come from the device tree?
> 
> Have maped DCSR registers. Don't wish to remap them.

We don't wish to have hardcoded CCSR/DCSR offsets in the kernel source.
Sorry.
 
> > > +	/* Configure the EPU Counters */
> > > +	epu_write(EPCCR15, 0x92840000);
> > > +	epu_write(EPCCR14, 0x92840000);
> > > +	epu_write(EPCCR12, 0x92840000);
> > > +	epu_write(EPCCR11, 0x92840000);
> > > +	epu_write(EPCCR10, 0x92840000);
> > > +	epu_write(EPCCR9, 0x92840000);
> > > +	epu_write(EPCCR8, 0x92840000);
> > > +	epu_write(EPCCR5, 0x92840000);
> > > +	epu_write(EPCCR4, 0x92840000);
> > > +	epu_write(EPCCR2, 0x92840000);
> > > +
> > > +	/* Configure the SCUs Inputs */
> > > +	epu_write(EPSMCR15, 0x76000000);
> > > +	epu_write(EPSMCR14, 0x00000031);
> > > +	epu_write(EPSMCR13, 0x00003100);
> > > +	epu_write(EPSMCR12, 0x7F000000);
> > > +	epu_write(EPSMCR11, 0x31740000);
> > > +	epu_write(EPSMCR10, 0x65000030);
> > > +	epu_write(EPSMCR9, 0x00003000);
> > > +	epu_write(EPSMCR8, 0x64300000);
> > > +	epu_write(EPSMCR7, 0x30000000);
> > > +	epu_write(EPSMCR6, 0x7C000000);
> > > +	epu_write(EPSMCR5, 0x00002E00);
> > > +	epu_write(EPSMCR4, 0x002F0000);
> > > +	epu_write(EPSMCR3, 0x2F000000);
> > > +	epu_write(EPSMCR2, 0x6C700000);
> > 
> > Where do these magic numbers come from?  Which chips are they valid for?
> 
> They are for T1040. Can be found in the RCPM chapter of T1040RM.

Then put in a comment to that effect, including what part of the RCPM
chapter.

How do you plan to handle the addition of another SoC with different
values?

-Scott

^ permalink raw reply

* Re: [PATCH 6/9] powerpc/85xx: support sleep feature on QorIQ SoCs with RCPM
From: Scott Wood @ 2014-03-14 22:46 UTC (permalink / raw)
  To: Chenhui Zhao; +Cc: linuxppc-dev, linux-kernel, Jason.Jin
In-Reply-To: <20140312080814.GE4706@localhost.localdomain>

On Wed, 2014-03-12 at 16:08 +0800, Chenhui Zhao wrote:
> On Tue, Mar 11, 2014 at 07:00:27PM -0500, Scott Wood wrote:
> > On Fri, 2014-03-07 at 12:58 +0800, Chenhui Zhao wrote:
> > > In sleep mode, the clocks of e500 cores and unused IP blocks is
> > > turned off. The IP blocks which are allowed to wake up the processor
> > > are still running.
> > > 
> > > The sleep mode is equal to the Standby state in Linux. Use the
> > > command to enter sleep mode:
> > >   echo standby > /sys/power/state
> > > 
> > > Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com>
> > > ---
> > >  arch/powerpc/Kconfig                   |    4 +-
> > >  arch/powerpc/platforms/85xx/Makefile   |    3 +
> > >  arch/powerpc/platforms/85xx/qoriq_pm.c |   78 ++++++++++++++++++++++++++++++++
> > >  3 files changed, 83 insertions(+), 2 deletions(-)
> > >  create mode 100644 arch/powerpc/platforms/85xx/qoriq_pm.c
> > > 
> > > diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> > > index 05f6323..e1d6510 100644
> > > --- a/arch/powerpc/Kconfig
> > > +++ b/arch/powerpc/Kconfig
> > > @@ -222,7 +222,7 @@ config ARCH_HIBERNATION_POSSIBLE
> > >  config ARCH_SUSPEND_POSSIBLE
> > >  	def_bool y
> > >  	depends on ADB_PMU || PPC_EFIKA || PPC_LITE5200 || PPC_83xx || \
> > > -		   (PPC_85xx && !PPC_E500MC) || PPC_86xx || PPC_PSERIES \
> > > +		   FSL_SOC_BOOKE || PPC_86xx || PPC_PSERIES \
> > >  		   || 44x || 40x
> > >  
> > >  config PPC_DCR_NATIVE
> > > @@ -709,7 +709,7 @@ config FSL_PCI
> > >  config FSL_PMC
> > >  	bool
> > >  	default y
> > > -	depends on SUSPEND && (PPC_85xx || PPC_86xx)
> > > +	depends on SUSPEND && (PPC_85xx && !PPC_E500MC || PPC_86xx)
> > 
> > Don't mix && and || without parentheses.
> > 
> > Maybe convert this into being selected (similar to FSL_RCPM), rather
> > than default y?
> 
> Yes, will do.
> 
> > 
> > > diff --git a/arch/powerpc/platforms/85xx/Makefile b/arch/powerpc/platforms/85xx/Makefile
> > > index 25cebe7..7fae817 100644
> > > --- a/arch/powerpc/platforms/85xx/Makefile
> > > +++ b/arch/powerpc/platforms/85xx/Makefile
> > > @@ -2,6 +2,9 @@
> > >  # Makefile for the PowerPC 85xx linux kernel.
> > >  #
> > >  obj-$(CONFIG_SMP) += smp.o
> > > +ifeq ($(CONFIG_FSL_CORENET_RCPM), y)
> > > +obj-$(CONFIG_SUSPEND)	+= qoriq_pm.o
> > > +endif
> > 
> > There should probably be a kconfig symbol for this.
> 
> OK.
> 
> > 
> > > diff --git a/arch/powerpc/platforms/85xx/qoriq_pm.c b/arch/powerpc/platforms/85xx/qoriq_pm.c
> > > new file mode 100644
> > > index 0000000..915b13b
> > > --- /dev/null
> > > +++ b/arch/powerpc/platforms/85xx/qoriq_pm.c
> > > @@ -0,0 +1,78 @@
> > > +/*
> > > + * Support Power Management feature
> > > + *
> > > + * Copyright 2014 Freescale Semiconductor Inc.
> > > + *
> > > + * Author: Chenhui Zhao <chenhui.zhao@freescale.com>
> > > + *
> > > + * This program is free software; you can redistribute	it and/or modify it
> > > + * under  the terms of	the GNU General	 Public License as published by the
> > > + * Free Software Foundation;  either version 2 of the  License, or (at your
> > > + * option) any later version.
> > > + */
> > > +
> > > +#include <linux/kernel.h>
> > > +#include <linux/suspend.h>
> > > +#include <linux/of_platform.h>
> > > +
> > > +#include <sysdev/fsl_soc.h>
> > > +
> > > +#define FSL_SLEEP		0x1
> > > +#define FSL_DEEP_SLEEP		0x2
> > 
> > FSL_DEEP_SLEEP is unused.
> 
> Will be used in the last patch.
> [PATCH 9/9] powerpc/pm: support deep sleep feature on T1040

Ideally the #define would have been introduced in that patch.

> > > +	sleep_modes = FSL_SLEEP;
> > > +	sleep_pm_state = PLAT_PM_SLEEP;
> > > +
> > > +	np = of_find_compatible_node(NULL, NULL, "fsl,qoriq-rcpm-2.0");
> > > +	if (np)
> > > +		sleep_pm_state = PLAT_PM_LPM20;
> > > +
> > > +	suspend_set_ops(&qoriq_suspend_ops);
> > > +
> > > +	return 0;
> > > +}
> > > +arch_initcall(qoriq_suspend_init);
> > 
> > Why is this not a platform driver?  If fsl_pmc can do it...
> > 
> > -Scott
> 
> It can be, but what advantage of being a platform driver.

If nothing else, compliance with the standard way of doing things.  Why
not make it a platform driver?  You'd be able to use dev_err, have a
place in sysfs if attributes are needed in the future, etc.

A better answer might be that there are multiple not-very-related files
driving different portions of RCPM.

-Scott

^ permalink raw reply

* Re: [PATCH 5/9] powerpc/85xx: disable irq by hardware when suspend for 64-bit
From: Scott Wood @ 2014-03-14 22:41 UTC (permalink / raw)
  To: Chenhui Zhao; +Cc: linuxppc-dev, linux-kernel, Jason.Jin
In-Reply-To: <20140312074610.GD4706@localhost.localdomain>

On Wed, 2014-03-12 at 15:46 +0800, Chenhui Zhao wrote:
> On Tue, Mar 11, 2014 at 06:51:20PM -0500, Scott Wood wrote:
> > On Fri, 2014-03-07 at 12:58 +0800, Chenhui Zhao wrote:
> > > In 64-bit mode, kernel just clears the irq soft-enable flag
> > > in struct paca_struct to disable external irqs. But, in
> > > the case of suspend, irqs should be disabled by hardware.
> > > Therefore, hook a function to ppc_md.suspend_disable_irqs
> > > to really disable irqs.
> > > 
> > > Signed-off-by: Chenhui Zhao <chenhui.zhao@freescale.com>
> > > ---
> > >  arch/powerpc/platforms/85xx/corenet_generic.c |   12 ++++++++++++
> > >  1 files changed, 12 insertions(+), 0 deletions(-)
> > > 
> > > diff --git a/arch/powerpc/platforms/85xx/corenet_generic.c b/arch/powerpc/platforms/85xx/corenet_generic.c
> > > index 3fdf9f3..983d81f 100644
> > > --- a/arch/powerpc/platforms/85xx/corenet_generic.c
> > > +++ b/arch/powerpc/platforms/85xx/corenet_generic.c
> > > @@ -32,6 +32,13 @@
> > >  #include <sysdev/fsl_pci.h>
> > >  #include "smp.h"
> > >  
> > > +#if defined(CONFIG_PPC64) && defined(CONFIG_SUSPEND)
> > > +static void fsl_suspend_disable_irqs(void)
> > > +{
> > > +	__hard_irq_disable();
> > > +}
> > > +#endif
> > 
> > Why the underscore version?  Don't you want PACA_IRQ_HARD_DIS to be set?
> > 
> > If hard disabling is appropriate here, shouldn't we do it in
> > generic_suspend_disable_irqs()?
> > 
> > Are there any existing platforms that supply a
> > ppc_md.suspend_disable_irqs()?  I don't see any when grepping.
> > 
> > -Scott
> 
> Will use hard_irq_disable().
> 
> I think this is a general problem for powerpc.
> Should clear MSR_EE before suspend. I agree to put it
> in generic_suspend_disable_irqs().

BTW, make sure you test this patchset with CONFIG_DEBUG_PREEMPT and
similar debugging options to help ensure that the soft IRQ state is
being tracked properly.

-Scott

^ permalink raw reply

* Re: [PATCH 3/9] powerpc/rcpm: add RCPM driver
From: Scott Wood @ 2014-03-14 22:34 UTC (permalink / raw)
  To: Chenhui Zhao; +Cc: linuxppc-dev, linux-kernel, Jason.Jin
In-Reply-To: <20140312035954.GB4706@localhost.localdomain>

On Wed, 2014-03-12 at 11:59 +0800, Chenhui Zhao wrote:
> On Tue, Mar 11, 2014 at 06:42:51PM -0500, Scott Wood wrote:
> > On Fri, 2014-03-07 at 12:57 +0800, Chenhui Zhao wrote:
> > > +int fsl_rcpm_init(void)
> > > +{
> > > +	struct device_node *np;
> > > +
> > > +	np = of_find_compatible_node(NULL, NULL, "fsl,qoriq-rcpm-2.0");
> > > +	if (np) {
> > > +		rcpm_v2_regs = of_iomap(np, 0);
> > > +		of_node_put(np);
> > > +		if (!rcpm_v2_regs)
> > > +			return -ENOMEM;
> > > +
> > > +		qoriq_pm_ops = &qoriq_rcpm_v2_ops;
> > > +
> > > +	} else {
> > > +		np = of_find_compatible_node(NULL, NULL, "fsl,qoriq-rcpm-1.0");
> > > +		if (np) {
> > > +			rcpm_v1_regs = of_iomap(np, 0);
> > > +			of_node_put(np);
> > > +			if (!rcpm_v1_regs)
> > > +				return -ENOMEM;
> > > +
> > > +			qoriq_pm_ops = &qoriq_rcpm_v1_ops;
> > > +
> > > +		} else {
> > > +			pr_err("%s: can't find the rcpm node.\n", __func__);
> > > +			return -EINVAL;
> > > +		}
> > > +	}
> > > +
> > > +	return 0;
> > > +}
> > 
> > Why isn't this a proper platform driver?
> > 
> > -Scott
> 
> The RCPM is not a single function IP block, instead it is a collection
> of device run control and power management. It would be called by other
> drivers and functions. For example, the callback .freeze_time_base()
> need to be called at early stage of kernel init. Therefore, it would be
> better to init it at early stage.

OK, but consider using of_find_matching_node_and_match().

-Scott

^ permalink raw reply

* Re: [PATCH 9/9] powerpc/pm: support deep sleep feature on T1040
From: Scott Wood @ 2014-03-14 22:26 UTC (permalink / raw)
  To: Kevin Hao; +Cc: linuxppc-dev, Chenhui Zhao, Jason.Jin, linux-kernel
In-Reply-To: <20140313074613.GD26692@pek-khao-d1.corp.ad.wrs.com>

On Thu, 2014-03-13 at 15:46 +0800, Kevin Hao wrote:
> On Wed, Mar 12, 2014 at 12:43:05PM -0500, Scott Wood wrote:
> > > Shouldn't we use "readback, sync" here? The following is quoted form t4240RM:
> > >   To guarantee that the results of any sequence of writes to configuration
> > >   registers are in effect, the final configuration register write should be
> > >   immediately followed by a read of the same register, and that should be
> > >   followed by a SYNC instruction. Then accesses can safely be made to memory
> > >   regions affected by the configuration register write.
> > 
> > I agree that the sync before the readback is probably not necessary,
> > since transactions to the same address should already be ordered.
> > 
> > A sync after the readback helps if you're trying to order the readback
> > with subsequent memory accesses, though in that case wouldn't a sync
> > alone (no readback) be adequate?
> 
> No, we don't just want to order the subsequent memory access here.
> The 'write, readback, sync' is the required sequence if we want to make
> sure that the writing to CCSR register does really take effect.
> 
> >  Though maybe not always -- see the
> > comment near the end of fsl_elbc_write_buf() in
> > drivers/mtd/nand_fsl_elbc.c.  I guess the readback does more than just
> > make sure the device has seen the write, ensuring that the device has
> > finished the transaction to the point of acting on another one.
> 
> Agree.
> 
> > 
> > The data dependency plus isync sequence, which is done by the normal I/O
> > accessors used from C code, orders the readback versus all future
> > instructions (not just I/O).  The delay loop is not I/O.
> 
> According to the PowerISA, the sequence 'load, date dependency, isync' only
> order the load accesses. 

The point is to order the delay loop after the load, not to order
storage versus storage.

This is a sequence we're already using on all of our I/O loads
(excluding accesses like in this patch that don't use the standard
accessors).  I'm confident that it works even if it's not
architecturally guaranteed.  I'm not sure that there exists a clear
architectural way of synchronizing non-storage instructions relative to
storage instructions.

Given that isync is documented as preventing any execution of
instructions after the isync until all previous instructions complete,
it doesn't seem to make sense for the architecture to explicitly talk
about loads (as opposed to any other instruction) following a load,
dependent conditional branch, isync sequence.

> So if we want to order all the storage access as well
> as execution synchronization, we should choose sync here.

Do we need execution synchronization or context synchronization?

The t4240 RM section that talks about a readback and a sync is in the
context of subsequent memory operations ("Then accesses can safely be
made to memory regions affected..."), not arbitrary instructions.  There
are also a couple other places in the RM where isync is recommended
instead (when setting LAWs or CCSRBAR), even though those also only
involve memory accesses.

In any case, this is not performance critical and thus it's better to
oversynchronize than undersynchronize.

-Scott

^ permalink raw reply

* Re: [PATCH 06/10] powerpc/booke64: Use SPRG_TLB_EXFRAME on bolted handlers
From: Scott Wood @ 2014-03-14 19:29 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Tiejun Chen, linuxppc-dev, kvm-ppc, Mihai Caraman
In-Reply-To: <1394755249-8856-7-git-send-email-scottwood@freescale.com>

On Thu, 2014-03-13 at 19:00 -0500, Scott Wood wrote:
> @@ -444,6 +451,9 @@ _GLOBAL(kvmppc_resume_host)
>  	PPC_STD(r8, VCPU_SHARED_SPRG6, r11)
>  	mfxer	r3
>  	PPC_STD(r9, VCPU_SHARED_SPRG7, r11)
> +#ifdef CONFIG_64BIT
> +	mtspr	SPRN_SPRG_VDSO_WRITE, r3
> +#endif

Oops, this hunk was a patch shuffling accident.  v2 coming.

-Scott

^ permalink raw reply

* Re: [PATCH v3 10/52] arm, kvm: Fix CPU hotplug callback registration
From: Christoffer Dall @ 2014-03-14 19:10 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: ego, kvm, peterz, linux-kernel, linuxppc-dev, paulus, walken,
	kvmarm, linux-arch, linux, mingo, marc.zyngier, paulmck, linux-pm,
	Gleb Natapov, rusty, tglx, linux-arm-kernel, rjw, oleg, tj,
	Paolo Bonzini, akpm
In-Reply-To: <53229701.8050405@linux.vnet.ibm.com>

On Fri, Mar 14, 2014 at 11:13:29AM +0530, Srivatsa S. Bhat wrote:
> On 03/13/2014 04:51 AM, Christoffer Dall wrote:
> > On Tue, Mar 11, 2014 at 02:05:38AM +0530, Srivatsa S. Bhat wrote:
> >> Subsystems that want to register CPU hotplug callbacks, as well as perform
> >> initialization for the CPUs that are already online, often do it as shown
> >> below:
> >>
> >> 	get_online_cpus();
> >>
> >> 	for_each_online_cpu(cpu)
> >> 		init_cpu(cpu);
> >>
> >> 	register_cpu_notifier(&foobar_cpu_notifier);
> >>
> >> 	put_online_cpus();
> >>
> >> This is wrong, since it is prone to ABBA deadlocks involving the
> >> cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
> >> with CPU hotplug operations).
> >>
> >> Instead, the correct and race-free way of performing the callback
> >> registration is:
> >>
> >> 	cpu_notifier_register_begin();
> >>
> >> 	for_each_online_cpu(cpu)
> >> 		init_cpu(cpu);
> >>
> >> 	/* Note the use of the double underscored version of the API */
> >> 	__register_cpu_notifier(&foobar_cpu_notifier);
> >>
> >> 	cpu_notifier_register_done();
> >>
> >>
> >> Fix the kvm code in arm by using this latter form of callback registration.
> >>
> >> Cc: Christoffer Dall <christoffer.dall@linaro.org>
> >> Cc: Gleb Natapov <gleb@kernel.org>
> >> Cc: Russell King <linux@arm.linux.org.uk>
> >> Cc: Ingo Molnar <mingo@kernel.org>
> >> Cc: kvmarm@lists.cs.columbia.edu
> >> Cc: kvm@vger.kernel.org
> >> Cc: linux-arm-kernel@lists.infradead.org
> >> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
> >> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
> >> ---
> >>
> >>  arch/arm/kvm/arm.c |    7 ++++++-
> >>  1 file changed, 6 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> >> index bd18bb8..f0e50a0 100644
> >> --- a/arch/arm/kvm/arm.c
> >> +++ b/arch/arm/kvm/arm.c
> >> @@ -1051,21 +1051,26 @@ int kvm_arch_init(void *opaque)
> >>  		}
> >>  	}
> >>  
> >> +	cpu_notifier_register_begin();
> >> +
> >>  	err = init_hyp_mode();
> >>  	if (err)
> >>  		goto out_err;
> >>  
> >> -	err = register_cpu_notifier(&hyp_init_cpu_nb);
> >> +	err = __register_cpu_notifier(&hyp_init_cpu_nb);
> >>  	if (err) {
> >>  		kvm_err("Cannot register HYP init CPU notifier (%d)\n", err);
> >>  		goto out_err;
> >>  	}
> >>  
> >> +	cpu_notifier_register_done();
> >> +
> >>  	hyp_cpu_pm_init();
> >>  
> >>  	kvm_coproc_table_init();
> >>  	return 0;
> >>  out_err:
> >> +	cpu_notifier_register_done();
> >>  	return err;
> >>  }
> >>  
> >>
> > 
> > Just so we're clear, the existing code was simply racy as not prone to
> > deadlocks, right?
> > 
> > This makes it clear that the test above for compatible CPUs can be quite
> > easily evaded by using CPU hotplug, but we don't really have a good
> > solution for handling that yet...  Hmmm, grumble grumble, I guess if you
> > hotplug unsupported CPUs on a KVM/ARM system for now, stuff will break.
> > 
> 
> In this particular case, there was no deadlock possibility, rather the
> existing code had insufficient synchronization against CPU hotplug.
> 
> init_hyp_mode() would invoke cpu_init_hyp_mode() on currently online CPUs
> using on_each_cpu(). If a CPU came online after this point and before calling
> register_cpu_notifier(), that CPU would remain uninitialized because this
> subsystem would miss the hot-online event. This patch fixes this bug and
> also uses the new synchronization method (instead of get/put_online_cpus())
> to ensure that we don't deadlock with CPU hotplug.
> 

Yes, that was my conclusion as well.  Thanks for clarifying.  (It could
be noted in the commit message as well if you should feel so inclined).

> > In any case:
> > Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
> > 
> 
> Thanks a lot!
> 
Thanks,
-Christoffer

^ permalink raw reply

* Re: [PATCH 1/2 v2] irqdomain: add support for creating a continous mapping
From: Thomas Gleixner @ 2014-03-14 11:18 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: Scott Wood, linuxppc-dev
In-Reply-To: <20140221085736.GA11411@linutronix.de>

[-- Attachment #1: Type: TEXT/PLAIN, Size: 4896 bytes --]

On Fri, 21 Feb 2014, Sebastian Andrzej Siewior wrote:

> A MSI device may have multiple interrupts. That means that the
> interrupts numbers should be continuos so that pdev->irq refers to the
> first interrupt, pdev->irq + 1 to the second and so on.
> This patch adds support for continuous allocation of virqs for a range
> of hwirqs. The function is based on irq_create_mapping() but due to the
> number argument there is very little in common now.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
> ---
> Scott, this is what you suggested. I must admit, it does not look that
> bad. It is just compile tested.

Is it tested for real as well?
 
> +static int irq_check_continuous_mapping(struct irq_domain *domain,
> +		irq_hw_number_t hwirq, unsigned int num)
> +{
> +	int virq;
> +	int i;
> +
> +	virq = irq_find_mapping(domain, hwirq);
> +
> +	for (i = 1; i < num; i++) {
> +		unsigned int next;
> +
> +		next = irq_find_mapping(domain, hwirq + i);
> +		if (next == virq + i)
> +			continue;
> +
> +		pr_err("irq: invalid partial mapping. First hwirq %lu maps to "
> +				"%d and \n", hwirq, virq);
> +		pr_err("irq: +%d hwirq (%lu) maps to %d but should be %d.\n",
> +				i, hwirq + i, next, virq + i);
> +		return -EINVAL;
> +	}
> +
> +	pr_debug("-> existing mapping on virq %d\n", virq);
> +	return virq;
> +}
> +
>  /**
> - * irq_create_mapping() - Map a hardware interrupt into linux irq space
> + * irq_create_mapping_block() - Map multiple hardware interrupts
>   * @domain: domain owning this hardware interrupt or NULL for default domain
>   * @hwirq: hardware irq number in that domain space
> + * @num: number of interrupts
> + *
> + * Maps a hwirq to a newly allocated virq. If num is greater than 1 then num
> + * hwirqs (hwirq … hwirq + num - 1) will be mapped and virq will be  continuous.
> + * Returns the first linux virq number.
>   *
> - * Only one mapping per hardware interrupt is permitted. Returns a linux
> - * irq number.
>   * If the sense/trigger is to be specified, set_irq_type() should be called
>   * on the number returned from that call.
>   */
> -unsigned int irq_create_mapping(struct irq_domain *domain,
> -				irq_hw_number_t hwirq)
> +unsigned int irq_create_mapping_block(struct irq_domain *domain,
> +		irq_hw_number_t hwirq, unsigned int num)
>  {
> -	unsigned int hint;
>  	int virq;
> +	int i;
> +	int node;
> +	unsigned int hint;

What's wrong with

	unsigned int hint;
  	int virq, i, node;

?
  
> -	pr_debug("irq_create_mapping(0x%p, 0x%lx)\n", domain, hwirq);
> +	pr_debug("%s(0x%p, 0x%lx, %d)\n", __func__, domain, hwirq, num);
>  
>  	/* Look for default domain if nececssary */
> -	if (domain == NULL)
> +	if (!domain && num == 1)
>  		domain = irq_default_domain;
> +
>  	if (domain == NULL) {
>  		WARN(1, "%s(, %lx) called with NULL domain\n", __func__, hwirq);
>  		return 0;
>  	}
>  	pr_debug("-> using domain @%p\n", domain);
>  
>  	/* Check if mapping already exists */
> -	virq = irq_find_mapping(domain, hwirq);
> -	if (virq) {
> -		pr_debug("-> existing mapping on virq %d\n", virq);
> -		return virq;
> +	for (i = 0; i < num; i++) {
> +		virq = irq_find_mapping(domain, hwirq + i);
> +		if (virq != NO_IRQ) {
> +			if (i == 0)
> +				return irq_check_continuous_mapping(domain,
> +						hwirq, num);

So what is the loop for? If i == 0 and virq != NO_IRQ you return. That
does not make sense at all.
  
> +			pr_err("irq: hwirq %ld has no mapping but hwirq %ld "
> +				"maps to virq %d. This can't be a block\n",
> +				hwirq, hwirq + i, virq);
> +			return -EINVAL;
> +		}
>  	}
>  
> +	node = of_node_to_nid(domain->of_node);
>  	/* Allocate a virtual interrupt number */
>  	hint = hwirq % nr_irqs;
>  	if (hint == 0)
>  		hint++;
> -	virq = irq_alloc_desc_from(hint, of_node_to_nid(domain->of_node));
> -	if (virq <= 0)
> -		virq = irq_alloc_desc_from(1, of_node_to_nid(domain->of_node));
> +	virq = irq_alloc_descs_from(hint, num, node);
> +	if (virq <= 0 && hint != 1)
> +		virq = irq_alloc_descs_from(1, num, node);
>  	if (virq <= 0) {
>  		pr_debug("-> virq allocation failed\n");
>  		return 0;
>  	}
>  
> -	if (irq_domain_associate(domain, virq, hwirq)) {
> -		irq_free_desc(virq);
> -		return 0;
> +	irq_domain_associate_many(domain, virq, hwirq, num);

So irq_domain_associate can fail, but irq_domain_associate_many cannot ?

> +	if (num == 1) {
> +		pr_debug("irq %lu on domain %s mapped to virtual irq %u\n",
> +			hwirq, of_node_full_name(domain->of_node), virq);
> +		return virq;
>  	}
> -
> -	pr_debug("irq %lu on domain %s mapped to virtual irq %u\n",
> -		hwirq, of_node_full_name(domain->of_node), virq);
> -
> +	pr_debug("irqs %lu…%lu on domain %s mapped to virtual irqs %u…%u\n",
> +		hwirq, hwirq + num - 1, of_node_full_name(domain->of_node),
> +			virq, virq + num - 1);

A single pr_debug is sufficient, hmm?

Thanks,

	tglx

^ permalink raw reply

* Re: [PATCH RFC v9 5/6] dma: mpc512x: add device tree binding document
From: Arnd Bergmann @ 2014-03-14 10:43 UTC (permalink / raw)
  To: Mark Rutland
  Cc: devicetree@vger.kernel.org, Lars-Peter Clausen, Vinod Koul,
	Gerhard Sittig, Andy Shevchenko, Alexander Popov,
	dmaengine@vger.kernel.org, Dan Williams, Anatolij Gustschin,
	linuxppc-dev@lists.ozlabs.org
In-Reply-To: <20140313180916.GF25870@e106331-lin.cambridge.arm.com>

On Thursday 13 March 2014, Mark Rutland wrote:
> > +
> > +Example:
> > +
> > +     dma0: dma@14000 {
> > +             compatible = "fsl,mpc5121-dma";
> > +             reg = <0x14000 0x1800>;
> > +             interrupts = <65 0x8>;
> > +             #dma-cells = <1>;
> > +     };
> > +
> > +
> > +Client node properties:
> > +
> > +Required properties:
> > +- dmas:                      list of DMA specifiers, consisting each of a handle
> > +                     for the DMA controller and integer cells to specify
> > +                     the channel used within the DMA controller
> > +- dma-names:         list of identifier strings for the DMA specifiers,
> > +                     client device driver code uses these strings to
> > +                     have DMA channels looked up at the controller
> 
> List the exact names you expect, or the dma-names property is useless.

Listing specific names makes no sense in the binding for the provider,
they should be part of the slave device binding. A reference to the
generic bindings/dma/dma.txt file should be enough here.

	Arnd

^ permalink raw reply

* Re: [PATCH RFC v9 3/6] dma: mpc512x: replace devm_request_irq() with request_irq()
From: Andy Shevchenko @ 2014-03-14  9:50 UTC (permalink / raw)
  To: Alexander Popov
  Cc: Lars-Peter Clausen, Arnd Bergmann, Vinod Koul, Gerhard Sittig,
	dmaengine, Dan Williams, Anatolij Gustschin, linuxppc-dev
In-Reply-To: <1394624875-24411-4-git-send-email-a13xp0p0v88@gmail.com>

On Wed, 2014-03-12 at 15:47 +0400, Alexander Popov wrote:
> Replace devm_request_irq() with request_irq() since there is no need
> to use it because the original code always frees IRQ manually with
> devm_free_irq(). Replace devm_free_irq() with free_irq() accordingly.
> 
> Signed-off-by: Alexander Popov <a13xp0p0v88@gmail.com>
> ---
>  drivers/dma/mpc512x_dma.c | 11 +++++------
>  1 file changed, 5 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/dma/mpc512x_dma.c b/drivers/dma/mpc512x_dma.c
> index b1e430c..ff7f678 100644
> --- a/drivers/dma/mpc512x_dma.c
> +++ b/drivers/dma/mpc512x_dma.c
> @@ -921,16 +921,15 @@ static int mpc_dma_probe(struct platform_device *op)
>  	mdma->tcd = (struct mpc_dma_tcd *)((u8 *)(mdma->regs)
>  							+ MPC_DMA_TCD_OFFSET);
>  
> -	retval = devm_request_irq(dev, mdma->irq, &mpc_dma_irq, 0, DRV_NAME,
> -									mdma);
> +	retval = request_irq(mdma->irq, &mpc_dma_irq, 0, DRV_NAME, mdma);
>  	if (retval) {
>  		dev_err(dev, "Error requesting IRQ!\n");
>  		return -EINVAL;
>  	}
>  
>  	if (mdma->is_mpc8308) {
> -		retval = devm_request_irq(dev, mdma->irq2, &mpc_dma_irq, 0,
> -				DRV_NAME, mdma);
> +		retval = request_irq(mdma->irq2, &mpc_dma_irq, 0,
> +							DRV_NAME, mdma);
>  		if (retval) {
>  			dev_err(dev, "Error requesting IRQ2!\n");

+ free_irq(IRQ1) here and may be in other places.

>  			return -EINVAL;
> @@ -1020,7 +1019,7 @@ static int mpc_dma_probe(struct platform_device *op)
>  	dev_set_drvdata(dev, mdma);
>  	retval = dma_async_device_register(dma);
>  	if (retval) {
> -		devm_free_irq(dev, mdma->irq, mdma);
> +		free_irq(mdma->irq, mdma);
>  		irq_dispose_mapping(mdma->irq);
>  	}
>  
> @@ -1033,7 +1032,7 @@ static int mpc_dma_remove(struct platform_device *op)
>  	struct mpc_dma *mdma = dev_get_drvdata(dev);
>  
>  	dma_async_device_unregister(&mdma->dma);
> -	devm_free_irq(dev, mdma->irq, mdma);
> +	free_irq(mdma->irq, mdma);
>  	irq_dispose_mapping(mdma->irq);
>  
>  	return 0;


-- 
Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Intel Finland Oy

^ permalink raw reply

* Re: [PATCH RFC v9 2/6] dma: mpc512x: add support for peripheral transfers
From: Andy Shevchenko @ 2014-03-14  9:47 UTC (permalink / raw)
  To: Alexander Popov
  Cc: Lars-Peter Clausen, Arnd Bergmann, Vinod Koul, Gerhard Sittig,
	dmaengine, Dan Williams, Anatolij Gustschin, linuxppc-dev
In-Reply-To: <1394624875-24411-3-git-send-email-a13xp0p0v88@gmail.com>

On Wed, 2014-03-12 at 15:47 +0400, Alexander Popov wrote:
> Introduce support for slave s/g transfer preparation and the associated
> device control callback in the MPC512x DMA controller driver, which adds
> support for data transfers between memory and peripheral I/O to the
> previously supported mem-to-mem transfers.


> --- a/drivers/dma/mpc512x_dma.c
> +++ b/drivers/dma/mpc512x_dma.c
> @@ -2,6 +2,7 @@
>   * Copyright (C) Freescale Semicondutor, Inc. 2007, 2008.
>   * Copyright (C) Semihalf 2009
>   * Copyright (C) Ilya Yanok, Emcraft Systems 2010
> + * Copyright (C) Alexander Popov, Promcontroller 2013

2014?

[]

> +static int mpc_dma_device_control(struct dma_chan *chan, enum dma_ctrl_cmd cmd,
> +							unsigned long arg)
> +{
> +	struct mpc_dma_chan *mchan;
> +	struct mpc_dma *mdma;
> +	struct dma_slave_config *cfg;
> +	unsigned long flags;
> +
> +	mchan = dma_chan_to_mpc_dma_chan(chan);
> +	switch (cmd) {
> +	case DMA_TERMINATE_ALL:
> +		/* Disable channel requests */
> +		mdma = dma_chan_to_mpc_dma(chan);
> +
> +		spin_lock_irqsave(&mchan->lock, flags);
> +
> +		out_8(&mdma->regs->dmacerq, chan->chan_id);
> +		list_splice_tail_init(&mchan->prepared, &mchan->free);
> +		list_splice_tail_init(&mchan->queued, &mchan->free);
> +		list_splice_tail_init(&mchan->active, &mchan->free);
> +
> +		spin_unlock_irqrestore(&mchan->lock, flags);
> +
> +		return 0;
> +	case DMA_SLAVE_CONFIG:
> +		/* Constraints:
> +		 *  - only transfers between a peripheral device and
> +		 *     memory are supported;
> +		 *  - minimal transfer chunk is 4 bytes and consequently
> +		 *     source and destination addresses must be 4-byte aligned
> +		 *     and transfer size must be aligned on (4 * maxburst)
> +		 *     boundary;
> +		 *  - during the transfer RAM address is being incremented by
> +		 *     the size of minimal transfer chunk;
> +		 *  - peripheral port's address is constant during the transfer.
> +		 */
> +
> +		cfg = (void *)arg;
> +
> +		if (!is_slave_direction(cfg->direction))
> +			return -EINVAL;

As far as I understand the intention you have not to use direction field
in the dma_slave_config. It will be removed once.

> +
> +		if (cfg->src_addr_width != DMA_SLAVE_BUSWIDTH_4_BYTES &&
> +			cfg->dst_addr_width != DMA_SLAVE_BUSWIDTH_4_BYTES)
> +			return -EINVAL;
> +
> +		spin_lock_irqsave(&mchan->lock, flags);
> +
> +		if (cfg->direction == DMA_DEV_TO_MEM) {
> +			mchan->per_paddr = cfg->src_addr;
> +			mchan->tcd_nunits = cfg->src_maxburst;
> +		} else {
> +			mchan->per_paddr = cfg->dst_addr;
> +			mchan->tcd_nunits = cfg->dst_maxburst;
> +		}

Ditto.


-- 
Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Intel Finland Oy

^ permalink raw reply

* Re: [PATCH v3 36/52] zsmalloc: Fix CPU hotplug callback registration
From: Minchan Kim @ 2014-03-14  6:41 UTC (permalink / raw)
  To: Srivatsa S. Bhat
  Cc: linux-arch, ego, walken, linux, akpm, linux-pm, peterz, rusty,
	rjw, oleg, linux-kernel, linux-mm, linuxppc-dev, paulus, tj, tglx,
	paulmck, mingo, Nitin Gupta
In-Reply-To: <20140310203959.10746.61303.stgit@srivatsabhat.in.ibm.com>

On Tue, Mar 11, 2014 at 02:09:59AM +0530, Srivatsa S. Bhat wrote:
> Subsystems that want to register CPU hotplug callbacks, as well as perform
> initialization for the CPUs that are already online, often do it as shown
> below:
> 
> 	get_online_cpus();
> 
> 	for_each_online_cpu(cpu)
> 		init_cpu(cpu);
> 
> 	register_cpu_notifier(&foobar_cpu_notifier);
> 
> 	put_online_cpus();
> 
> This is wrong, since it is prone to ABBA deadlocks involving the
> cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
> with CPU hotplug operations).
> 
> Instead, the correct and race-free way of performing the callback
> registration is:
> 
> 	cpu_notifier_register_begin();
> 
> 	for_each_online_cpu(cpu)
> 		init_cpu(cpu);
> 
> 	/* Note the use of the double underscored version of the API */
> 	__register_cpu_notifier(&foobar_cpu_notifier);
> 
> 	cpu_notifier_register_done();
> 
> 
> Fix the zsmalloc code by using this latter form of callback registration.
> 
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Nitin Gupta <ngupta@vflare.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: linux-mm@kvack.org
> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

Acked-by: Minchan Kim <minchan@kernel.org>

Thanks.

-- 
Kind regards,
Minchan Kim

^ permalink raw reply

* [PATCH] powerpc: ratelimit users spamming kernel log buffer
From: Michael Neuling @ 2014-03-14  6:03 UTC (permalink / raw)
  To: benh; +Cc: michael, Linux PPC dev, Paul Mackerras

The facility unavailable exception can be triggered from userspace by
accessing PMU registers when EBB is not enabled.  This causes the
included pr_err() to run, hence spamming the kernel log buffer.

This avoids this by rate limiting these messages.

Signed-off-by: Michael Neuling <mikey@neuling.org>
---
We can bike shed the hell out of this one.  We could just remove it, or
we could put it under the control of show_unhandled_signals.

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index e435bc0..7836de3f 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -1340,8 +1340,9 @@ void facility_unavailable_exception(struct pt_regs *regs)
 	if (!arch_irq_disabled_regs(regs))
 		local_irq_enable();
 
-	pr_err("%sFacility '%s' unavailable, exception at 0x%lx, MSR=%lx\n",
-	       hv ? "Hypervisor " : "", facility, regs->nip, regs->msr);
+	pr_err_ratelimited(
+		"%sFacility '%s' unavailable, exception at 0x%lx, MSR=%lx\n",
+		hv ? "Hypervisor " : "", facility, regs->nip, regs->msr);
 
 	if (user_mode(regs)) {
 		_exception(SIGILL, regs, ILL_ILLOPC, regs->nip);

^ permalink raw reply related

* Re: [PATCH v3 10/52] arm, kvm: Fix CPU hotplug callback registration
From: Srivatsa S. Bhat @ 2014-03-14  5:43 UTC (permalink / raw)
  To: Christoffer Dall
  Cc: ego, kvm, peterz, linux-kernel, linuxppc-dev, paulus, walken,
	kvmarm, linux-arch, linux, mingo, marc.zyngier, paulmck, linux-pm,
	Gleb Natapov, rusty, tglx, linux-arm-kernel, rjw, oleg, tj,
	Paolo Bonzini, akpm
In-Reply-To: <20140312232127.GC24808@cbox>

On 03/13/2014 04:51 AM, Christoffer Dall wrote:
> On Tue, Mar 11, 2014 at 02:05:38AM +0530, Srivatsa S. Bhat wrote:
>> Subsystems that want to register CPU hotplug callbacks, as well as perform
>> initialization for the CPUs that are already online, often do it as shown
>> below:
>>
>> 	get_online_cpus();
>>
>> 	for_each_online_cpu(cpu)
>> 		init_cpu(cpu);
>>
>> 	register_cpu_notifier(&foobar_cpu_notifier);
>>
>> 	put_online_cpus();
>>
>> This is wrong, since it is prone to ABBA deadlocks involving the
>> cpu_add_remove_lock and the cpu_hotplug.lock (when running concurrently
>> with CPU hotplug operations).
>>
>> Instead, the correct and race-free way of performing the callback
>> registration is:
>>
>> 	cpu_notifier_register_begin();
>>
>> 	for_each_online_cpu(cpu)
>> 		init_cpu(cpu);
>>
>> 	/* Note the use of the double underscored version of the API */
>> 	__register_cpu_notifier(&foobar_cpu_notifier);
>>
>> 	cpu_notifier_register_done();
>>
>>
>> Fix the kvm code in arm by using this latter form of callback registration.
>>
>> Cc: Christoffer Dall <christoffer.dall@linaro.org>
>> Cc: Gleb Natapov <gleb@kernel.org>
>> Cc: Russell King <linux@arm.linux.org.uk>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Cc: kvmarm@lists.cs.columbia.edu
>> Cc: kvm@vger.kernel.org
>> Cc: linux-arm-kernel@lists.infradead.org
>> Acked-by: Paolo Bonzini <pbonzini@redhat.com>
>> Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
>> ---
>>
>>  arch/arm/kvm/arm.c |    7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index bd18bb8..f0e50a0 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -1051,21 +1051,26 @@ int kvm_arch_init(void *opaque)
>>  		}
>>  	}
>>  
>> +	cpu_notifier_register_begin();
>> +
>>  	err = init_hyp_mode();
>>  	if (err)
>>  		goto out_err;
>>  
>> -	err = register_cpu_notifier(&hyp_init_cpu_nb);
>> +	err = __register_cpu_notifier(&hyp_init_cpu_nb);
>>  	if (err) {
>>  		kvm_err("Cannot register HYP init CPU notifier (%d)\n", err);
>>  		goto out_err;
>>  	}
>>  
>> +	cpu_notifier_register_done();
>> +
>>  	hyp_cpu_pm_init();
>>  
>>  	kvm_coproc_table_init();
>>  	return 0;
>>  out_err:
>> +	cpu_notifier_register_done();
>>  	return err;
>>  }
>>  
>>
> 
> Just so we're clear, the existing code was simply racy as not prone to
> deadlocks, right?
> 
> This makes it clear that the test above for compatible CPUs can be quite
> easily evaded by using CPU hotplug, but we don't really have a good
> solution for handling that yet...  Hmmm, grumble grumble, I guess if you
> hotplug unsupported CPUs on a KVM/ARM system for now, stuff will break.
> 

In this particular case, there was no deadlock possibility, rather the
existing code had insufficient synchronization against CPU hotplug.

init_hyp_mode() would invoke cpu_init_hyp_mode() on currently online CPUs
using on_each_cpu(). If a CPU came online after this point and before calling
register_cpu_notifier(), that CPU would remain uninitialized because this
subsystem would miss the hot-online event. This patch fixes this bug and
also uses the new synchronization method (instead of get/put_online_cpus())
to ensure that we don't deadlock with CPU hotplug.

> In any case:
> Acked-by: Christoffer Dall <christoffer.dall@linaro.org>
> 

Thanks a lot!

Regards,
Srivatsa S. Bhat

^ permalink raw reply

* [PATCH 20/20] powerpc/perf: Fix handling of L3 events with bank == 1
From: Michael Ellerman @ 2014-03-14  5:00 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: cody, khandual
In-Reply-To: <1394773245-18328-1-git-send-email-mpe@ellerman.id.au>

Currently we reject events which have the L3 bank == 1, such as
0x000084918F, because the cache field is non-zero.

However that is incorrect, because although the bank is non-zero, the
value we would write into MMCRC is zero, and so we can count the event.

So fix the check to ignore the bank selector when checking whether the
cache selector is non-zero.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/perf/power8-pmu.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/perf/power8-pmu.c b/arch/powerpc/perf/power8-pmu.c
index 3ad363d..fe2763b 100644
--- a/arch/powerpc/perf/power8-pmu.c
+++ b/arch/powerpc/perf/power8-pmu.c
@@ -325,9 +325,10 @@ static int power8_get_constraint(u64 event, unsigned long *maskp, unsigned long
 		 * HV writable, and there is no API for guest kernels to modify
 		 * it. The solution is for the hypervisor to initialise the
 		 * field to zeroes, and for us to only ever allow events that
-		 * have a cache selector of zero.
+		 * have a cache selector of zero. The bank selector (bit 3) is
+		 * irrelevant, as long as the rest of the value is 0.
 		 */
-		if (cache)
+		if (cache & 0x7)
 			return -1;
 
 	} else if (event & EVENT_IS_L1) {
-- 
1.8.3.2

^ permalink raw reply related

* [PATCH 19/20] powerpc/perf/hv_{gpci, 24x7}: Add documentation of device attributes
From: Michael Ellerman @ 2014-03-14  5:00 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: cody, khandual
In-Reply-To: <1394773245-18328-1-git-send-email-mpe@ellerman.id.au>

From: Cody P Schafer <cody@linux.vnet.ibm.com>

gpci and 24x7 expose some device specific attributes. Add some
documentation for them.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 .../testing/sysfs-bus-event_source-devices-hv_24x7 | 23 ++++++++++++
 .../testing/sysfs-bus-event_source-devices-hv_gpci | 43 ++++++++++++++++++++++
 2 files changed, 66 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
 create mode 100644 Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci

diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7 b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
new file mode 100644
index 0000000..e78ee79
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_24x7
@@ -0,0 +1,23 @@
+What:		/sys/bus/event_source/devices/hv_24x7/interface/catalog
+Date:		February 2014
+Contact:	Cody P Schafer <cody@linux.vnet.ibm.com>
+Description:
+		Provides access to the binary "24x7 catalog" provided by the
+		hypervisor on POWER7 and 8 systems. This catalog lists events
+		avaliable from the powerpc "hv_24x7" pmu. Its format is
+		documented here:
+		https://raw.githubusercontent.com/jmesmon/catalog-24x7/master/hv-24x7-catalog.h
+
+What:		/sys/bus/event_source/devices/hv_24x7/interface/catalog_length
+Date:		February 2014
+Contact:	Cody P Schafer <cody@linux.vnet.ibm.com>
+Description:
+		A number equal to the length in bytes of the catalog. This is
+		also extractable from the provided binary "catalog" sysfs entry.
+
+What:		/sys/bus/event_source/devices/hv_24x7/interface/catalog_version
+Date:		February 2014
+Contact:	Cody P Schafer <cody@linux.vnet.ibm.com>
+Description:
+		Exposes the "version" field of the 24x7 catalog. This is also
+		extractable from the provided binary "catalog" sysfs entry.
diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci
new file mode 100644
index 0000000..3fa58c2
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-hv_gpci
@@ -0,0 +1,43 @@
+What:		/sys/bus/event_source/devices/hv_gpci/interface/collect_privileged
+Date:		February 2014
+Contact:	Cody P Schafer <cody@linux.vnet.ibm.com>
+Description:
+		'0' if the hypervisor is configured to forbid access to event
+		counters being accumulated by other guests and to physical
+		domain event counters.
+		'1' if that access is allowed.
+
+What:		/sys/bus/event_source/devices/hv_gpci/interface/ga
+Date:		February 2014
+Contact:	Cody P Schafer <cody@linux.vnet.ibm.com>
+Description:
+		0 or 1. Indicates whether we have access to "GA" events (listed
+		in arch/powerpc/perf/hv-gpci.h).
+
+What:		/sys/bus/event_source/devices/hv_gpci/interface/expanded
+Date:		February 2014
+Contact:	Cody P Schafer <cody@linux.vnet.ibm.com>
+Description:
+		0 or 1. Indicates whether we have access to "EXPANDED" events (listed
+		in arch/powerpc/perf/hv-gpci.h).
+
+What:		/sys/bus/event_source/devices/hv_gpci/interface/lab
+Date:		February 2014
+Contact:	Cody P Schafer <cody@linux.vnet.ibm.com>
+Description:
+		0 or 1. Indicates whether we have access to "LAB" events (listed
+		in arch/powerpc/perf/hv-gpci.h).
+
+What:		/sys/bus/event_source/devices/hv_gpci/interface/version
+Date:		February 2014
+Contact:	Cody P Schafer <cody@linux.vnet.ibm.com>
+Description:
+		A number indicating the version of the gpci interface that the
+		hypervisor reports supporting.
+
+What:		/sys/bus/event_source/devices/hv_gpci/interface/kernel_version
+Date:		February 2014
+Contact:	Cody P Schafer <cody@linux.vnet.ibm.com>
+Description:
+		A number indicating the latest version of the gpci interface
+		that the kernel is aware of.
-- 
1.8.3.2

^ permalink raw reply related

* [PATCH 18/20] powerpc/perf: Add kconfig option for hypervisor provided counters
From: Michael Ellerman @ 2014-03-14  5:00 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: cody, khandual
In-Reply-To: <1394773245-18328-1-git-send-email-mpe@ellerman.id.au>

From: Cody P Schafer <cody@linux.vnet.ibm.com>

The commit adds a Kconfig option which allows the hv_gpci and hv_24x7
PMUs, added in the preceeding commits, to be built.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/perf/Makefile             |  2 ++
 arch/powerpc/platforms/pseries/Kconfig | 12 ++++++++++++
 2 files changed, 14 insertions(+)

diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index 60d71ee..f9c083a 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -11,5 +11,7 @@ obj32-$(CONFIG_PPC_PERF_CTRS)	+= mpc7450-pmu.o
 obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
 obj-$(CONFIG_FSL_EMB_PERF_EVENT_E500) += e500-pmu.o e6500-pmu.o
 
+obj-$(CONFIG_HV_PERF_CTRS) += hv-24x7.o hv-gpci.o hv-common.o
+
 obj-$(CONFIG_PPC64)		+= $(obj64-y)
 obj-$(CONFIG_PPC32)		+= $(obj32-y)
diff --git a/arch/powerpc/platforms/pseries/Kconfig b/arch/powerpc/platforms/pseries/Kconfig
index 80b1d57..2cb8b77 100644
--- a/arch/powerpc/platforms/pseries/Kconfig
+++ b/arch/powerpc/platforms/pseries/Kconfig
@@ -111,6 +111,18 @@ config CMM
 	  will be reused for other LPARs. The interface allows firmware to
 	  balance memory across many LPARs.
 
+config HV_PERF_CTRS
+       bool "Hypervisor supplied PMU events (24x7 & GPCI)"
+       default y
+       depends on PERF_EVENTS && PPC_PSERIES
+       help
+	  Enable access to hypervisor supplied counters in perf. Currently,
+	  this enables code that uses the hcall GetPerfCounterInfo and 24x7
+	  interfaces to retrieve counters. GPCI exists on Power 6 and later
+	  systems. 24x7 is available on Power 8 systems.
+
+          If unsure, select Y.
+
 config DTL
 	bool "Dispatch Trace Log"
 	depends on PPC_SPLPAR && DEBUG_FS
-- 
1.8.3.2

^ permalink raw reply related

* [PATCH 17/20] powerpc/perf: Add support for the hv 24x7 interface
From: Michael Ellerman @ 2014-03-14  5:00 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: cody, khandual
In-Reply-To: <1394773245-18328-1-git-send-email-mpe@ellerman.id.au>

From: Cody P Schafer <cody@linux.vnet.ibm.com>

This provides a basic interface between hv_24x7 and perf. Similar to
the one provided for gpci, it lacks transaction support and does not
list any events.

Example usage via perf tool:

	perf stat -e 'hv_24x7/domain=2,offset=8,starting_index=0,lpar=0xffffffff/' -r 0 -C 0 -x ' ' sleep 0.1

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/perf/hv-24x7.c | 510 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 510 insertions(+)
 create mode 100644 arch/powerpc/perf/hv-24x7.c

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
new file mode 100644
index 0000000..297c9105
--- /dev/null
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -0,0 +1,510 @@
+/*
+ * Hypervisor supplied "24x7" performance counter support
+ *
+ * Author: Cody P Schafer <cody@linux.vnet.ibm.com>
+ * Copyright 2014 IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) "hv-24x7: " fmt
+
+#include <linux/perf_event.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+#include <asm/firmware.h>
+#include <asm/hvcall.h>
+#include <asm/io.h>
+
+#include "hv-24x7.h"
+#include "hv-24x7-catalog.h"
+#include "hv-common.h"
+
+/*
+ * TODO: Merging events:
+ * - Think of the hcall as an interface to a 4d array of counters:
+ *   - x = domains
+ *   - y = indexes in the domain (core, chip, vcpu, node, etc)
+ *   - z = offset into the counter space
+ *   - w = lpars (guest vms, "logical partitions")
+ * - A single request is: x,y,y_last,z,z_last,w,w_last
+ *   - this means we can retrieve a rectangle of counters in y,z for a single x.
+ *
+ * - Things to consider (ignoring w):
+ *   - input  cost_per_request = 16
+ *   - output cost_per_result(ys,zs)  = 8 + 8 * ys + ys * zs
+ *   - limited number of requests per hcall (must fit into 4K bytes)
+ *     - 4k = 16 [buffer header] - 16 [request size] * request_count
+ *     - 255 requests per hcall
+ *   - sometimes it will be more efficient to read extra data and discard
+ */
+
+/*
+ * Example usage:
+ *  perf stat -e 'hv_24x7/domain=2,offset=8,starting_index=0,lpar=0xffffffff/'
+ */
+
+/* u3 0-6, one of HV_24X7_PERF_DOMAIN */
+EVENT_DEFINE_RANGE_FORMAT(domain, config, 0, 3);
+/* u16 */
+EVENT_DEFINE_RANGE_FORMAT(starting_index, config, 16, 31);
+/* u32, see "data_offset" */
+EVENT_DEFINE_RANGE_FORMAT(offset, config, 32, 63);
+/* u16 */
+EVENT_DEFINE_RANGE_FORMAT(lpar, config1, 0, 15);
+
+EVENT_DEFINE_RANGE(reserved1, config,   4, 15);
+EVENT_DEFINE_RANGE(reserved2, config1, 16, 63);
+EVENT_DEFINE_RANGE(reserved3, config2,  0, 63);
+
+static struct attribute *format_attrs[] = {
+	&format_attr_domain.attr,
+	&format_attr_offset.attr,
+	&format_attr_starting_index.attr,
+	&format_attr_lpar.attr,
+	NULL,
+};
+
+static struct attribute_group format_group = {
+	.name = "format",
+	.attrs = format_attrs,
+};
+
+static struct kmem_cache *hv_page_cache;
+
+/*
+ * read_offset_data - copy data from one buffer to another while treating the
+ *                    source buffer as a small view on the total avaliable
+ *                    source data.
+ *
+ * @dest: buffer to copy into
+ * @dest_len: length of @dest in bytes
+ * @requested_offset: the offset within the source data we want. Must be > 0
+ * @src: buffer to copy data from
+ * @src_len: length of @src in bytes
+ * @source_offset: the offset in the sorce data that (src,src_len) refers to.
+ *                 Must be > 0
+ *
+ * returns the number of bytes copied.
+ *
+ * The following ascii art shows the various buffer possitioning we need to
+ * handle, assigns some arbitrary varibles to points on the buffer, and then
+ * shows how we fiddle with those values to get things we care about (copy
+ * start in src and copy len)
+ *
+ * s = @src buffer
+ * d = @dest buffer
+ * '.' areas in d are written to.
+ *
+ *                       u
+ *   x         w	 v  z
+ * d           |.........|
+ * s |----------------------|
+ *
+ *                      u
+ *   x         w	z     v
+ * d           |........------|
+ * s |------------------|
+ *
+ *   x         w        u,z,v
+ * d           |........|
+ * s |------------------|
+ *
+ *   x,w                u,v,z
+ * d |..................|
+ * s |------------------|
+ *
+ *   x        u
+ *   w        v		z
+ * d |........|
+ * s |------------------|
+ *
+ *   x      z   w      v
+ * d            |------|
+ * s |------|
+ *
+ * x = source_offset
+ * w = requested_offset
+ * z = source_offset + src_len
+ * v = requested_offset + dest_len
+ *
+ * w_offset_in_s = w - x = requested_offset - source_offset
+ * z_offset_in_s = z - x = src_len
+ * v_offset_in_s = v - x = request_offset + dest_len - src_len
+ */
+static ssize_t read_offset_data(void *dest, size_t dest_len,
+				loff_t requested_offset, void *src,
+				size_t src_len, loff_t source_offset)
+{
+	size_t w_offset_in_s = requested_offset - source_offset;
+	size_t z_offset_in_s = src_len;
+	size_t v_offset_in_s = requested_offset + dest_len - src_len;
+	size_t u_offset_in_s = min(z_offset_in_s, v_offset_in_s);
+	size_t copy_len = u_offset_in_s - w_offset_in_s;
+
+	if (requested_offset < 0 || source_offset < 0)
+		return -EINVAL;
+
+	if (z_offset_in_s <= w_offset_in_s)
+		return 0;
+
+	memcpy(dest, src + w_offset_in_s, copy_len);
+	return copy_len;
+}
+
+static unsigned long h_get_24x7_catalog_page(char page[static 4096],
+					     u32 version, u32 index)
+{
+	WARN_ON(!IS_ALIGNED((unsigned long)page, 4096));
+	return plpar_hcall_norets(H_GET_24X7_CATALOG_PAGE,
+			virt_to_phys(page),
+			version,
+			index);
+}
+
+static ssize_t catalog_read(struct file *filp, struct kobject *kobj,
+			    struct bin_attribute *bin_attr, char *buf,
+			    loff_t offset, size_t count)
+{
+	unsigned long hret;
+	ssize_t ret = 0;
+	size_t catalog_len = 0, catalog_page_len = 0, page_count = 0;
+	loff_t page_offset = 0;
+	uint32_t catalog_version_num = 0;
+	void *page = kmem_cache_alloc(hv_page_cache, GFP_USER);
+	struct hv_24x7_catalog_page_0 *page_0 = page;
+	if (!page)
+		return -ENOMEM;
+
+	hret = h_get_24x7_catalog_page(page, 0, 0);
+	if (hret) {
+		ret = -EIO;
+		goto e_free;
+	}
+
+	catalog_version_num = be32_to_cpu(page_0->version);
+	catalog_page_len = be32_to_cpu(page_0->length);
+	catalog_len = catalog_page_len * 4096;
+
+	page_offset = offset / 4096;
+	page_count  = count  / 4096;
+
+	if (page_offset >= catalog_page_len)
+		goto e_free;
+
+	if (page_offset != 0) {
+		hret = h_get_24x7_catalog_page(page, catalog_version_num,
+					       page_offset);
+		if (hret) {
+			ret = -EIO;
+			goto e_free;
+		}
+	}
+
+	ret = read_offset_data(buf, count, offset,
+				page, 4096, page_offset * 4096);
+e_free:
+	if (hret)
+		pr_err("h_get_24x7_catalog_page(ver=%d, page=%lld) failed: rc=%ld\n",
+				catalog_version_num, page_offset, hret);
+	kfree(page);
+
+	pr_devel("catalog_read: offset=%lld(%lld) count=%zu(%zu) catalog_len=%zu(%zu) => %zd\n",
+			offset, page_offset, count, page_count, catalog_len,
+			catalog_page_len, ret);
+
+	return ret;
+}
+
+#define PAGE_0_ATTR(_name, _fmt, _expr)				\
+static ssize_t _name##_show(struct device *dev,			\
+			    struct device_attribute *dev_attr,	\
+			    char *buf)				\
+{								\
+	unsigned long hret;					\
+	ssize_t ret = 0;					\
+	void *page = kmem_cache_alloc(hv_page_cache, GFP_USER);	\
+	struct hv_24x7_catalog_page_0 *page_0 = page;		\
+	if (!page)						\
+		return -ENOMEM;					\
+	hret = h_get_24x7_catalog_page(page, 0, 0);		\
+	if (hret) {						\
+		ret = -EIO;					\
+		goto e_free;					\
+	}							\
+	ret = sprintf(buf, _fmt, _expr);			\
+e_free:								\
+	kfree(page);						\
+	return ret;						\
+}								\
+static DEVICE_ATTR_RO(_name)
+
+PAGE_0_ATTR(catalog_version, "%lld\n",
+		(unsigned long long)be32_to_cpu(page_0->version));
+PAGE_0_ATTR(catalog_len, "%lld\n",
+		(unsigned long long)be32_to_cpu(page_0->length) * 4096);
+static BIN_ATTR_RO(catalog, 0/* real length varies */);
+
+static struct bin_attribute *if_bin_attrs[] = {
+	&bin_attr_catalog,
+	NULL,
+};
+
+static struct attribute *if_attrs[] = {
+	&dev_attr_catalog_len.attr,
+	&dev_attr_catalog_version.attr,
+	NULL,
+};
+
+static struct attribute_group if_group = {
+	.name = "interface",
+	.bin_attrs = if_bin_attrs,
+	.attrs = if_attrs,
+};
+
+static const struct attribute_group *attr_groups[] = {
+	&format_group,
+	&if_group,
+	NULL,
+};
+
+static bool is_physical_domain(int domain)
+{
+	return  domain == HV_24X7_PERF_DOMAIN_PHYSICAL_CHIP ||
+		domain == HV_24X7_PERF_DOMAIN_PHYSICAL_CORE;
+}
+
+static unsigned long single_24x7_request(u8 domain, u32 offset, u16 ix,
+					 u16 lpar, u64 *res,
+					 bool success_expected)
+{
+	unsigned long ret;
+
+	/*
+	 * request_buffer and result_buffer are not required to be 4k aligned,
+	 * but are not allowed to cross any 4k boundary. Aligning them to 4k is
+	 * the simplest way to ensure that.
+	 */
+	struct reqb {
+		struct hv_24x7_request_buffer buf;
+		struct hv_24x7_request req;
+	} __packed __aligned(4096) request_buffer = {
+		.buf = {
+			.interface_version = HV_24X7_IF_VERSION_CURRENT,
+			.num_requests = 1,
+		},
+		.req = {
+			.performance_domain = domain,
+			.data_size = cpu_to_be16(8),
+			.data_offset = cpu_to_be32(offset),
+			.starting_lpar_ix = cpu_to_be16(lpar),
+			.max_num_lpars = cpu_to_be16(1),
+			.starting_ix = cpu_to_be16(ix),
+			.max_ix = cpu_to_be16(1),
+		}
+	};
+
+	struct resb {
+		struct hv_24x7_data_result_buffer buf;
+		struct hv_24x7_result res;
+		struct hv_24x7_result_element elem;
+		__be64 result;
+	} __packed __aligned(4096) result_buffer = {};
+
+	ret = plpar_hcall_norets(H_GET_24X7_DATA,
+			virt_to_phys(&request_buffer), sizeof(request_buffer),
+			virt_to_phys(&result_buffer),  sizeof(result_buffer));
+
+	if (ret) {
+		if (success_expected)
+			pr_err_ratelimited("hcall failed: %d %#x %#x %d => 0x%lx (%ld) detail=0x%x failing ix=%x\n",
+					domain, offset, ix, lpar,
+					ret, ret,
+					result_buffer.buf.detailed_rc,
+					result_buffer.buf.failing_request_ix);
+		return ret;
+	}
+
+	*res = be64_to_cpu(result_buffer.result);
+	return ret;
+}
+
+static unsigned long event_24x7_request(struct perf_event *event, u64 *res,
+		bool success_expected)
+{
+	return single_24x7_request(event_get_domain(event),
+				event_get_offset(event),
+				event_get_starting_index(event),
+				event_get_lpar(event),
+				res,
+				success_expected);
+}
+
+static int h_24x7_event_init(struct perf_event *event)
+{
+	struct hv_perf_caps caps;
+	unsigned domain;
+	unsigned long hret;
+	u64 ct;
+
+	/* Not our event */
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
+	/* Unused areas must be 0 */
+	if (event_get_reserved1(event) ||
+	    event_get_reserved2(event) ||
+	    event_get_reserved3(event)) {
+		pr_devel("reserved set when forbidden 0x%llx(0x%llx) 0x%llx(0x%llx) 0x%llx(0x%llx)\n",
+				event->attr.config,
+				event_get_reserved1(event),
+				event->attr.config1,
+				event_get_reserved2(event),
+				event->attr.config2,
+				event_get_reserved3(event));
+		return -EINVAL;
+	}
+
+	/* unsupported modes and filters */
+	if (event->attr.exclude_user   ||
+	    event->attr.exclude_kernel ||
+	    event->attr.exclude_hv     ||
+	    event->attr.exclude_idle   ||
+	    event->attr.exclude_host   ||
+	    event->attr.exclude_guest  ||
+	    is_sampling_event(event)) /* no sampling */
+		return -EINVAL;
+
+	/* no branch sampling */
+	if (has_branch_stack(event))
+		return -EOPNOTSUPP;
+
+	/* offset must be 8 byte aligned */
+	if (event_get_offset(event) % 8) {
+		pr_devel("bad alignment\n");
+		return -EINVAL;
+	}
+
+	/* Domains above 6 are invalid */
+	domain = event_get_domain(event);
+	if (domain > 6) {
+		pr_devel("invalid domain %d\n", domain);
+		return -EINVAL;
+	}
+
+	hret = hv_perf_caps_get(&caps);
+	if (hret) {
+		pr_devel("could not get capabilities: rc=%ld\n", hret);
+		return -EIO;
+	}
+
+	/* PHYSICAL domains & other lpars require extra capabilities */
+	if (!caps.collect_privileged && (is_physical_domain(domain) ||
+		(event_get_lpar(event) != event_get_lpar_max()))) {
+		pr_devel("hv permisions disallow: is_physical_domain:%d, lpar=0x%llx\n",
+				is_physical_domain(domain),
+				event_get_lpar(event));
+		return -EACCES;
+	}
+
+	/* see if the event complains */
+	if (event_24x7_request(event, &ct, false)) {
+		pr_devel("test hcall failed\n");
+		return -EIO;
+	}
+
+	return 0;
+}
+
+static u64 h_24x7_get_value(struct perf_event *event)
+{
+	unsigned long ret;
+	u64 ct;
+	ret = event_24x7_request(event, &ct, true);
+	if (ret)
+		/* We checked this in event init, shouldn't fail here... */
+		return 0;
+
+	return ct;
+}
+
+static void h_24x7_event_update(struct perf_event *event)
+{
+	s64 prev;
+	u64 now;
+	now = h_24x7_get_value(event);
+	prev = local64_xchg(&event->hw.prev_count, now);
+	local64_add(now - prev, &event->count);
+}
+
+static void h_24x7_event_start(struct perf_event *event, int flags)
+{
+	if (flags & PERF_EF_RELOAD)
+		local64_set(&event->hw.prev_count, h_24x7_get_value(event));
+}
+
+static void h_24x7_event_stop(struct perf_event *event, int flags)
+{
+	h_24x7_event_update(event);
+}
+
+static int h_24x7_event_add(struct perf_event *event, int flags)
+{
+	if (flags & PERF_EF_START)
+		h_24x7_event_start(event, flags);
+
+	return 0;
+}
+
+static int h_24x7_event_idx(struct perf_event *event)
+{
+	return 0;
+}
+
+static struct pmu h_24x7_pmu = {
+	.task_ctx_nr = perf_invalid_context,
+
+	.name = "hv_24x7",
+	.attr_groups = attr_groups,
+	.event_init  = h_24x7_event_init,
+	.add         = h_24x7_event_add,
+	.del         = h_24x7_event_stop,
+	.start       = h_24x7_event_start,
+	.stop        = h_24x7_event_stop,
+	.read        = h_24x7_event_update,
+	.event_idx   = h_24x7_event_idx,
+};
+
+static int hv_24x7_init(void)
+{
+	int r;
+	unsigned long hret;
+	struct hv_perf_caps caps;
+
+	if (!firmware_has_feature(FW_FEATURE_LPAR)) {
+		pr_info("not a virtualized system, not enabling\n");
+		return -ENODEV;
+	}
+
+	hret = hv_perf_caps_get(&caps);
+	if (hret) {
+		pr_info("could not obtain capabilities, error 0x%80lx, not enabling\n",
+				hret);
+		return -ENODEV;
+	}
+
+	hv_page_cache = kmem_cache_create("hv-page-4096", 4096, 4096, 0, NULL);
+	if (!hv_page_cache)
+		return -ENOMEM;
+
+	r = perf_pmu_register(&h_24x7_pmu, h_24x7_pmu.name, -1);
+	if (r)
+		return r;
+
+	return 0;
+}
+
+device_initcall(hv_24x7_init);
-- 
1.8.3.2

^ permalink raw reply related

* [PATCH 16/20] powerpc/perf: Add support for the hv gpci (get performance counter info) interface
From: Michael Ellerman @ 2014-03-14  5:00 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: cody, khandual
In-Reply-To: <1394773245-18328-1-git-send-email-mpe@ellerman.id.au>

From: Cody P Schafer <cody@linux.vnet.ibm.com>

This provides a basic link between perf and hv_gpci. Notably, it does
not yet support transactions and does not list any events (they can
still be manually composed).

Example usage via perf tool:

	perf stat -e 'hv_gpci/counter_info_version=3,offset=0,length=8,secondary_index=0,starting_index=0xffffffff,request=0x10/' -r 0 -C 0 -x ' ' sleep 0.1

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/perf/hv-gpci.c | 294 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 294 insertions(+)
 create mode 100644 arch/powerpc/perf/hv-gpci.c

diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
new file mode 100644
index 0000000..278ba7b
--- /dev/null
+++ b/arch/powerpc/perf/hv-gpci.c
@@ -0,0 +1,294 @@
+/*
+ * Hypervisor supplied "gpci" ("get performance counter info") performance
+ * counter support
+ *
+ * Author: Cody P Schafer <cody@linux.vnet.ibm.com>
+ * Copyright 2014 IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) "hv-gpci: " fmt
+
+#include <linux/init.h>
+#include <linux/perf_event.h>
+#include <asm/firmware.h>
+#include <asm/hvcall.h>
+#include <asm/io.h>
+
+#include "hv-gpci.h"
+#include "hv-common.h"
+
+/*
+ * Example usage:
+ *  perf stat -e 'hv_gpci/counter_info_version=3,offset=0,length=8,
+ *		  secondary_index=0,starting_index=0xffffffff,request=0x10/' ...
+ */
+
+/* u32 */
+EVENT_DEFINE_RANGE_FORMAT(request, config, 0, 31);
+/* u32 */
+EVENT_DEFINE_RANGE_FORMAT(starting_index, config, 32, 63);
+/* u16 */
+EVENT_DEFINE_RANGE_FORMAT(secondary_index, config1, 0, 15);
+/* u8 */
+EVENT_DEFINE_RANGE_FORMAT(counter_info_version, config1, 16, 23);
+/* u8, bytes of data (1-8) */
+EVENT_DEFINE_RANGE_FORMAT(length, config1, 24, 31);
+/* u32, byte offset */
+EVENT_DEFINE_RANGE_FORMAT(offset, config1, 32, 63);
+
+static struct attribute *format_attrs[] = {
+	&format_attr_request.attr,
+	&format_attr_starting_index.attr,
+	&format_attr_secondary_index.attr,
+	&format_attr_counter_info_version.attr,
+
+	&format_attr_offset.attr,
+	&format_attr_length.attr,
+	NULL,
+};
+
+static struct attribute_group format_group = {
+	.name = "format",
+	.attrs = format_attrs,
+};
+
+#define HV_CAPS_ATTR(_name, _format)				\
+static ssize_t _name##_show(struct device *dev,			\
+			    struct device_attribute *attr,	\
+			    char *page)				\
+{								\
+	struct hv_perf_caps caps;				\
+	unsigned long hret = hv_perf_caps_get(&caps);		\
+	if (hret)						\
+		return -EIO;					\
+								\
+	return sprintf(page, _format, caps._name);		\
+}								\
+static struct device_attribute hv_caps_attr_##_name = __ATTR_RO(_name)
+
+static ssize_t kernel_version_show(struct device *dev,
+				   struct device_attribute *attr,
+				   char *page)
+{
+	return sprintf(page, "0x%x\n", COUNTER_INFO_VERSION_CURRENT);
+}
+
+DEVICE_ATTR_RO(kernel_version);
+HV_CAPS_ATTR(version, "0x%x\n");
+HV_CAPS_ATTR(ga, "%d\n");
+HV_CAPS_ATTR(expanded, "%d\n");
+HV_CAPS_ATTR(lab, "%d\n");
+HV_CAPS_ATTR(collect_privileged, "%d\n");
+
+static struct attribute *interface_attrs[] = {
+	&dev_attr_kernel_version.attr,
+	&hv_caps_attr_version.attr,
+	&hv_caps_attr_ga.attr,
+	&hv_caps_attr_expanded.attr,
+	&hv_caps_attr_lab.attr,
+	&hv_caps_attr_collect_privileged.attr,
+	NULL,
+};
+
+static struct attribute_group interface_group = {
+	.name = "interface",
+	.attrs = interface_attrs,
+};
+
+static const struct attribute_group *attr_groups[] = {
+	&format_group,
+	&interface_group,
+	NULL,
+};
+
+#define GPCI_MAX_DATA_BYTES \
+	(1024 - sizeof(struct hv_get_perf_counter_info_params))
+
+static unsigned long single_gpci_request(u32 req, u32 starting_index,
+		u16 secondary_index, u8 version_in, u32 offset, u8 length,
+		u64 *value)
+{
+	unsigned long ret;
+	size_t i;
+	u64 count;
+
+	struct {
+		struct hv_get_perf_counter_info_params params;
+		uint8_t bytes[GPCI_MAX_DATA_BYTES];
+	} __packed __aligned(sizeof(uint64_t)) arg = {
+		.params = {
+			.counter_request = cpu_to_be32(req),
+			.starting_index = cpu_to_be32(starting_index),
+			.secondary_index = cpu_to_be16(secondary_index),
+			.counter_info_version_in = version_in,
+		}
+	};
+
+	ret = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO,
+			virt_to_phys(&arg), sizeof(arg));
+	if (ret) {
+		pr_devel("hcall failed: 0x%lx\n", ret);
+		return ret;
+	}
+
+	/*
+	 * we verify offset and length are within the zeroed buffer at event
+	 * init.
+	 */
+	count = 0;
+	for (i = offset; i < offset + length; i++)
+		count |= arg.bytes[i] << (i - offset);
+
+	*value = count;
+	return ret;
+}
+
+static u64 h_gpci_get_value(struct perf_event *event)
+{
+	u64 count;
+	unsigned long ret = single_gpci_request(event_get_request(event),
+					event_get_starting_index(event),
+					event_get_secondary_index(event),
+					event_get_counter_info_version(event),
+					event_get_offset(event),
+					event_get_length(event),
+					&count);
+	if (ret)
+		return 0;
+	return count;
+}
+
+static void h_gpci_event_update(struct perf_event *event)
+{
+	s64 prev;
+	u64 now = h_gpci_get_value(event);
+	prev = local64_xchg(&event->hw.prev_count, now);
+	local64_add(now - prev, &event->count);
+}
+
+static void h_gpci_event_start(struct perf_event *event, int flags)
+{
+	local64_set(&event->hw.prev_count, h_gpci_get_value(event));
+}
+
+static void h_gpci_event_stop(struct perf_event *event, int flags)
+{
+	h_gpci_event_update(event);
+}
+
+static int h_gpci_event_add(struct perf_event *event, int flags)
+{
+	if (flags & PERF_EF_START)
+		h_gpci_event_start(event, flags);
+
+	return 0;
+}
+
+static int h_gpci_event_init(struct perf_event *event)
+{
+	u64 count;
+	u8 length;
+
+	/* Not our event */
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
+	/* config2 is unused */
+	if (event->attr.config2) {
+		pr_devel("config2 set when reserved\n");
+		return -EINVAL;
+	}
+
+	/* unsupported modes and filters */
+	if (event->attr.exclude_user   ||
+	    event->attr.exclude_kernel ||
+	    event->attr.exclude_hv     ||
+	    event->attr.exclude_idle   ||
+	    event->attr.exclude_host   ||
+	    event->attr.exclude_guest  ||
+	    is_sampling_event(event)) /* no sampling */
+		return -EINVAL;
+
+	/* no branch sampling */
+	if (has_branch_stack(event))
+		return -EOPNOTSUPP;
+
+	length = event_get_length(event);
+	if (length < 1 || length > 8) {
+		pr_devel("length invalid\n");
+		return -EINVAL;
+	}
+
+	/* last byte within the buffer? */
+	if ((event_get_offset(event) + length) > GPCI_MAX_DATA_BYTES) {
+		pr_devel("request outside of buffer: %zu > %zu\n",
+				(size_t)event_get_offset(event) + length,
+				GPCI_MAX_DATA_BYTES);
+		return -EINVAL;
+	}
+
+	/* check if the request works... */
+	if (single_gpci_request(event_get_request(event),
+				event_get_starting_index(event),
+				event_get_secondary_index(event),
+				event_get_counter_info_version(event),
+				event_get_offset(event),
+				length,
+				&count)) {
+		pr_devel("gpci hcall failed\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static int h_gpci_event_idx(struct perf_event *event)
+{
+	return 0;
+}
+
+static struct pmu h_gpci_pmu = {
+	.task_ctx_nr = perf_invalid_context,
+
+	.name = "hv_gpci",
+	.attr_groups = attr_groups,
+	.event_init  = h_gpci_event_init,
+	.add         = h_gpci_event_add,
+	.del         = h_gpci_event_stop,
+	.start       = h_gpci_event_start,
+	.stop        = h_gpci_event_stop,
+	.read        = h_gpci_event_update,
+	.event_idx   = h_gpci_event_idx,
+};
+
+static int hv_gpci_init(void)
+{
+	int r;
+	unsigned long hret;
+	struct hv_perf_caps caps;
+
+	if (!firmware_has_feature(FW_FEATURE_LPAR)) {
+		pr_info("not a virtualized system, not enabling\n");
+		return -ENODEV;
+	}
+
+	hret = hv_perf_caps_get(&caps);
+	if (hret) {
+		pr_info("could not obtain capabilities, error 0x%80lx, not enabling\n",
+				hret);
+		return -ENODEV;
+	}
+
+	r = perf_pmu_register(&h_gpci_pmu, h_gpci_pmu.name, -1);
+	if (r)
+		return r;
+
+	return 0;
+}
+
+device_initcall(hv_gpci_init);
-- 
1.8.3.2

^ permalink raw reply related

* [PATCH 15/20] powerpc/perf: Add macros for defining event fields & formats
From: Michael Ellerman @ 2014-03-14  5:00 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: cody, khandual
In-Reply-To: <1394773245-18328-1-git-send-email-mpe@ellerman.id.au>

From: Cody P Schafer <cody@linux.vnet.ibm.com>

Add two macros which generate functions to extract the relevent bits
from event->attr.config{,1,2}.

EVENT_DEFINE_RANGE() defines an accessor for a range of bits in the
event, as well as a "max" function that gives the maximum value of the
field based on the bit width.

EVENT_DEFINE_RANGE_FORMAT() defines the accessor & max routine and also
a format attribute for use in the PMU's attr_groups.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
[mpe: move to powerpc, ugly but descriptive macro names]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
---
 arch/powerpc/perf/hv-common.h | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/arch/powerpc/perf/hv-common.h b/arch/powerpc/perf/hv-common.h
index 7e615bd..5d79cec 100644
--- a/arch/powerpc/perf/hv-common.h
+++ b/arch/powerpc/perf/hv-common.h
@@ -1,6 +1,7 @@
 #ifndef LINUX_POWERPC_PERF_HV_COMMON_H_
 #define LINUX_POWERPC_PERF_HV_COMMON_H_
 
+#include <linux/perf_event.h>
 #include <linux/types.h>
 
 struct hv_perf_caps {
@@ -14,4 +15,22 @@ struct hv_perf_caps {
 
 unsigned long hv_perf_caps_get(struct hv_perf_caps *caps);
 
+
+#define EVENT_DEFINE_RANGE_FORMAT(name, attr_var, bit_start, bit_end)	\
+PMU_FORMAT_ATTR(name, #attr_var ":" #bit_start "-" #bit_end);		\
+EVENT_DEFINE_RANGE(name, attr_var, bit_start, bit_end)
+
+#define EVENT_DEFINE_RANGE(name, attr_var, bit_start, bit_end)	\
+static u64 event_get_##name##_max(void)					\
+{									\
+	BUILD_BUG_ON((bit_start > bit_end)				\
+		    || (bit_end >= (sizeof(1ull) * 8)));		\
+	return (((1ull << (bit_end - bit_start)) - 1) << 1) + 1;	\
+}									\
+static u64 event_get_##name(struct perf_event *event)			\
+{									\
+	return (event->attr.attr_var >> (bit_start)) &			\
+		event_get_##name##_max();				\
+}
+
 #endif
-- 
1.8.3.2

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox