LinuxPPC-Dev Archive on lore.kernel.org


* Re: [RFC PATCH powerpc] Fix compile error of pgtable-ppc64.h
From: Aneesh Kumar K.V @ 2014-01-17  6:57 UTC (permalink / raw)
  To: Li Zhong, PowerPC email list; +Cc: Paul Mackerras
In-Reply-To: <1389939036.3000.7.camel@ThinkPad-T5421.cn.ibm.com>

Li Zhong <zhong@linux.vnet.ibm.com> writes:

> It seems that forward declaration couldn't work well with typedef, use
> struct spinlock directly to avoiding following build errors:
>
> In file included from include/linux/spinlock.h:81,
>                  from include/linux/seqlock.h:35,
>                  from include/linux/time.h:5,
>                  from include/uapi/linux/timex.h:56,
>                  from include/linux/timex.h:56,
>                  from include/linux/sched.h:17,
>                  from arch/powerpc/kernel/asm-offsets.c:17:
> include/linux/spinlock_types.h:76: error: redefinition of typedef 'spinlock_t'
> /root/linux-next/arch/powerpc/include/asm/pgtable-ppc64.h:563: note: previous declaration of 'spinlock_t' was here
>

What compiler version? I have seen that error with gcc 4.3, and it was
concluded that the compiler is too old to worry about. That specific
compiler version also gave an error for forward-declaring a struct.
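
For reference, the difference between the two forms can be sketched in
plain C. This is a minimal userspace illustration, not kernel code:
`struct lock`, `lock_t`, `same_lock()` and `demo_same_lock()` are
hypothetical stand-ins, not the real `struct spinlock`, `spinlock_t` or
`pmd_move_must_withdraw()`. Repeating a typedef for the same type is only
standard from C11 onward, whereas repeating a struct tag declaration is
valid in every C standard.

```c
#include <assert.h>

/* "Header A": only pointers to the lock are needed, so an incomplete type
 * is enough. Repeating "struct lock;" elsewhere is harmless. */
struct lock;

static inline int same_lock(struct lock *new_ptl, struct lock *old_ptl)
{
	/* Pointers to an incomplete type can be compared without its definition. */
	return new_ptl == old_ptl;
}

/* "Header B": the single place that defines the struct and the typedef.
 * Had header A also written "typedef struct lock lock_t;", a pre-C11
 * compiler could report a typedef redefinition at this point. */
typedef struct lock {
	int locked;
} lock_t;

/* Hypothetical caller that sees both headers. */
static int demo_same_lock(void)
{
	lock_t a = { 0 }, b = { 0 };

	assert(same_lock(&a, &a));
	return same_lock(&a, &b);
}
```

Calling `demo_same_lock()` exercises the helper with both the typedef and
the struct tag visible, which is the situation the include chain above
creates.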

> Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/pgtable-ppc64.h |    6 +++---
>  1 files changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
> index d27960c..bc141c9 100644
> --- a/arch/powerpc/include/asm/pgtable-ppc64.h
> +++ b/arch/powerpc/include/asm/pgtable-ppc64.h
> @@ -560,9 +560,9 @@ extern void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
>  			    pmd_t *pmdp);
>
>  #define pmd_move_must_withdraw pmd_move_must_withdraw
> -typedef struct spinlock spinlock_t;
> -static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
> -					 spinlock_t *old_pmd_ptl)
> +struct spinlock;
> +static inline int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
> +					 struct spinlock *old_pmd_ptl)
>  {
>  	/*
>  	 * Archs like ppc64 use pgtable to store per pmd


* [RFC PATCH powerpc] Fix compile error of pgtable-ppc64.h
From: Li Zhong @ 2014-01-17  6:10 UTC (permalink / raw)
  To: PowerPC email list; +Cc: Paul Mackerras, aneesh.kumar

It seems that a forward declaration does not work well with a typedef; use
struct spinlock directly to avoid the following build errors:

In file included from include/linux/spinlock.h:81,
                 from include/linux/seqlock.h:35,
                 from include/linux/time.h:5,
                 from include/uapi/linux/timex.h:56,
                 from include/linux/timex.h:56,
                 from include/linux/sched.h:17,
                 from arch/powerpc/kernel/asm-offsets.c:17:
include/linux/spinlock_types.h:76: error: redefinition of typedef 'spinlock_t'
/root/linux-next/arch/powerpc/include/asm/pgtable-ppc64.h:563: note: previous declaration of 'spinlock_t' was here

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/pgtable-ppc64.h |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/pgtable-ppc64.h b/arch/powerpc/include/asm/pgtable-ppc64.h
index d27960c..bc141c9 100644
--- a/arch/powerpc/include/asm/pgtable-ppc64.h
+++ b/arch/powerpc/include/asm/pgtable-ppc64.h
@@ -560,9 +560,9 @@ extern void pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
 			    pmd_t *pmdp);
 
 #define pmd_move_must_withdraw pmd_move_must_withdraw
-typedef struct spinlock spinlock_t;
-static inline int pmd_move_must_withdraw(spinlock_t *new_pmd_ptl,
-					 spinlock_t *old_pmd_ptl)
+struct spinlock;
+static inline int pmd_move_must_withdraw(struct spinlock *new_pmd_ptl,
+					 struct spinlock *old_pmd_ptl)
 {
 	/*
 	 * Archs like ppc64 use pgtable to store per pmd


* [PATCH] powerpc/configs: Enable Freescale IFC controller
From: Prabhakar Kushwaha @ 2014-01-17  6:09 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: scottwood, Prabhakar Kushwaha

Currently the IFC NAND driver is enabled in corenet32_smp_defconfig, but the
IFC controller is not.

So enable the IFC controller in corenet32_smp_defconfig.

Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
---
 Based upon git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux.git
 branch master

 arch/powerpc/configs/corenet32_smp_defconfig |    1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/powerpc/configs/corenet32_smp_defconfig b/arch/powerpc/configs/corenet32_smp_defconfig
index 087d437..9578cbe9 100644
--- a/arch/powerpc/configs/corenet32_smp_defconfig
+++ b/arch/powerpc/configs/corenet32_smp_defconfig
@@ -25,6 +25,7 @@ CONFIG_PARTITION_ADVANCED=y
 CONFIG_MAC_PARTITION=y
 CONFIG_CORENET_GENERIC=y
 CONFIG_HIGHMEM=y
+CONFIG_FSL_IFC=y
 # CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS is not set
 CONFIG_BINFMT_MISC=m
 CONFIG_KEXEC=y
-- 
1.7.9.5


* [PATCH 2/2][v4] powerpc/config: Enable memory driver
From: Prabhakar Kushwaha @ 2014-01-17  5:45 UTC (permalink / raw)
  To: arnd, gregkh; +Cc: scottwood, linuxppc-dev, linux-kernel, Prabhakar Kushwaha

The Freescale IFC controller driver has been moved to drivers/memory.

So enable the memory driver (CONFIG_MEMORY) in the powerpc defconfigs.

Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
---
 Changes for v2: Sending as it is
 Changes for v3: Sending as it is
 Changes for v4: Rebased to 
	git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux.git


 arch/powerpc/configs/corenet32_smp_defconfig |    1 +
 arch/powerpc/configs/corenet64_smp_defconfig |    1 +
 arch/powerpc/configs/mpc85xx_defconfig       |    1 +
 arch/powerpc/configs/mpc85xx_smp_defconfig   |    1 +
 4 files changed, 4 insertions(+)

diff --git a/arch/powerpc/configs/corenet32_smp_defconfig b/arch/powerpc/configs/corenet32_smp_defconfig
index 60027c2..b717e5d 100644
--- a/arch/powerpc/configs/corenet32_smp_defconfig
+++ b/arch/powerpc/configs/corenet32_smp_defconfig
@@ -144,6 +144,7 @@ CONFIG_RTC_DRV_DS3232=y
 CONFIG_RTC_DRV_CMOS=y
 CONFIG_UIO=y
 CONFIG_STAGING=y
+CONFIG_MEMORY=y
 CONFIG_VIRT_DRIVERS=y
 CONFIG_FSL_HV_MANAGER=y
 CONFIG_EXT2_FS=y
diff --git a/arch/powerpc/configs/corenet64_smp_defconfig b/arch/powerpc/configs/corenet64_smp_defconfig
index 6c8b020..efbe5a3 100644
--- a/arch/powerpc/configs/corenet64_smp_defconfig
+++ b/arch/powerpc/configs/corenet64_smp_defconfig
@@ -129,6 +129,7 @@ CONFIG_EDAC=y
 CONFIG_EDAC_MM_EDAC=y
 CONFIG_DMADEVICES=y
 CONFIG_FSL_DMA=y
+CONFIG_MEMORY=y
 CONFIG_EXT2_FS=y
 CONFIG_EXT3_FS=y
 CONFIG_ISO9660_FS=m
diff --git a/arch/powerpc/configs/mpc85xx_defconfig b/arch/powerpc/configs/mpc85xx_defconfig
index 5a58882..e215d4d 100644
--- a/arch/powerpc/configs/mpc85xx_defconfig
+++ b/arch/powerpc/configs/mpc85xx_defconfig
@@ -210,6 +210,7 @@ CONFIG_RTC_CLASS=y
 CONFIG_RTC_DRV_CMOS=y
 CONFIG_DMADEVICES=y
 CONFIG_FSL_DMA=y
+CONFIG_MEMORY=y
 # CONFIG_NET_DMA is not set
 CONFIG_EXT2_FS=y
 CONFIG_EXT3_FS=y
diff --git a/arch/powerpc/configs/mpc85xx_smp_defconfig b/arch/powerpc/configs/mpc85xx_smp_defconfig
index 165e6b3..7bc167c 100644
--- a/arch/powerpc/configs/mpc85xx_smp_defconfig
+++ b/arch/powerpc/configs/mpc85xx_smp_defconfig
@@ -210,6 +210,7 @@ CONFIG_RTC_CLASS=y
 CONFIG_RTC_DRV_CMOS=y
 CONFIG_DMADEVICES=y
 CONFIG_FSL_DMA=y
+CONFIG_MEMORY=y
 # CONFIG_NET_DMA is not set
 CONFIG_EXT2_FS=y
 CONFIG_EXT3_FS=y
-- 
1.7.9.5


* [PATCH 1/2][v4] driver/memory: Move Freescale IFC driver to a common driver
From: Prabhakar Kushwaha @ 2014-01-17  5:45 UTC (permalink / raw)
  To: arnd, gregkh; +Cc: scottwood, linuxppc-dev, linux-kernel, Prabhakar Kushwaha

The Freescale IFC controller has been used on mpc8xxx and will be used on
ARM-based SoCs as well. This patch moves the driver to drivers/memory and
fixes the header file includes.

Also remove module_platform_driver() and instead call
platform_driver_register() from subsys_initcall() to make sure this driver
has been registered before MTD partition parsing starts.
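
The ordering argument can be illustrated with a small userspace model.
This is only a sketch: `struct initcall`, `run_initcalls()` and
`demo_boot_order()` are hypothetical names. The level numbers mirror the
kernel convention that subsys_initcall() runs at level 4, while a built-in
module_platform_driver() registers through module_init() at the device
level (6). It also assumes that the NAND driver, and the MTD partition
parsing it triggers, run from a device-level initcall.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Levels mirror the kernel's numbering for these two initcall classes. */
enum { LEVEL_SUBSYS = 4, LEVEL_DEVICE = 6 };

struct initcall {
	int level;          /* lower levels run first */
	const char *name;   /* label recorded in the trace */
};

/* Run every initcall in ascending level order, appending names to trace.
 * Calls at the same level keep their registration order. */
static void run_initcalls(const struct initcall *calls, size_t n,
			  char *trace, size_t trace_len)
{
	int level;
	size_t i;

	assert(trace_len > 0);
	trace[0] = '\0';
	for (level = 0; level <= 7; level++) {
		for (i = 0; i < n; i++) {
			if (calls[i].level != level)
				continue;
			assert(strlen(trace) + strlen(calls[i].name) + 2 <= trace_len);
			strcat(trace, calls[i].name);
			strcat(trace, " ");
		}
	}
}

/* Hypothetical boot: the NAND driver is listed first, the controller second. */
static const char *demo_boot_order(void)
{
	static char trace[64];
	static const struct initcall calls[] = {
		{ LEVEL_DEVICE, "fsl_ifc_nand" },  /* module_platform_driver() */
		{ LEVEL_SUBSYS, "fsl_ifc" },       /* subsys_initcall() */
	};

	run_initcalls(calls, sizeof(calls) / sizeof(calls[0]), trace, sizeof(trace));
	return trace;
}
```

In this model the controller is always initialised before the NAND
driver, regardless of the order in which the two entries are listed, which
is the property the patch relies on.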

Signed-off-by: Prabhakar Kushwaha <prabhakar@freescale.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
---
Changes for v2:
	- Move fsl_ifc to drivers/memory

Changes for v3:
	- move device tree bindings to memory

Changes for v4: Rebased to 
	git://git.kernel.org/pub/scm/linux/kernel/git/scottwood/linux.git

 .../{powerpc => memory-controllers}/fsl/ifc.txt    |    0
 arch/powerpc/sysdev/Makefile                       |    1 -
 drivers/memory/Makefile                            |    1 +
 {arch/powerpc/sysdev => drivers/memory}/fsl_ifc.c  |    8 ++++++--
 drivers/mtd/nand/fsl_ifc_nand.c                    |    2 +-
 .../include/asm => include/linux}/fsl_ifc.h        |    0
 6 files changed, 8 insertions(+), 4 deletions(-)
 rename Documentation/devicetree/bindings/{powerpc => memory-controllers}/fsl/ifc.txt (100%)
 rename {arch/powerpc/sysdev => drivers/memory}/fsl_ifc.c (98%)
 rename {arch/powerpc/include/asm => include/linux}/fsl_ifc.h (100%)

diff --git a/Documentation/devicetree/bindings/powerpc/fsl/ifc.txt b/Documentation/devicetree/bindings/memory-controllers/fsl/ifc.txt
similarity index 100%
rename from Documentation/devicetree/bindings/powerpc/fsl/ifc.txt
rename to Documentation/devicetree/bindings/memory-controllers/fsl/ifc.txt
diff --git a/arch/powerpc/sysdev/Makefile b/arch/powerpc/sysdev/Makefile
index 99464a7..ec69532 100644
--- a/arch/powerpc/sysdev/Makefile
+++ b/arch/powerpc/sysdev/Makefile
@@ -19,7 +19,6 @@ obj-$(CONFIG_FSL_SOC)		+= fsl_soc.o fsl_mpic_err.o
 obj-$(CONFIG_FSL_PCI)		+= fsl_pci.o $(fsl-msi-obj-y)
 obj-$(CONFIG_FSL_PMC)		+= fsl_pmc.o
 obj-$(CONFIG_FSL_LBC)		+= fsl_lbc.o
-obj-$(CONFIG_FSL_IFC)		+= fsl_ifc.o
 obj-$(CONFIG_FSL_GTM)		+= fsl_gtm.o
 obj-$(CONFIG_FSL_85XX_CACHE_SRAM)	+= fsl_85xx_l2ctlr.o fsl_85xx_cache_sram.o
 obj-$(CONFIG_SIMPLE_GPIO)	+= simple_gpio.o
diff --git a/drivers/memory/Makefile b/drivers/memory/Makefile
index 9cce5d7..b494e5b 100644
--- a/drivers/memory/Makefile
+++ b/drivers/memory/Makefile
@@ -6,5 +6,6 @@ ifeq ($(CONFIG_DDR),y)
 obj-$(CONFIG_OF)		+= of_memory.o
 endif
 obj-$(CONFIG_TI_EMIF)		+= emif.o
+obj-$(CONFIG_FSL_IFC)		+= fsl_ifc.o
 obj-$(CONFIG_TEGRA20_MC)	+= tegra20-mc.o
 obj-$(CONFIG_TEGRA30_MC)	+= tegra30-mc.o
diff --git a/arch/powerpc/sysdev/fsl_ifc.c b/drivers/memory/fsl_ifc.c
similarity index 98%
rename from arch/powerpc/sysdev/fsl_ifc.c
rename to drivers/memory/fsl_ifc.c
index d7fc722..135a950 100644
--- a/arch/powerpc/sysdev/fsl_ifc.c
+++ b/drivers/memory/fsl_ifc.c
@@ -30,8 +30,8 @@
 #include <linux/of.h>
 #include <linux/of_device.h>
 #include <linux/platform_device.h>
+#include <linux/fsl_ifc.h>
 #include <asm/prom.h>
-#include <asm/fsl_ifc.h>
 
 struct fsl_ifc_ctrl *fsl_ifc_ctrl_dev;
 EXPORT_SYMBOL(fsl_ifc_ctrl_dev);
@@ -299,7 +299,11 @@ static struct platform_driver fsl_ifc_ctrl_driver = {
 	.remove      = fsl_ifc_ctrl_remove,
 };
 
-module_platform_driver(fsl_ifc_ctrl_driver);
+static int __init fsl_ifc_init(void)
+{
+	return platform_driver_register(&fsl_ifc_ctrl_driver);
+}
+subsys_initcall(fsl_ifc_init);
 
 MODULE_LICENSE("GPL");
 MODULE_AUTHOR("Freescale Semiconductor");
diff --git a/drivers/mtd/nand/fsl_ifc_nand.c b/drivers/mtd/nand/fsl_ifc_nand.c
index f1f7f12..43d1a92 100644
--- a/drivers/mtd/nand/fsl_ifc_nand.c
+++ b/drivers/mtd/nand/fsl_ifc_nand.c
@@ -29,7 +29,7 @@
 #include <linux/mtd/nand.h>
 #include <linux/mtd/partitions.h>
 #include <linux/mtd/nand_ecc.h>
-#include <asm/fsl_ifc.h>
+#include <linux/fsl_ifc.h>
 
 #define FSL_IFC_V1_1_0	0x01010000
 #define ERR_BYTE		0xFF /* Value returned for read
diff --git a/arch/powerpc/include/asm/fsl_ifc.h b/include/linux/fsl_ifc.h
similarity index 100%
rename from arch/powerpc/include/asm/fsl_ifc.h
rename to include/linux/fsl_ifc.h
-- 
1.7.9.5


* [PATCH 1/3] pci: add "fundamental reset" quirk
From: Alexey Kardashevskiy @ 2014-01-17  5:08 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aik, Paul Mackerras, Thadeu Lima de Souza Cascardo
In-Reply-To: <1389935328-22588-1-git-send-email-aik@ozlabs.ru>

From: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
---
 drivers/pci/quirks.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index f6c31fa..f3eedbf 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -3271,6 +3271,19 @@ static int reset_chelsio_generic_dev(struct pci_dev *dev, int probe)
 	return 0;
 }
 
+static int reset_fundamental(struct pci_dev *dev, int probe)
+{
+	if (probe)
+		return 0;
+
+	pci_set_pcie_reset_state(dev, pcie_hot_reset);
+	msleep(250);
+	pci_set_pcie_reset_state(dev, pcie_deassert_reset);
+	msleep(1800);
+
+	return 0;
+}
+
 #define PCI_DEVICE_ID_INTEL_82599_SFP_VF   0x10ed
 #define PCI_DEVICE_ID_INTEL_IVB_M_VGA      0x0156
 #define PCI_DEVICE_ID_INTEL_IVB_M2_VGA     0x0166
@@ -3286,6 +3299,14 @@ static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
 		reset_intel_generic_dev },
 	{ PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID,
 		reset_chelsio_generic_dev },
+
+	{ PCI_VENDOR_ID_IBM, PCI_ANY_ID,
+		reset_fundamental },
+	{ PCI_VENDOR_ID_MELLANOX, PCI_ANY_ID,
+		reset_fundamental },
+	{ PCI_VENDOR_ID_TI, PCI_ANY_ID,
+		reset_fundamental },
+
 	{ 0 }
 };
 
-- 
1.8.4.rc4


* Re: [PATCH 1/3] pci: add "fundamental reset" quirk
From: Alexey Kardashevskiy @ 2014-01-17  5:09 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: aik, Paul Mackerras, Thadeu Lima de Souza Cascardo
In-Reply-To: <1389935328-22588-2-git-send-email-aik@ozlabs.ru>

Rats. Please ignore this patchset.


On 01/17/2014 04:08 PM, Alexey Kardashevskiy wrote:
> From: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
> 
> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
> ---
>  drivers/pci/quirks.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index f6c31fa..f3eedbf 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -3271,6 +3271,19 @@ static int reset_chelsio_generic_dev(struct pci_dev *dev, int probe)
>  	return 0;
>  }
>  
> +static int reset_fundamental(struct pci_dev *dev, int probe)
> +{
> +	if (probe)
> +		return 0;
> +
> +	pci_set_pcie_reset_state(dev, pcie_hot_reset);
> +	msleep(250);
> +	pci_set_pcie_reset_state(dev, pcie_deassert_reset);
> +	msleep(1800);
> +
> +	return 0;
> +}
> +
>  #define PCI_DEVICE_ID_INTEL_82599_SFP_VF   0x10ed
>  #define PCI_DEVICE_ID_INTEL_IVB_M_VGA      0x0156
>  #define PCI_DEVICE_ID_INTEL_IVB_M2_VGA     0x0166
> @@ -3286,6 +3299,14 @@ static const struct pci_dev_reset_methods pci_dev_reset_methods[] = {
>  		reset_intel_generic_dev },
>  	{ PCI_VENDOR_ID_CHELSIO, PCI_ANY_ID,
>  		reset_chelsio_generic_dev },
> +
> +	{ PCI_VENDOR_ID_IBM, PCI_ANY_ID,
> +		reset_fundamental },
> +	{ PCI_VENDOR_ID_MELLANOX, PCI_ANY_ID,
> +		reset_fundamental },
> +	{ PCI_VENDOR_ID_TI, PCI_ANY_ID,
> +		reset_fundamental },
> +
>  	{ 0 }
>  };
>  
> 


-- 
Alexey


* [PATCH 2/3] PPC: KVM: fix to compile without VFIO
From: Alexey Kardashevskiy @ 2014-01-17  5:08 UTC (permalink / raw)
  To: linuxppc-dev; +Cc: Alexey Kardashevskiy, Paul Mackerras
In-Reply-To: <1389935328-22588-1-git-send-email-aik@ozlabs.ru>

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
---
 arch/powerpc/kvm/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/Kconfig b/arch/powerpc/kvm/Kconfig
index cb6e84a..88cffb8 100644
--- a/arch/powerpc/kvm/Kconfig
+++ b/arch/powerpc/kvm/Kconfig
@@ -65,7 +65,7 @@ config KVM_BOOK3S_64
 	select KVM
 	select SPAPR_TCE_IOMMU if IOMMU_SUPPORT
 	select KVM_BOOK3S_PR_POSSIBLE if !KVM_BOOK3S_HV_POSSIBLE
-	select KVM_VFIO
+	select KVM_VFIO if VFIO
 	---help---
 	  Support running unmodified book3s_64 and book3s_32 guest kernels
 	  in virtual machines on book3s_64 host processors.
-- 
1.8.4.rc4


* [PATCH V2] cpuidle/governors: Fix logic in selection of idle states
From: Preeti U Murthy @ 2014-01-17  4:33 UTC (permalink / raw)
  To: svaidy, linux-pm, benh, daniel.lezcano, rjw, linux-kernel,
	srivatsa.bhat, paulmck, linuxppc-dev, tuukka.tikkanen

The cpuidle governors today do not handle scenarios where no idle state
can be chosen. Such scenarios could arise if the user has disabled all the
idle states at runtime or if the latency requirement from the CPUs is very
strict.

The menu governor returns index 0 of the idle state table when no other
idle state is suitable. It does so even when the idle state at that index
is disabled, or when the latency requirement is so strict that the
exit_latency of the lowest idle state is not acceptable either. Hence this
patch fixes the logic in the menu governor by defaulting to an idle state
index of -1 unless some other state is suitable.

The ladder governor needs a few more fixes in addition to the one required
in the menu governor. When the ladder governor decides to demote the idle
state of a CPU, it does not check whether the lower idle states are enabled.
Add this check, along with logic that chooses an index of -1 if the governor
can neither promote nor demote the idle state of a CPU, nor stay in the
current idle state.

cpuidle_idle_call() returns early if the governor decides not to enter any
idle state. However, it cannot return an error code, because all archs
currently assume that a failed call to cpuidle_idle_call() means the
cpuidle driver failed to *function*, for instance due to errors during
registration. As a result they fall back to a default idle state of their
own choosing, which could very well be a deep idle state. This is incorrect
in cases where no idle state is suitable.

Besides, in the scenario that this patch addresses, the call actually
succeeds; it is just that the governors consider no idle state suitable.
In that case, return a success code without entering any idle state.
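
The intended behaviour can be sketched in userspace C. This is a
simplified model under stated assumptions, not the governor code:
`struct idle_state`, `select_idle_state()` and `idle_call()` are
hypothetical names, states are assumed to be ordered from shallowest to
deepest, and the residency prediction done by the real governors is left
out entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for one entry of an idle state table. */
struct idle_state {
	unsigned int exit_latency;  /* microseconds needed to leave the state */
	int disabled;               /* non-zero if the user disabled it */
};

/* Return the deepest enabled state whose exit latency fits latency_req,
 * or -1 when no state qualifies (all disabled, or requirement too strict). */
static int select_idle_state(const struct idle_state *states, size_t count,
			     unsigned int latency_req)
{
	int chosen = -1;  /* default: do not enter any idle state */
	size_t i;

	assert(states != NULL || count == 0);
	for (i = 0; i < count; i++) {
		if (states[i].disabled)
			continue;
		if (states[i].exit_latency > latency_req)
			continue;
		chosen = (int)i;
	}
	return chosen;
}

/* Caller side: a negative index means "stay out of idle", not an error. */
static int idle_call(const struct idle_state *states, size_t count,
		     unsigned int latency_req, int *entered)
{
	*entered = select_idle_state(states, count, latency_req);
	return 0;  /* success either way, so callers do not pick a default state */
}
```

The key design point is that `idle_call()` reports success even when
`*entered` is -1, so an arch-level caller has no reason to fall back to a
deep default state.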

Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>

Changes from V1 (https://lkml.org/lkml/2014/1/14/26):

1. Change the return code from -EINVAL to success, for the reason given
in the changelog.
2. Apply the fix to the ladder governor as well.
3. Add relevant comments and remove redundant logic, as suggested in the
above thread.
---

 drivers/cpuidle/cpuidle.c          |   15 +++++-
 drivers/cpuidle/governors/ladder.c |   98 ++++++++++++++++++++++++++----------
 drivers/cpuidle/governors/menu.c   |    7 +--
 3 files changed, 89 insertions(+), 31 deletions(-)

diff --git a/drivers/cpuidle/cpuidle.c b/drivers/cpuidle/cpuidle.c
index a55e68f..831b664 100644
--- a/drivers/cpuidle/cpuidle.c
+++ b/drivers/cpuidle/cpuidle.c
@@ -131,8 +131,9 @@ int cpuidle_idle_call(void)
 
 	/* ask the governor for the next state */
 	next_state = cpuidle_curr_governor->select(drv, dev);
+
+	dev->last_residency = 0;
 	if (need_resched()) {
-		dev->last_residency = 0;
 		/* give the governor an opportunity to reflect on the outcome */
 		if (cpuidle_curr_governor->reflect)
 			cpuidle_curr_governor->reflect(dev, next_state);
@@ -140,6 +141,18 @@ int cpuidle_idle_call(void)
 		return 0;
 	}
 
+	/* Unlike in the need_resched() case, we return here because the
+	 * governor did not find a suitable idle state. However idle is still
+	 * in progress as we are not asked to reschedule. Hence we return
+	 * without enabling interrupts.
+	 *
+	 * NOTE: The return code should still be success, since the verdict of this
+	 * call is "do not enter any idle state" and not a failed call due to
+	 * errors.
+	 */
+	if (next_state < 0)
+		return 0;
+
 	trace_cpu_idle_rcuidle(next_state, dev->cpu);
 
 	broadcast = !!(drv->states[next_state].flags & CPUIDLE_FLAG_TIMER_STOP);
diff --git a/drivers/cpuidle/governors/ladder.c b/drivers/cpuidle/governors/ladder.c
index 9f08e8c..f495f57 100644
--- a/drivers/cpuidle/governors/ladder.c
+++ b/drivers/cpuidle/governors/ladder.c
@@ -58,6 +58,36 @@ static inline void ladder_do_selection(struct ladder_device *ldev,
 	ldev->last_state_idx = new_idx;
 }
 
+static int can_promote(struct ladder_device *ldev, int last_idx,
+				int last_residency)
+{
+	struct ladder_device_state *last_state;
+
+	last_state = &ldev->states[last_idx];
+	if (last_residency > last_state->threshold.promotion_time) {
+		last_state->stats.promotion_count++;
+		last_state->stats.demotion_count = 0;
+		if (last_state->stats.promotion_count >= last_state->threshold.promotion_count)
+			return 1;
+	}
+	return 0;
+}
+
+static int can_demote(struct ladder_device *ldev, int last_idx,
+			int last_residency)
+{
+	struct ladder_device_state *last_state;
+
+	last_state = &ldev->states[last_idx];
+	if (last_residency < last_state->threshold.demotion_time) {
+		last_state->stats.demotion_count++;
+		last_state->stats.promotion_count = 0;
+		if (last_state->stats.demotion_count >= last_state->threshold.demotion_count)
+			return 1;
+	}
+	return 0;
+}
+
 /**
  * ladder_select_state - selects the next state to enter
  * @drv: cpuidle driver
@@ -73,29 +103,33 @@ static int ladder_select_state(struct cpuidle_driver *drv,
 
 	/* Special case when user has set very strict latency requirement */
 	if (unlikely(latency_req == 0)) {
-		ladder_do_selection(ldev, last_idx, 0);
-		return 0;
+		if (last_idx >= 0)
+			ladder_do_selection(ldev, last_idx, -1);
+		goto out;
 	}
 
-	last_state = &ldev->states[last_idx];
-
-	if (drv->states[last_idx].flags & CPUIDLE_FLAG_TIME_VALID) {
-		last_residency = cpuidle_get_last_residency(dev) - \
-					 drv->states[last_idx].exit_latency;
+	if (last_idx >= 0) {
+		if (drv->states[last_idx].flags & CPUIDLE_FLAG_TIME_VALID) {
+			last_residency = cpuidle_get_last_residency(dev) - \
+						 drv->states[last_idx].exit_latency;
+		} else {
+			last_state = &ldev->states[last_idx];
+			last_residency = last_state->threshold.promotion_time + 1;
+		}
 	}
-	else
-		last_residency = last_state->threshold.promotion_time + 1;
 
 	/* consider promotion */
 	if (last_idx < drv->state_count - 1 &&
 	    !drv->states[last_idx + 1].disabled &&
 	    !dev->states_usage[last_idx + 1].disable &&
-	    last_residency > last_state->threshold.promotion_time &&
 	    drv->states[last_idx + 1].exit_latency <= latency_req) {
-		last_state->stats.promotion_count++;
-		last_state->stats.demotion_count = 0;
-		if (last_state->stats.promotion_count >= last_state->threshold.promotion_count) {
-			ladder_do_selection(ldev, last_idx, last_idx + 1);
+		if (last_idx >= 0) {
+			if (can_promote(ldev, last_idx, last_residency)) {
+				ladder_do_selection(ldev, last_idx, last_idx + 1);
+				return last_idx + 1;
+			}
+		} else {
+			ldev->last_state_idx = last_idx + 1;
 			return last_idx + 1;
 		}
 	}
@@ -107,26 +141,36 @@ static int ladder_select_state(struct cpuidle_driver *drv,
 	    drv->states[last_idx].exit_latency > latency_req)) {
 		int i;
 
-		for (i = last_idx - 1; i > CPUIDLE_DRIVER_STATE_START; i--) {
-			if (drv->states[i].exit_latency <= latency_req)
+		for (i = last_idx - 1; i >= CPUIDLE_DRIVER_STATE_START; i--) {
+			if (drv->states[i].exit_latency <= latency_req &&
+				!(drv->states[i].disabled || dev->states_usage[i].disable))
 				break;
 		}
-		ladder_do_selection(ldev, last_idx, i);
-		return i;
+		if (i >= 0) {
+			ladder_do_selection(ldev, last_idx, i);
+			return i;
+		}
+		goto out;
 	}
 
-	if (last_idx > CPUIDLE_DRIVER_STATE_START &&
-	    last_residency < last_state->threshold.demotion_time) {
-		last_state->stats.demotion_count++;
-		last_state->stats.promotion_count = 0;
-		if (last_state->stats.demotion_count >= last_state->threshold.demotion_count) {
-			ladder_do_selection(ldev, last_idx, last_idx - 1);
-			return last_idx - 1;
+	if (last_idx > CPUIDLE_DRIVER_STATE_START) {
+		int i = last_idx - 1;
+
+		if (can_demote(ldev, last_idx, last_residency) &&
+			!(drv->states[i].disabled || dev->states_usage[i].disable)) {
+			ladder_do_selection(ldev, last_idx, i);
+			return i;
 		}
+		/* We come here when the last_idx is still a suitable idle state, just that
+		 * promotion or demotion is not ideal.
+		 */
+		ldev->last_state_idx = last_idx;
+		return last_idx;
 	}
 
-	/* otherwise remain at the current state */
-	return last_idx;
+	/* we come here if no idle state is suitable */
+out:	ldev->last_state_idx = -1;
+	return ldev->last_state_idx;
 }
 
 /**
diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
index cf7f2f0..e9f17ce 100644
--- a/drivers/cpuidle/governors/menu.c
+++ b/drivers/cpuidle/governors/menu.c
@@ -297,12 +297,12 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 		data->needs_update = 0;
 	}
 
-	data->last_state_idx = 0;
+	data->last_state_idx = -1;
 	data->exit_us = 0;
 
 	/* Special case when user has set very strict latency requirement */
 	if (unlikely(latency_req == 0))
-		return 0;
+		return data->last_state_idx;
 
 	/* determine the expected residency time, round up */
 	t = ktime_to_timespec(tick_nohz_get_sleep_length());
@@ -368,7 +368,8 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 /**
  * menu_reflect - records that data structures need update
  * @dev: the CPU
- * @index: the index of actual entered state
+ * @index: the index of actual entered state or -1 if no idle state is
+ * suitable.
  *
  * NOTE: it's important to be fast here because this operation will add to
  *       the overall exit latency.


* [PATCH] powerpc: set the correct ksp_limit on ppc32 when switching to irq stack
From: Kevin Hao @ 2014-01-17  4:25 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Guenter Roeck; +Cc: linuxppc

Guenter Roeck got the following call trace on a p2020 board:
  Kernel stack overflow in process eb3e5a00, r1=eb79df90
  CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
  task: eb3e5a00 ti: c0616000 task.ti: ef440000
  NIP: c003a420 LR: c003a410 CTR: c0017518
  REGS: eb79dee0 TRAP: 0901   Not tainted (3.13.0-rc8-juniper-00146-g19eca00)
  MSR: 00029000 <CE,EE,ME>  CR: 24008444  XER: 00000000
  GPR00: c003a410 eb79df90 eb3e5a00 00000000 eb05d900 00000001 65d87646 00000000
  GPR08: 00000000 020b8000 00000000 00000000 44008442
  NIP [c003a420] __do_softirq+0x94/0x1ec
  LR [c003a410] __do_softirq+0x84/0x1ec
  Call Trace:
  [eb79df90] [c003a410] __do_softirq+0x84/0x1ec (unreliable)
  [eb79dfe0] [c003a970] irq_exit+0xbc/0xc8
  [eb79dff0] [c000cc1c] call_do_irq+0x24/0x3c
  [ef441f20] [c00046a8] do_IRQ+0x8c/0xf8
  [ef441f40] [c000e7f4] ret_from_except+0x0/0x18
  --- Exception: 501 at 0xfcda524
      LR = 0x10024900
  Instruction dump:
  7c781b78 3b40000a 3a73b040 543c0024 3a800000 3b3913a0 7ef5bb78 48201bf9
  5463103a 7d3b182e 7e89b92e 7c008146 <3ba00000> 7e7e9b78 48000014 57fff87f
  Kernel panic - not syncing: kernel stack overflow
  CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
  Call Trace:

The reason is that commit cbc9565ee826 (powerpc: Remove ksp_limit on ppc64)
used the wrong register (r3 instead of r4) to calculate the ksp_limit.
Just fix it.

As suggested by Benjamin Herrenschmidt, also add the C prototype of the
function in a comment to avoid this kind of error in the future.
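
The effect of picking the wrong argument can be modelled in userspace C.
This is a sketch under stated assumptions: the function names and
`THREAD_INFO_GAP_SKETCH` are hypothetical, the gap value is made up, and
the addresses used in the usage note are only loosely modelled on the
trace above (the irq stack base in particular is a guess). It relies on
the ppc32 calling convention, where the first argument (regs) arrives in
r3 and the second (irqtp) in r4.

```c
#include <assert.h>
#include <stdint.h>

#define THREAD_INFO_GAP_SKETCH 0x100u  /* hypothetical value, illustration only */

/* Models "addi r11,r3,THREAD_INFO_GAP": limit derived from the pt_regs
 * pointer, which lives on the task stack, not on the irq stack. */
static uintptr_t buggy_irq_ksp_limit(uintptr_t regs, uintptr_t irqtp)
{
	(void)irqtp;
	return regs + THREAD_INFO_GAP_SKETCH;
}

/* Models "addi r11,r4,THREAD_INFO_GAP": limit derived from the base of the
 * irq stack that the code is about to switch to. */
static uintptr_t irq_ksp_limit(uintptr_t regs, uintptr_t irqtp)
{
	(void)regs;  /* the saved registers play no part in the limit */
	return irqtp + THREAD_INFO_GAP_SKETCH;
}

/* Overflow check in the style of the trace: r1 at or below the limit is fatal. */
static int stack_overflowed(uintptr_t r1, uintptr_t ksp_limit)
{
	return r1 <= ksp_limit;
}
```

With regs at 0xef441f50 on the task stack, an irq stack based at
0xeb79c000 and r1 at 0xeb79df90, the buggy limit lies above r1 and the
check fires even though the irq stack is nearly empty, while the fixed
limit lies well below r1.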

Cc: stable@vger.kernel.org # 3.12
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Kevin Hao <haokexin@gmail.com>
---
 arch/powerpc/kernel/misc_32.S | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index e47d268727a4..7025c30a139a 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -57,11 +57,14 @@ _GLOBAL(call_do_softirq)
 	mtlr	r0
 	blr
 
+/*
+ * void call_do_irq(struct pt_regs *regs, struct thread_info *irqtp);
+ */
 _GLOBAL(call_do_irq)
 	mflr	r0
 	stw	r0,4(r1)
 	lwz	r10,THREAD+KSP_LIMIT(r2)
-	addi	r11,r3,THREAD_INFO_GAP
+	addi	r11,r4,THREAD_INFO_GAP
 	stwu	r1,THREAD_SIZE-STACK_FRAME_OVERHEAD(r4)
 	mr	r1,r4
 	stw	r10,8(r1)
-- 
1.8.3.1


* Re: Kernel stack overflows due to  "powerpc: Remove ksp_limit on ppc64" with v3.13-rc8 on ppc32 (P2020)
From: Kevin Hao @ 2014-01-17  3:23 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linuxppc-dev, linux-kernel, Guenter Roeck
In-Reply-To: <1389927490.7406.10.camel@pasglop>


On Fri, Jan 17, 2014 at 01:58:10PM +1100, Benjamin Herrenschmidt wrote:
> On Fri, 2014-01-17 at 10:20 +0800, Kevin Hao wrote:
> > On Thu, Jan 16, 2014 at 10:05:32AM -0800, Guenter Roeck wrote:
> > > Hi all,
> > > 
> > > I am getting kernel stack overflows with v3.13-rc8 on a system with P2020 CPU.
> > > The kernel is patched for the target, but I don't think that is related.
> > > Stack overflows are in different areas, but always in calls from __do_softirq.
> > > 
> > > Crashes happen reliably either during boot or if I put any kind of load
> > > onto the system.
> > 
> > How about the following fix:
> 
> Wow. I've been staring at that code for 15mn this morning and didn't
> spot it ! Nice catch :-)
> 
> Any chance you can send a version of that patch that adds the C
> prototype of the function in a comment right before the assembly ?

Will do. The patch is coming soon.

Thanks,
Kevin



* Re: Kernel stack overflows due to  "powerpc: Remove ksp_limit on ppc64" with v3.13-rc8 on ppc32 (P2020)
From: Guenter Roeck @ 2014-01-17  3:15 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, Kevin Hao; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <1389927490.7406.10.camel@pasglop>

On 01/16/2014 06:58 PM, Benjamin Herrenschmidt wrote:
> On Fri, 2014-01-17 at 10:20 +0800, Kevin Hao wrote:
>> On Thu, Jan 16, 2014 at 10:05:32AM -0800, Guenter Roeck wrote:
>>> Hi all,
>>>
>>> I am getting kernel stack overflows with v3.13-rc8 on a system with P2020 CPU.
>>> The kernel is patched for the target, but I don't think that is related.
>>> Stack overflows are in different areas, but always in calls from __do_softirq.
>>>
>>> Crashes happen reliably either during boot or if I put any kind of load
>>> onto the system.
>>
>> How about the following fix:
>
> Wow. I've been staring at that code for 15mn this morning and didn't
> spot it ! Nice catch :-)
>
Yes, great catch! That fixes the problem.

Tested-by: Guenter Roeck <linux@roeck-us.net>

I assume you or Kevin will take it from there ?

Thanks,
Guenter


* Re: Kernel stack overflows due to  "powerpc: Remove ksp_limit on ppc64" with v3.13-rc8 on ppc32 (P2020)
From: Benjamin Herrenschmidt @ 2014-01-17  2:58 UTC (permalink / raw)
  To: Kevin Hao; +Cc: linuxppc-dev, linux-kernel, Guenter Roeck
In-Reply-To: <20140117022005.GA29880@pek-khao-d1.corp.ad.wrs.com>

On Fri, 2014-01-17 at 10:20 +0800, Kevin Hao wrote:
> On Thu, Jan 16, 2014 at 10:05:32AM -0800, Guenter Roeck wrote:
> > Hi all,
> > 
> > I am getting kernel stack overflows with v3.13-rc8 on a system with P2020 CPU.
> > The kernel is patched for the target, but I don't think that is related.
> > Stack overflows are in different areas, but always in calls from __do_softirq.
> > 
> > Crashes happen reliably either during boot or if I put any kind of load
> > onto the system.
> 
> How about the following fix:

Wow. I've been staring at that code for 15mn this morning and didn't
spot it ! Nice catch :-)

Any chance you can send a version of that patch that adds the C
prototype of the function in a comment right before the assembly ?

We should generalize that practice...

Cheers,
Ben.
 
> diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
> index e47d268727a4..52fffe5616b4 100644
> --- a/arch/powerpc/kernel/misc_32.S
> +++ b/arch/powerpc/kernel/misc_32.S
> @@ -61,7 +61,7 @@ _GLOBAL(call_do_irq)
>  	mflr	r0
>  	stw	r0,4(r1)
>  	lwz	r10,THREAD+KSP_LIMIT(r2)
> -	addi	r11,r3,THREAD_INFO_GAP
> +	addi	r11,r4,THREAD_INFO_GAP
>  	stwu	r1,THREAD_SIZE-STACK_FRAME_OVERHEAD(r4)
>  	mr	r1,r4
>  	stw	r10,8(r1)
> 
> Thanks,
> Kevin
> > 
> > Example:
> > 
> > Kernel stack overflow in process eb3e5a00, r1=eb79df90
> > CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
> > task: eb3e5a00 ti: c0616000 task.ti: ef440000
> > NIP: c003a420 LR: c003a410 CTR: c0017518
> > REGS: eb79dee0 TRAP: 0901   Not tainted (3.13.0-rc8-juniper-00146-g19eca00)
> > MSR: 00029000 <CE,EE,ME>  CR: 24008444  XER: 00000000
> > GPR00: c003a410 eb79df90 eb3e5a00 00000000 eb05d900 00000001 65d87646 00000000
> > GPR08: 00000000 020b8000 00000000 00000000 44008442
> > NIP [c003a420] __do_softirq+0x94/0x1ec
> > LR [c003a410] __do_softirq+0x84/0x1ec
> > Call Trace:
> > [eb79df90] [c003a410] __do_softirq+0x84/0x1ec (unreliable)
> > [eb79dfe0] [c003a970] irq_exit+0xbc/0xc8
> > [eb79dff0] [c000cc1c] call_do_irq+0x24/0x3c
> > [ef441f20] [c00046a8] do_IRQ+0x8c/0xf8
> > [ef441f40] [c000e7f4] ret_from_except+0x0/0x18
> > --- Exception: 501 at 0xfcda524
> >     LR = 0x10024900
> > Instruction dump:
> > 7c781b78 3b40000a 3a73b040 543c0024 3a800000 3b3913a0 7ef5bb78 48201bf9
> > 5463103a 7d3b182e 7e89b92e 7c008146 <3ba00000> 7e7e9b78 48000014 57fff87f
> > Kernel panic - not syncing: kernel stack overflow
> > CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
> > Call Trace:
> > Rebooting in 180 seconds..
> > 
> > Reverting the following commit fixes the problem.
> > 
> > cbc9565ee8 "powerpc: Remove ksp_limit on ppc64"
> > 
> > Should I submit a patch reverting this commit, or is there a better way to fix
> > the problem on short notice (given that 3.13 is close) ?
> > 
> > Thanks,
> > Guenter
> > _______________________________________________
> > Linuxppc-dev mailing list
> > Linuxppc-dev@lists.ozlabs.org
> > https://lists.ozlabs.org/listinfo/linuxppc-dev

^ permalink raw reply

* Re: Kernel stack overflows due to  "powerpc: Remove ksp_limit on ppc64" with v3.13-rc8 on ppc32 (P2020)
From: Kevin Hao @ 2014-01-17  2:20 UTC (permalink / raw)
  To: Guenter Roeck; +Cc: linuxppc-dev, linux-kernel
In-Reply-To: <20140116180532.GA1616@roeck-us.net>

[-- Attachment #1: Type: text/plain, Size: 2620 bytes --]

On Thu, Jan 16, 2014 at 10:05:32AM -0800, Guenter Roeck wrote:
> Hi all,
> 
> I am getting kernel stack overflows with v3.13-rc8 on a system with P2020 CPU.
> The kernel is patched for the target, but I don't think that is related.
> Stack overflows are in different areas, but always in calls from __do_softirq.
> 
> Crashes happen reliably either during boot or if I put any kind of load
> onto the system.

How about the following fix:

diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S
index e47d268727a4..52fffe5616b4 100644
--- a/arch/powerpc/kernel/misc_32.S
+++ b/arch/powerpc/kernel/misc_32.S
@@ -61,7 +61,7 @@ _GLOBAL(call_do_irq)
 	mflr	r0
 	stw	r0,4(r1)
 	lwz	r10,THREAD+KSP_LIMIT(r2)
-	addi	r11,r3,THREAD_INFO_GAP
+	addi	r11,r4,THREAD_INFO_GAP
 	stwu	r1,THREAD_SIZE-STACK_FRAME_OVERHEAD(r4)
 	mr	r1,r4
 	stw	r10,8(r1)

Thanks,
Kevin
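
For reference, here is a rough userspace model of what the one-register
change does. It assumes the C prototype is
void call_do_irq(struct pt_regs *regs, struct thread_info *irqtp), so that
under the 32-bit PowerPC calling convention r3 holds regs and r4 holds irqtp.
The THREAD_INFO_GAP value and the helper names are placeholders for
illustration, not taken from the kernel headers.

```c
#include <stdint.h>

/* Placeholder value; the real THREAD_INFO_GAP comes from the kernel headers. */
#define THREAD_INFO_GAP 0x200u

/* Before the fix: the limit is derived from the first argument (regs, r3). */
static uint32_t ksp_limit_before(uint32_t regs, uint32_t irqtp)
{
	(void)irqtp;
	return regs + THREAD_INFO_GAP;
}

/* After the fix: the limit is derived from the second argument (irqtp, r4). */
static uint32_t ksp_limit_after(uint32_t regs, uint32_t irqtp)
{
	(void)regs;
	return irqtp + THREAD_INFO_GAP;
}

/* Modelled overflow check: trips when the stack pointer is at or below the limit. */
static int stack_overflowed(uint32_t sp, uint32_t limit)
{
	return sp <= limit;
}
```

Taking r1=0xeb79df90 from the trace, a hypothetical regs address of
0xef441f50 on the task stack, and an assumed irq stack base of 0xeb79c000,
the old limit lands above r1 and the check trips, while the new limit sits
at the bottom of the irq stack and the check passes.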
> 
> Example:
> 
> Kernel stack overflow in process eb3e5a00, r1=eb79df90
> CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
> task: eb3e5a00 ti: c0616000 task.ti: ef440000
> NIP: c003a420 LR: c003a410 CTR: c0017518
> REGS: eb79dee0 TRAP: 0901   Not tainted (3.13.0-rc8-juniper-00146-g19eca00)
> MSR: 00029000 <CE,EE,ME>  CR: 24008444  XER: 00000000
> GPR00: c003a410 eb79df90 eb3e5a00 00000000 eb05d900 00000001 65d87646 00000000
> GPR08: 00000000 020b8000 00000000 00000000 44008442
> NIP [c003a420] __do_softirq+0x94/0x1ec
> LR [c003a410] __do_softirq+0x84/0x1ec
> Call Trace:
> [eb79df90] [c003a410] __do_softirq+0x84/0x1ec (unreliable)
> [eb79dfe0] [c003a970] irq_exit+0xbc/0xc8
> [eb79dff0] [c000cc1c] call_do_irq+0x24/0x3c
> [ef441f20] [c00046a8] do_IRQ+0x8c/0xf8
> [ef441f40] [c000e7f4] ret_from_except+0x0/0x18
> --- Exception: 501 at 0xfcda524
>     LR = 0x10024900
> Instruction dump:
> 7c781b78 3b40000a 3a73b040 543c0024 3a800000 3b3913a0 7ef5bb78 48201bf9
> 5463103a 7d3b182e 7e89b92e 7c008146 <3ba00000> 7e7e9b78 48000014 57fff87f
> Kernel panic - not syncing: kernel stack overflow
> CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
> Call Trace:
> Rebooting in 180 seconds..
> 
> Reverting the following commit fixes the problem.
> 
> cbc9565ee8 "powerpc: Remove ksp_limit on ppc64"
> 
> Should I submit a patch reverting this commit, or is there a better way to fix
> the problem on short notice (given that 3.13 is close) ?
> 
> Thanks,
> Guenter

[-- Attachment #2: Type: application/pgp-signature, Size: 490 bytes --]

^ permalink raw reply related

* [PATCH 8/8] powerpc/perf: add kconfig option for hypervisor provided counters
From: Cody P Schafer @ 2014-01-16 23:53 UTC (permalink / raw)
  To: Linux PPC
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo, Cody P Schafer
In-Reply-To: <1389916434-2288-1-git-send-email-cody@linux.vnet.ibm.com>

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 arch/powerpc/perf/Makefile             | 2 ++
 arch/powerpc/platforms/Kconfig.cputype | 6 ++++++
 2 files changed, 8 insertions(+)

diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index 60d71ee..5e5fcd2 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -11,5 +11,7 @@ obj32-$(CONFIG_PPC_PERF_CTRS)	+= mpc7450-pmu.o
 obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
 obj-$(CONFIG_FSL_EMB_PERF_EVENT_E500) += e500-pmu.o e6500-pmu.o
 
+obj-$(CONFIG_HV_PERF_CTRS) += hv-24x7.o hv-gpci.o
+
 obj-$(CONFIG_PPC64)		+= $(obj64-y)
 obj-$(CONFIG_PPC32)		+= $(obj32-y)
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index bca2465..f98ec61 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -363,6 +363,12 @@ config PPC_PERF_CTRS
        help
          This enables the powerpc-specific perf_event back-end.
 
+config HV_PERF_CTRS
+       def_bool y
+       depends on PERF_EVENTS && PPC_HAVE_PMU_SUPPORT
+       help
+         Enable access to perf counters provided by the hypervisor
+
 config SMP
 	depends on PPC_BOOK3S || PPC_BOOK3E || FSL_BOOKE || PPC_47x
 	bool "Symmetric multi-processing support"
-- 
1.8.5.2

^ permalink raw reply related

* [PATCH 7/8] powerpc/perf: add support for the hv 24x7 interface
From: Cody P Schafer @ 2014-01-16 23:53 UTC (permalink / raw)
  To: Linux PPC
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo, Cody P Schafer
In-Reply-To: <1389916434-2288-1-git-send-email-cody@linux.vnet.ibm.com>

This provides a basic interface between hv_24x7 and perf. Similar to
the one provided for gpci, it lacks transaction support and does not
list any events.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 arch/powerpc/perf/hv-24x7.c | 354 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 354 insertions(+)
 create mode 100644 arch/powerpc/perf/hv-24x7.c

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
new file mode 100644
index 0000000..fb49e66
--- /dev/null
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -0,0 +1,354 @@
+/*
+ * Hypervisor supplied "24x7" performance counter support
+ *
+ * Author: Cody P Schafer <cody@linux.vnet.ibm.com>
+ * Copyright 2014 IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) "hv-24x7: " fmt
+
+#include <linux/perf_event.h>
+#include <linux/module.h>
+#include <asm/firmware.h>
+#include <asm/hvcall.h>
+#include <asm/hv_24x7.h>
+#include <asm/hv_gpci.h>
+#include <asm/io.h>
+
+/* See arch/powerpc/include/asm/hv_24x7.h for details on the hcall interface */
+
+/* TODO: Merging events:
+ * - Think of the hcall as an interface to a 4d array of counters:
+ *   - x = domains
+ *   - y = indexes in the domain (core, chip, vcpu, node, etc)
+ *   - z = offset into the counter space
+ *   - w = lpars (guest vms, "logical partitions")
+ * - A single request is: x,y,y_last,z,z_last,w,w_last
+ *   - this means we can retrieve a rectangle of counters in y,z for a single x.
+ *
+ * - Things to consider (ignoring w):
+ *   - input  cost_per_request = 16
+ *   - output cost_per_result(ys,zs)  = 8 + 8 * ys + ys * zs
+ *   - limited number of requests per hcall (must fit into 4K bytes)
+ *     - 4k = 16 [buffer header] + 16 [request size] * request_count
+ *     - 255 requests per hcall
+ *   - sometimes it will be more efficient to read extra data and discard
+ */
+
+PMU_RANGE_ATTR(domain, config, 0, 3); /* u3 0-6, one of HV_24X7_PERF_DOMAIN */
+PMU_RANGE_ATTR(starting_index, config, 16, 31); /* u16 */
+PMU_RANGE_ATTR(offset, config, 32, 63); /* u32, see "data_offset" */
+PMU_RANGE_ATTR(lpar, config1, 0, 15); /* u16 */
+
+PMU_RANGE_RESV(reserved1, config,   4, 15);
+PMU_RANGE_RESV(reserved2, config1, 16, 63);
+PMU_RANGE_RESV(reserved3, config2,  0, 63);
+
+static struct attribute *format_attr[] = {
+	&format_attr_domain.attr,
+	&format_attr_offset.attr,
+	&format_attr_starting_index.attr,
+	&format_attr_lpar.attr,
+	NULL,
+};
+
+static struct attribute_group format_group = {
+	.name = "format",
+	.attrs = format_attr,
+};
+
+static const struct attribute_group *attr_groups[] = {
+	&format_group,
+	NULL,
+};
+
+struct hv_perf_caps {
+	u16 version;
+	u16 other_allowed:1,
+	    ga:1,
+	    expanded:1,
+	    lab:1,
+	    unused:12;
+};
+
+static unsigned long hv_perf_caps_get(struct hv_perf_caps *caps)
+{
+	unsigned long r;
+	struct p {
+		struct hv_get_perf_counter_info_params params;
+		struct cv_system_performance_capabilities caps;
+	} __packed __aligned(sizeof(uint64_t));
+
+	struct p arg = {
+		.params = {
+			.counter_request = cpu_to_be32(
+					CIR_system_performance_capabilities),
+			.starting_index = cpu_to_be32(-1),
+			.counter_info_version_in = 0,
+		}
+	};
+
+	r = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO,
+			       virt_to_phys(&arg), sizeof(arg));
+	caps->version = arg.params.counter_info_version_out;
+	caps->other_allowed = arg.caps.perf_collect_privlidged;
+	caps->ga = (arg.caps.capability_mask & CV_CM_GA) >> CV_CM_GA;
+	caps->expanded = (arg.caps.capability_mask & CV_CM_EXPANDED)
+				>> CV_CM_EXPANDED;
+	caps->lab = (arg.caps.capability_mask & CV_CM_LAB) >> CV_CM_LAB;
+
+	return r;
+}
+
+static bool is_physical_domain(int domain)
+{
+	return  domain == HV_24X7_PERF_DOMAIN_PHYSICAL_CHIP ||
+		domain == HV_24X7_PERF_DOMAIN_PHYSICAL_CORE;
+}
+
+static unsigned long single_24x7_request(u8 domain, u32 offset, u16 ix,
+					 u16 lpar, u64 *res)
+{
+	unsigned long ret;
+	struct reqb {
+		struct hv_24x7_request_buffer buf;
+		struct hv_24x7_request req;
+	} request_buffer = {
+		.buf = {
+			.interface_version = HV_24X7_IF_VERSION_CURRENT,
+			.num_requests = 1,
+		},
+		.req = {
+			.performance_domain = domain,
+			.data_size = cpu_to_be16(8),
+			.data_offset = cpu_to_be32(offset),
+			.starting_lpar_ix = cpu_to_be16(lpar),
+			.max_num_lpars = cpu_to_be16(1),
+			.starting_ix = cpu_to_be16(ix),
+			.max_ix = cpu_to_be16(1),
+		}
+	};
+
+	struct resb {
+		struct hv_24x7_data_result_buffer buf;
+		struct hv_24x7_result res;
+		struct hv_24x7_result_element elem;
+		__be64 result;
+	} result_buffer = {};
+
+	ret = plpar_hcall_norets(H_GET_24X7_DATA,
+			virt_to_phys(&request_buffer), sizeof(request_buffer),
+			virt_to_phys(&result_buffer),  sizeof(result_buffer));
+
+	if (ret) {
+		/*
+		 * this failure is unexpected since we check if the exact same
+		 * hcall works in event_init
+		 */
+		pr_err_ratelimited("hcall failed: %d %#x %#x %d => 0x%lx (%ld) detail=0x%x failing ix=%x\n",
+				domain, offset, ix, lpar,
+				ret, ret,
+				result_buffer.buf.detailed_rc,
+				result_buffer.buf.failing_request_ix);
+		return ret;
+	}
+
+	*res = be64_to_cpu(result_buffer.result);
+	return ret;
+}
+
+static int h_24x7_event_init(struct perf_event *event)
+{
+	struct hv_perf_caps caps;
+	unsigned domain;
+	u64 ct;
+
+	/* Not our event */
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
+	/* Unused areas must be 0 */
+	if (event_get_reserved1(event) ||
+	    event_get_reserved2(event) ||
+	    event_get_reserved3(event)) {
+		pr_devel("reserved set when forbidden 0x%llx(0x%llx) 0x%llx(0x%llx) 0x%llx(0x%llx)\n",
+				event->attr.config,
+				event_get_reserved1(event),
+				event->attr.config1,
+				event_get_reserved2(event),
+				event->attr.config2,
+				event_get_reserved3(event));
+		return -EINVAL;
+	}
+
+	/* unsupported modes and filters */
+	if (event->attr.exclude_user   ||
+	    event->attr.exclude_kernel ||
+	    event->attr.exclude_hv     ||
+	    event->attr.exclude_idle   ||
+	    event->attr.exclude_host   ||
+	    event->attr.exclude_guest  ||
+	    is_sampling_event(event)) /* no sampling */
+		return -EINVAL;
+
+	/* no branch sampling */
+	if (has_branch_stack(event))
+		return -EOPNOTSUPP;
+
+	/* offset must be 8 byte aligned */
+	if (event_get_offset(event) % 8) {
+		pr_devel("bad alignment\n");
+		return -EINVAL;
+	}
+
+	/* Domains above 6 are invalid */
+	domain = event_get_domain(event);
+	if (domain > 6) {
+		pr_devel("invalid domain\n");
+		return -EINVAL;
+	}
+
+	if (hv_perf_caps_get(&caps)) {
+		pr_devel("could not get capabilities\n");
+		return -EIO;
+	}
+
+	/* PHYSICAL domains & other lpars require extra capabilities */
+	if (!caps.other_allowed && (is_physical_domain(domain) ||
+		(event_get_lpar(event) != event_get_lpar_max()))) {
+		pr_devel("hv permissions disallow\n");
+		return -EPERM;
+	}
+
+	/* see if the event complains */
+	if (single_24x7_request(event_get_domain(event),
+				event_get_offset(event),
+				event_get_starting_index(event),
+				event_get_lpar(event),
+				&ct)) {
+		pr_devel("test hcall failed\n");
+		return -EIO;
+	}
+
+	/*
+	 * Some of the events are per-cpu, some per-core, some per-chip, some
+	 * are global, and some access data from other virtual machines on the
+	 * same physical machine. We can't map the cpu value without a lot of
+	 * work. Instead, we pick an arbitrary cpu for all events on this pmu.
+	 */
+	event->cpu = 0;
+
+	perf_swevent_init_hrtimer(event);
+	return 0;
+}
+
+static u64 h_24x7_get_value(struct perf_event *event)
+{
+	unsigned long ret;
+	u64 ct;
+	ret = single_24x7_request(event_get_domain(event),
+				  event_get_offset(event),
+				  event_get_starting_index(event),
+				  event_get_lpar(event),
+				  &ct);
+	if (ret)
+		/* We checked this in event init, shouldn't fail here... */
+		return 0;
+
+	return ct;
+}
+
+static void h_24x7_event_update(struct perf_event *event)
+{
+	s64 prev;
+	u64 now;
+	now = h_24x7_get_value(event);
+	prev = local64_xchg(&event->hw.prev_count, now);
+	local64_add(now - prev, &event->count);
+}
+
+static void h_24x7_event_start(struct perf_event *event, int flags)
+{
+	if (flags & PERF_EF_RELOAD)
+		local64_set(&event->hw.prev_count, h_24x7_get_value(event));
+	perf_swevent_start_hrtimer(event);
+}
+
+static void h_24x7_event_stop(struct perf_event *event, int flags)
+{
+	perf_swevent_cancel_hrtimer(event);
+	h_24x7_event_update(event);
+}
+
+static int h_24x7_event_add(struct perf_event *event, int flags)
+{
+	if (flags & PERF_EF_START)
+		h_24x7_event_start(event, flags);
+
+	return 0;
+}
+
+static void h_24x7_event_del(struct perf_event *event, int flags)
+{
+	h_24x7_event_stop(event, flags);
+}
+
+static void h_24x7_event_read(struct perf_event *event)
+{
+	h_24x7_event_update(event);
+}
+
+struct pmu h_24x7_pmu = {
+	.task_ctx_nr = perf_invalid_context,
+
+	.name = "hv_24x7",
+	.attr_groups = attr_groups,
+	.event_init  = h_24x7_event_init,
+	.add         = h_24x7_event_add,
+	.del         = h_24x7_event_del,
+	.start       = h_24x7_event_start,
+	.stop        = h_24x7_event_stop,
+	.read        = h_24x7_event_read,
+
+	.event_idx = perf_swevent_event_idx,
+};
+
+static int hv_24x7_init(void)
+{
+	int r;
+	unsigned long hret;
+	struct hv_perf_caps caps;
+
+	if (!firmware_has_feature(FW_FEATURE_LPAR)) {
+		pr_info("not an lpar, disabled\n");
+		return -ENODEV;
+	}
+
+	hret = hv_perf_caps_get(&caps);
+	if (hret) {
+		pr_info("could not obtain capabilities, error 0x%lx, disabling\n",
+				hret);
+		return -ENODEV;
+	}
+
+	pr_info("gpci interface versions: hv:0x%x, kernel:0x%x\n",
+			caps.version, COUNTER_INFO_VERSION_CURRENT);
+
+	pr_info("gpci interface capabilities: other:%d ga:%d expanded:%d lab:%d\n",
+			caps.other_allowed, caps.ga,
+			caps.expanded,
+			caps.lab);
+
+	r = perf_pmu_register(&h_24x7_pmu, h_24x7_pmu.name, -1);
+	if (r)
+		return r;
+
+	return 0;
+}
+
+module_init(hv_24x7_init);
-- 
1.8.5.2
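
Since this patch lists no events, an event has to be composed by hand from
the format fields. The sketch below shows how the fields declared with
PMU_RANGE_ATTR() in this patch could be packed into attr.config and
attr.config1. It assumes bit 0 is the least significant bit, which depends
on how PMU_RANGE_ATTR() is defined elsewhere in the series. field_put(),
hv_24x7_config() and hv_24x7_config1() are illustrative names, not part of
the patch.

```c
#include <stdint.h>

/* Place val into bits lo..hi (inclusive, bit 0 = least significant). */
static uint64_t field_put(uint64_t val, unsigned lo, unsigned hi)
{
	unsigned width = hi - lo + 1;
	uint64_t mask = (width >= 64) ? ~UINT64_C(0)
				      : ((UINT64_C(1) << width) - 1);

	return (val & mask) << lo;
}

/* attr.config: domain in bits 0-3, starting_index in 16-31, offset in 32-63. */
static uint64_t hv_24x7_config(uint8_t domain, uint16_t starting_index,
			       uint32_t offset)
{
	return field_put(domain, 0, 3) |
	       field_put(starting_index, 16, 31) |
	       field_put(offset, 32, 63);
}

/* attr.config1: lpar in bits 0-15; the remaining bits are reserved (zero). */
static uint64_t hv_24x7_config1(uint16_t lpar)
{
	return field_put(lpar, 0, 15);
}
```

Bits 4-15 of config, bits 16-63 of config1 and all of config2 stay zero
here, which matches the reserved-field check in h_24x7_event_init().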

^ permalink raw reply related

* [PATCH 6/8] powerpc/perf: add support for the hv gpci (get performance counter info) interface
From: Cody P Schafer @ 2014-01-16 23:53 UTC (permalink / raw)
  To: Linux PPC
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo, Cody P Schafer
In-Reply-To: <1389916434-2288-1-git-send-email-cody@linux.vnet.ibm.com>

This provides a basic link between perf and hv_gpci. Notably, it does
not yet support transactions and does not list any events (they can
still be manually composed).

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 arch/powerpc/perf/hv-gpci.c | 235 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 235 insertions(+)
 create mode 100644 arch/powerpc/perf/hv-gpci.c

diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
new file mode 100644
index 0000000..31d9d59
--- /dev/null
+++ b/arch/powerpc/perf/hv-gpci.c
@@ -0,0 +1,235 @@
+/*
+ * Hypervisor supplied "gpci" ("get performance counter info") performance
+ * counter support
+ *
+ * Author: Cody P Schafer <cody@linux.vnet.ibm.com>
+ * Copyright 2014 IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#define pr_fmt(fmt) "hv-gpci: " fmt
+
+#include <linux/module.h>
+#include <linux/perf_event.h>
+#include <asm/firmware.h>
+#include <asm/hvcall.h>
+#include <asm/hv_gpci.h>
+#include <asm/io.h>
+
+/* See arch/powerpc/include/asm/hv_gpci.h for details on the hcall interface */
+
+PMU_RANGE_ATTR(request, config, 0, 31); /* u32 */
+PMU_RANGE_ATTR(starting_index, config, 32, 63); /* u32 */
+PMU_RANGE_ATTR(secondary_index, config1, 0, 15); /* u16 */
+PMU_RANGE_ATTR(counter_info_version, config1, 16, 23); /* u8 */
+PMU_RANGE_ATTR(length, config1, 24, 31); /* u8, bytes of data (1-8) */
+PMU_RANGE_ATTR(offset, config1, 32, 63); /* u32, byte offset */
+
+static struct attribute *format_attr[] = {
+	&format_attr_request.attr,
+	&format_attr_starting_index.attr,
+	&format_attr_secondary_index.attr,
+	&format_attr_counter_info_version.attr,
+
+	&format_attr_offset.attr,
+	&format_attr_length.attr,
+	NULL,
+};
+
+static struct attribute_group format_group = {
+	.name = "format",
+	.attrs = format_attr,
+};
+
+static const struct attribute_group *attr_groups[] = {
+	&format_group,
+	NULL,
+};
+
+static unsigned long single_gpci_request(u32 req, u32 starting_index,
+		u16 secondary_index, u8 version_in, u32 offset, u8 length,
+		u64 *value)
+{
+	unsigned long ret;
+	size_t i;
+	u64 count;
+
+	struct {
+		struct hv_get_perf_counter_info_params params;
+		union {
+			union h_gpci_cvs data;
+			uint8_t bytes[sizeof(union h_gpci_cvs)];
+		};
+	} arg = {
+		.params = {
+			.counter_request = cpu_to_be32(req),
+			.starting_index = cpu_to_be32(starting_index),
+			.secondary_index = cpu_to_be16(secondary_index),
+			.counter_info_version_in = version_in,
+		}
+	};
+
+	ret = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO,
+			virt_to_phys(&arg), sizeof(arg));
+	if (ret) {
+		pr_devel("hcall failed: 0x%lx\n", ret);
+		return ret;
+	}
+
+	/*
+	 * we verify offset and length are within the zeroed buffer at event
+	 * init.
+	 */
+	count = 0;
+	for (i = offset; i < offset + length; i++)
+		count |= (u64)arg.bytes[i] << ((length - 1 - (i - offset)) * 8);
+
+	*value = count;
+	return ret;
+}
+
+static u64 h_gpci_get_value(struct perf_event *event)
+{
+	u64 count;
+	unsigned long ret = single_gpci_request(event_get_request(event),
+					event_get_starting_index(event),
+					event_get_secondary_index(event),
+					event_get_counter_info_version(event),
+					event_get_offset(event),
+					event_get_length(event),
+					&count);
+	if (ret)
+		return 0;
+	return count;
+}
+
+static void h_gpci_event_update(struct perf_event *event)
+{
+	s64 prev;
+	u64 now = h_gpci_get_value(event);
+	prev = local64_xchg(&event->hw.prev_count, now);
+	local64_add(now - prev, &event->count);
+}
+
+static void h_gpci_event_start(struct perf_event *event, int flags)
+{
+	local64_set(&event->hw.prev_count, h_gpci_get_value(event));
+	perf_swevent_start_hrtimer(event);
+}
+
+static void h_gpci_event_stop(struct perf_event *event, int flags)
+{
+	perf_swevent_cancel_hrtimer(event);
+	h_gpci_event_update(event);
+}
+
+static int h_gpci_event_add(struct perf_event *event, int flags)
+{
+	if (flags & PERF_EF_START)
+		h_gpci_event_start(event, flags);
+
+	return 0;
+}
+
+static void h_gpci_event_del(struct perf_event *event, int flags)
+{
+	h_gpci_event_stop(event, flags);
+}
+
+static void h_gpci_event_read(struct perf_event *event)
+{
+	h_gpci_event_update(event);
+}
+
+static int h_gpci_event_init(struct perf_event *event)
+{
+	u64 count;
+	u8 length;
+
+	/* Not our event */
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
+	/* config2 is unused */
+	if (event->attr.config2)
+		return -EINVAL;
+
+	/* unsupported modes and filters */
+	if (event->attr.exclude_user   ||
+	    event->attr.exclude_kernel ||
+	    event->attr.exclude_hv     ||
+	    event->attr.exclude_idle   ||
+	    event->attr.exclude_host   ||
+	    event->attr.exclude_guest  ||
+	    is_sampling_event(event)) /* no sampling */
+		return -EINVAL;
+
+	/* no branch sampling */
+	if (has_branch_stack(event))
+		return -EOPNOTSUPP;
+
+	length = event_get_length(event);
+	if (length < 1 || length > 8)
+		return -EINVAL;
+
+	/* last byte within the buffer? */
+	if ((event_get_offset(event) + length) > sizeof(union h_gpci_cvs))
+		return -EINVAL;
+
+	/* check if the request works... */
+	if (single_gpci_request(event_get_request(event),
+				event_get_starting_index(event),
+				event_get_secondary_index(event),
+				event_get_counter_info_version(event),
+				event_get_offset(event),
+				length,
+				&count))
+		return -EINVAL;
+
+	/*
+	 * Some of the events are per-cpu, some per-core, some per-chip, some
+	 * are global, and some access data from other virtual machines on the
+	 * same physical machine. We can't map the cpu value without a lot of
+	 * work. Instead, we pick an arbitrary cpu for all events on this pmu.
+	 */
+	event->cpu = 0;
+
+	perf_swevent_init_hrtimer(event);
+	return 0;
+}
+
+struct pmu h_gpci_pmu = {
+	.task_ctx_nr = perf_invalid_context,
+
+	.name = "hv_gpci",
+	.attr_groups = attr_groups,
+	.event_init = h_gpci_event_init,
+	.add = h_gpci_event_add,
+	.del = h_gpci_event_del,
+	.start = h_gpci_event_start,
+	.stop = h_gpci_event_stop,
+	.read = h_gpci_event_read,
+
+	.event_idx = perf_swevent_event_idx,
+};
+
+static int hv_gpci_init(void)
+{
+	int r;
+
+	if (!firmware_has_feature(FW_FEATURE_LPAR)) {
+		pr_info("Not running under phyp, not supported\n");
+		return -ENODEV;
+	}
+
+	r = perf_pmu_register(&h_gpci_pmu, h_gpci_pmu.name, -1);
+	if (r)
+		return r;
+
+	return 0;
+}
+
+module_init(hv_gpci_init);
-- 
1.8.5.2
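
The byte-assembly step at the end of single_gpci_request() can be checked in
isolation. This is a minimal userspace sketch. It assumes the hypervisor
returns counters in big-endian order (the fields in hv_gpci.h are declared
__be16/__be32), so the first byte of the range is the most significant.
be_bytes_to_u64() is an illustrative helper name, not something the patch
defines.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Assemble `length` bytes starting at buf[offset] into one value,
 * treating the first byte as the most significant (big-endian).
 * Callers must ensure 1 <= length <= 8 and that the range is in bounds.
 */
static uint64_t be_bytes_to_u64(const uint8_t *buf, size_t offset,
				size_t length)
{
	uint64_t value = 0;
	size_t i;

	/* Each new byte pushes the accumulated value up by 8 bits. */
	for (i = 0; i < length; i++)
		value = (value << 8) | buf[offset + i];

	return value;
}
```

Each byte contributes 8 bits, so the shift amount grows in steps of 8
rather than 1.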

^ permalink raw reply related

* [PATCH 5/8] powerpc: add 24x7 interface header
From: Cody P Schafer @ 2014-01-16 23:53 UTC (permalink / raw)
  To: Linux PPC
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo, Cody P Schafer
In-Reply-To: <1389916434-2288-1-git-send-email-cody@linux.vnet.ibm.com>

24x7 (also called hv_24x7 or H_24X7) is an interface to obtain
performance counters from the hypervisor. These counters do not have a
fixed format/position and are instead documented in a "24x7 Catalog",
which is provided by the hypervisor (that interface is also documented
in this header).

This method of obtaining performance counters from the hypervisor is
intended to partially replace the gpci interface.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/hv_24x7.h | 239 +++++++++++++++++++++++++++++++++++++
 1 file changed, 239 insertions(+)
 create mode 100644 arch/powerpc/include/asm/hv_24x7.h

diff --git a/arch/powerpc/include/asm/hv_24x7.h b/arch/powerpc/include/asm/hv_24x7.h
new file mode 100644
index 0000000..f77b3cc
--- /dev/null
+++ b/arch/powerpc/include/asm/hv_24x7.h
@@ -0,0 +1,239 @@
+#ifndef ARCH_POWERPC_24X7_H_
+#define ARCH_POWERPC_24X7_H_
+
+#include <linux/types.h>
+
+struct hv_24x7_request {
+	/* PHYSICAL domains require enabling via phyp/hmc. */
+#define HV_24X7_PERF_DOMAIN_PHYSICAL_CHIP 0x01
+#define HV_24X7_PERF_DOMAIN_PHYSICAL_CORE 0x02
+#define HV_24X7_PERF_DOMAIN_VIRTUAL_PROCESSOR_HOME_CORE   0x03
+#define HV_24X7_PERF_DOMAIN_VIRTUAL_PROCESSOR_HOME_CHIP   0x04
+#define HV_24X7_PERF_DOMAIN_VIRTUAL_PROCESSOR_HOME_NODE   0x05
+#define HV_24X7_PERF_DOMAIN_VIRTUAL_PROCESSOR_REMOTE_NODE 0x06
+	__u8 performance_domain;
+	__u8 reserved[0x1];
+
+	/* bytes to read starting at @data_offset. must be a multiple of 8 */
+	__be16 data_size;
+
+	/*
+	 * byte offset within the perf domain to read from. must be 8 byte
+	 * aligned
+	 */
+	__be32 data_offset;
+
+	/*
+	 * only valid for VIRTUAL_PROCESSOR domains, ignored for others.
+	 * -1 means "current partition only"
+	 *  Enabling via phyp/hmc required for non-"-1" values. 0 forbidden
+	 *  unless requestor is 0.
+	 */
+	__be16 starting_lpar_ix;
+
+	/*
+	 * Ignored when @starting_lpar_ix == -1
+	 * Ignored when @performance_domain is not VIRTUAL_PROCESSOR_*
+	 * -1 means "infinite" or all
+	 */
+	__be16 max_num_lpars;
+
+	/* chip, core, or virtual processor based on @performance_domain */
+	__be16 starting_ix;
+	__be16 max_ix;
+} __packed;
+
+struct hv_24x7_request_buffer {
+	/* 0 - ? */
+	/* 1 - ? */
+#define HV_24X7_IF_VERSION_CURRENT 0x01
+	__u8 interface_version;
+	__u8 num_requests;
+	__u8 reserved[0xE];
+	struct hv_24x7_request requests[];
+} __packed;
+
+struct hv_24x7_result_element {
+	__be16 lpar_ix;
+
+	/*
+	 * represents the core, chip, or virtual processor based on the
+	 * request's @performance_domain
+	 */
+	__be16 domain_ix;
+
+	/* -1 if @performance_domain does not refer to a virtual processor */
+	__be32 lpar_cfg_instance_id;
+
+	/* size = @result_element_data_size of containing result. */
+	__u8 element_data[];
+} __packed;
+
+struct hv_24x7_result {
+	__u8 result_ix;
+
+	/*
+	 * 0 = not all result elements fit into the buffer, additional requests
+	 *     required
+	 * 1 = all result elements were returned
+	 */
+	__u8 results_complete;
+	__be16 num_elements_returned;
+
+	/* This is a copy of @data_size from the corresponding hv_24x7_request */
+	__be16 result_element_data_size;
+	__u8 reserved[0x2];
+
+	/* WARNING: only valid for first result element due to variable sizes
+	 *          of result elements */
+	/* struct hv_24x7_result_element[@num_elements_returned] */
+	struct hv_24x7_result_element elements[];
+} __packed;
+
+struct hv_24x7_data_result_buffer {
+	/* See versioning for request buffer */
+	__u8 interface_version;
+
+	__u8 num_results;
+	__u8 reserved[0x1];
+	__u8 failing_request_ix;
+	__be32 detailed_rc;
+	__be64 cec_cfg_instance_id;
+	__be64 catalog_version_num;
+	__u8 reserved2[0x8];
+	/* WARNING: only valid for the first result due to variable sizes of
+	 *	    results */
+	struct hv_24x7_result results[]; /* [@num_results] */
+} __packed;
+
+/* From document "24x7 Event and Group Catalog Formats Proposal" v0.14 */
+struct hv_24x7_catalog_page_0 {
+#define HV_24X7_CATALOG_MAGIC 0x32347837 /* "24x7" in ASCII */
+	__be32 magic;
+	__be32 length; /* In 4096 byte pages */
+	__u8 reserved1[4];
+	__be32 version;
+	__u8 build_time_stamp[16]; /* "YYYYMMDDHHMMSS\0\0" */
+	__u8 reserved2[32];
+	__be16 schema_data_offs; /* in 4096 byte pages */
+	__be16 schema_data_len;  /* in 4096 byte pages */
+	__be16 schema_entry_count;
+	__u8 reserved3[2];
+	__be16 group_data_offs; /* in 4096 byte pages */
+	__be16 group_data_len;  /* in 4096 byte pages */
+	__be16 group_entry_count;
+	__u8 reserved4[2];
+	__be16 formula_data_offs; /* in 4096 byte pages */
+	__be16 formula_data_len;  /* in 4096 byte pages */
+	__be16 formula_entry_count;
+	__u8 reserved5[2];
+} __packed;
+
+struct hv_24x7_event_data {
+	__be16 length; /* in bytes, must be a multiple of 16 */
+	__u8 reserved1[2];
+	__u8 domain; /* Chip = 1, Core = 2 */
+	__u8 reserved2[1];
+	__be16 event_group_record_offs; /* in bytes, must be 8 byte aligned */
+	__be16 event_group_record_len; /* in bytes */
+
+	/* in bytes, offset from event_group_record */
+	__be16 event_counter_offs;
+
+	/* verified_state, unverified_state, caveat_state, broken_state, ... */
+	__be32 flags;
+
+	__be16 primary_group_ix;
+	__be16 group_count;
+	__be16 event_name_len;
+	__u8 remainder[];
+	/* __u8 event_name[event_name_len - 2]; */
+	/* __be16 event_description_len; */
+	/* __u8 event_desc[event_description_len - 2]; */
+	/* __be16 detailed_desc_len; */
+	/* __u8 detailed_desc[detailed_desc_len - 2]; */
+} __packed;
+
+struct hv_24x7_group_data {
+	__be16 length; /* in bytes, must be multiple of 16 */
+	__u8 reserved1[2];
+	__be32 flags; /* undefined contents */
+	__u8 domain; /* Chip = 1, Core = 2 */
+	__u8 reserved2[1];
+	__be16 event_group_record_offs;
+	__be16 event_group_record_len;
+	__u8 group_schema_ix;
+	__u8 event_count; /* 1 to 16 */
+	__be16 event_ixs;
+	__be16 group_name_len;
+	__u8 remainder[];
+	/* __u8 group_name[group_name_len]; */
+	/* __be16 group_desc_len; */
+	/* __u8 group_desc[group_desc_len]; */
+} __packed;
+
+/* TODO: Schema Data */
+/* TODO: Event Counter Group Record (see the PORE/SLW workbook) */
+
+/* "Get Event Counter Group Record Schema hypervisor interface" */
+
+enum hv_24x7_grs_field_enums {
+	/* GRS_COUNTER_1 = 1
+	 * GRS_COUNTER_2 = 2
+	 * ...
+	 * GRS_COUNTER_31 = 32 // FIXME: Doc issue.
+	 */
+	GRS_COUNTER_BASE = 1,
+	GRS_COUNTER_LAST = 32,
+	GRS_TIMEBASE_UPDATE = 48,
+	GRS_TIMEBASE_FENCE = 49,
+	GRS_UPDATE_COUNT = 50,
+	GRS_MEASUREMENT_PERIOD = 51,
+	GRS_ACCUMULATED_MEASUREMENT_PERIOD = 52,
+	GRS_LAST_UPDATE_PERIOD = 53,
+	GRS_STATUS_FLAGS = 54,
+};
+
+enum hv_24x7_grs_enums {
+	GRS_CORE_SCHEMA_INDEX = 0,
+};
+
+struct hv_24x7_grs_field {
+	__be16 field_enum;
+	__be16 offs; /* in bytes, within Event Counter group record */
+	__be16 length; /* in bytes */
+	__be16 flags; /* presently unused */
+} __packed;
+
+struct hv_24x7_grs {
+	__be16 length;
+	__u8 reserved1[2];
+	__be16 descriptor;
+	__be16 version_id;
+	__u8 reserved2[6];
+	__be16 field_entry_count;
+	__u8 field_entrys[];
+} __packed;
+
+struct hv_24x7_formula_data {
+	__be32 length; /* in bytes, must be multiple of 16 */
+	__u8 reserved1[2];
+	__be32 flags; /* not yet defined */
+	__be16 group;
+	__u8 reserved2[6];
+	__be16 name_len;
+	__u8 remainder[];
+	/* __u8 name[name_len]; */
+	/* __be16 desc_len; */
+	/* __u8 desc[desc_len]; */
+	/* __be16 formula_len */
+	/* __u8 formula[formula_len]; */
+} __packed;
+
+/* Formula Syntax: i.e., implement a Forth interpreter. */
+/* need fast lookup of the formula names, event names, "delta-timebase",
+ * "delta-cycles", "delta-instructions", "delta-seconds" */
+/* operators: '+', '-', '*', '/', 'mod', 'rem', 'sqr', 'x^y' (XXX: pow? xor?),
+ *            'rot', 'dup' */
+
+#endif
-- 
1.8.5.2
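
The group and formula structs above end in a variable-length tail whose
layout is given only in comments (name bytes, then a big-endian 16-bit
description length, then the description bytes). A minimal userspace
sketch of walking such a tail follows. The names get_be16(), struct
tail_strings and split_tail() are hypothetical, and the sketch assumes
that name_len counts only the name bytes; the patch does not say whether
the length field itself is included in that count.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a big-endian 16-bit value from two bytes; works at any alignment. */
static uint16_t get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Hypothetical result type: pointers into the caller's buffer, not copies. */
struct tail_strings {
	const uint8_t *name;
	size_t name_len;
	const uint8_t *desc;
	size_t desc_len;
};

/*
 * Split a remainder[] tail of rem_len bytes into name and description.
 * name_len is the value already decoded from the fixed header.
 * Returns 0 on success, -1 if any length would run past the buffer.
 */
static int split_tail(const uint8_t *rem, size_t rem_len, size_t name_len,
		      struct tail_strings *out)
{
	size_t desc_len;

	/* Need the name plus the two bytes of the description length. */
	if (name_len > rem_len || rem_len - name_len < 2)
		return -1;

	desc_len = get_be16(rem + name_len);
	if (desc_len > rem_len - name_len - 2)
		return -1;

	out->name = rem;
	out->name_len = name_len;
	out->desc = rem + name_len + 2;
	out->desc_len = desc_len;
	return 0;
}
```

The bounds checks come first because the lengths are supplied by the
hypervisor and should not be trusted to stay inside the buffer.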


* [PATCH 4/8] powerpc: add hv_gpci interface header
From: Cody P Schafer @ 2014-01-16 23:53 UTC (permalink / raw)
  To: Linux PPC
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo, Cody P Schafer
In-Reply-To: <1389916434-2288-1-git-send-email-cody@linux.vnet.ibm.com>

"H_GetPerformanceCounterInfo" (referred to as hv_gpci or just gpci from
here on) is an interface to retrieve specific performance counters and
other data from the hypervisor. All outputs have a fixed format (and
are represented as structs in this patch).

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/hv_gpci.h | 490 +++++++++++++++++++++++++++++++++++++
 1 file changed, 490 insertions(+)
 create mode 100644 arch/powerpc/include/asm/hv_gpci.h

diff --git a/arch/powerpc/include/asm/hv_gpci.h b/arch/powerpc/include/asm/hv_gpci.h
new file mode 100644
index 0000000..237de26
--- /dev/null
+++ b/arch/powerpc/include/asm/hv_gpci.h
@@ -0,0 +1,490 @@
+#ifndef LINUX_POWERPC_UAPI_HV_GPCI_H_
+#define LINUX_POWERPC_UAPI_HV_GPCI_H_
+
+#include <linux/types.h>
+
+/* From the document "H_GetPerformanceCounterInfo Interface" v1.06, partially
+ * updated with v1.07 */
+
+/* H_GET_PERF_COUNTER_INFO argument */
+struct hv_get_perf_counter_info_params {
+	__be32 counter_request; /* I */
+	__be32 starting_index;  /* IO */
+	__be16 secondary_index; /* IO */
+	__be16 returned_values; /* O */
+	__be32 detail_rc; /* O, "only for 32bit clients" */
+
+	/*
+	 * O, size of each counter_value element in bytes; only set for version
+	 * >= 0x3
+	 */
+	__be16 cv_element_size;
+
+	/* I, funny if version < 0x3 */
+	__u8 counter_info_version_in;
+
+	/* O, funny if version < 0x3 */
+	__u8 counter_info_version_out;
+	__u8 reserved[0xC];
+	__u8 counter_value[];
+} __packed;
+
+/* 8 => power8 (1.07)
+ * 6 => TLBIE  (1.07)
+ * 5 => (1.05)
+ * 4 => ?
+ * 3 => ?
+ * 2 => v7r7m0.phyp (?)
+ * 1 => v7r6m0.phyp (?)
+ * 0 => v7r{2,3,4}m0.phyp (?)
+ */
+#define COUNTER_INFO_VERSION_CURRENT 0x8
+
+/* these determine the counter_value[] layout and the meaning of starting_index
+ * and secondary_index */
+enum counter_info_requests {
+
+	/* GENERAL */
+
+	/* @starting_index: "starting" physical processor index or -1 for
+	 *                  current physical processor. Data is only collected
+	 *                  for the processors' "primary" thread.
+	 * @secondary_index: unused
+	 */
+	CIR_dispatch_timebase_by_processor = 0x10,
+
+	/* @starting_index: starting partition id or -1 for the current logical
+	 *                  partition (virtual machine).
+	 * @secondary_index: unused
+	 */
+	CIR_entitled_capped_uncapped_donated_idle_timebase_by_partition = 0x20,
+
+	/* @starting_index: starting partition id or -1 for the current logical
+	 *                  partition (virtual machine).
+	 * @secondary_index: unused
+	 */
+	CIR_run_instructions_run_cycles_by_partition = 0x30,
+
+	/* @starting_index: must be -1 (to refer to the current partition)
+	 * @secondary_index: unused
+	 */
+	CIR_system_performance_capabilities = 0x40,
+
+
+	/* Data from this should only be considered valid if
+	 * counter_info_version >= 0x3
+	 * @starting_index: starting hardware chip id or -1 for the current hw
+	 *		    chip id
+	 * @secondary_index: unused
+	 */
+	CIR_processor_bus_utilization_abc_links = 0x50,
+
+	/* Data from this should only be considered valid if
+	 * counter_info_version >= 0x3
+	 * @starting_index: starting hardware chip id or -1 for the current hw
+	 *		    chip id
+	 * @secondary_index: unused
+	 */
+	CIR_processor_bus_utilization_wxyz_links = 0x60,
+
+
+	/* EXPANDED */
+
+	/* Available if counter_info_version >= 0x3
+	 * @starting_index: starting hardware chip id or -1 for the current hw
+	 *		    chip id
+	 * @secondary_index: unused
+	 */
+	CIR_processor_bus_utilization_gx_links = 0x70,
+
+	/* Available if counter_info_version >= 0x3
+	 * @starting_index: starting hardware chip id or -1 for the current hw
+	 *		    chip id
+	 * @secondary_index: unused
+	 */
+	CIR_processor_bus_utilization_mc_links = 0x80,
+
+	/* Available if counter_info_version >= 0x3
+	 * @starting_index: starting physical processor or -1 for the current
+	 *                  physical processor
+	 * @secondary_index: unused
+	 */
+	CIR_processor_config = 0x90,
+
+	/* Available if counter_info_version >= 0x3
+	 * @starting_index: starting physical processor or -1 for the current
+	 *                  physical processor
+	 * @secondary_index: unused
+	 */
+	CIR_current_processor_frequency = 0x91,
+
+	CIR_processor_core_utilization = 0x94,
+
+	CIR_processor_core_power_mode = 0x95,
+
+	CIR_affinity_domain_information_by_virutal_processor = 0xA0,
+
+	CIR_affinity_domain_info_by_domain = 0xB0,
+
+	CIR_affinity_domain_info_by_partition = 0xB1,
+
+	/* @starting_index: unused
+	 * @secondary_index: unused
+	 */
+	CIR_physical_memory_info = 0xC0,
+
+	CIR_processor_bus_topology = 0xD0,
+
+	CIR_partition_hypervisor_queuing_times = 0xE0,
+
+	CIR_system_hypervisor_times = 0xF0,
+
+	/* LAB */
+
+	CIR_set_mmcrh = 0x80001000,
+	CIR_get_hpmcx = 0x80002000,
+};
+
+/* counter value layout */
+struct cv_dispatch_timebase_by_processor {
+	__be64 processor_time_in_timebase_cycles;
+	__be32 hw_processor_id;
+	__be16 owning_part_id; /* 0xffff if shared or unowned */
+	__u8 processor_state;
+	__u8 version; /* unused unless counter_info_version == 0 */
+	__be32 hw_chip_id; /* -1 for "Not Installed" processors */
+	__be32 phys_module_id; /* -1 for "Not Installed" processors */
+	__be32 primary_affinity_domain_idx;
+	__be32 secondary_affinity_domain_idx;
+	__be32 processor_version;
+	__be16 logical_processor_idx;
+	__u8 reserved[0x2];
+
+	/* counter_info_version >= 0x3 || version >= 0x1 */
+	__be32 processor_id_register;
+	__be32 physical_processor_idx; /* counter_info_version >= 0x3 */
+} __packed;
+
+struct cv_timebase_by_partition {
+	__be64 partition_id;
+	__be64 entitled_cycles;
+	__be64 consumed_capped_cycles;
+	__be64 consumed_uncapped_cycles;
+	__be64 cycles_donated;
+	__be64 purr_idle_cycles;
+} __packed;
+
+struct cv_cycles_per_partition {
+	__be64 partition_id;
+	__be64 instructions_completed; /* 0 if collection is unsupported */
+	__be64 cycles; /* 0 if collection is unsupported */
+} __packed;
+
+struct cv_system_performance_capabilities {
+	/* If != 0, allowed to collect data from other partitions */
+	__u8 perf_collect_privlidged;
+
+	/* These are only valid if counter_info_version >= 0x3 */
+#define CV_CM_GA       0x1
+#define CV_CM_EXPANDED 0x2
+#define CV_CM_LAB      0x3
+	/* remaining bits are reserved */
+	__u8 capability_mask;
+	__u8 reserved[0xE];
+} __packed;
+
+struct cv_processor_bus_utilization_abc {
+	__be32 hw_chip_id;
+	__u8 reserved1[0xC];
+	__be64 total_link_cycles;
+	__be64 idle_cycles_a;
+	__be64 idle_cycles_b;
+	__be64 idle_cycles_c;
+	__u8 reserved2[0x20];
+} __packed;
+
+struct cv_processor_bus_utilization_wxyz {
+	__be32 hw_chip_id;
+	__u8 reserved1[0xC];
+	__be64 total_link_cycles;
+
+	/* Inactive links (all cycles idle) give -1 */
+	__be64 idle_cycles_w;
+	__be64 idle_cycles_x;
+	__be64 idle_cycles_y;
+	__be64 idle_cycles_z;
+	__u8 reserved2[0x28];
+} __packed;
+
+/* EXPANDED */
+
+struct cv_gx_cycles {
+	__be64 address_cycles;
+	__be64 data_cycles;
+	__be64 retries;
+	__be64 bus_cycles;
+	__be64 total_cycles;
+} __packed;
+
+struct cv_gx_cycles_io {
+	struct cv_gx_cycles in, out;
+} __packed;
+
+struct cv_processor_bus_utilization_gx {
+	__be32 hw_chip_id;
+	__u8 reserved1[0xC];
+	struct cv_gx_cycles_io gx[2];
+} __packed;
+
+struct cv_mc_counts {
+	__be64 frames;
+	__be64 reads;
+	__be64 writes;
+	__be64 total_cycles;
+} __packed;
+
+/* inactive links return 0 for all utilization data */
+struct cv_processor_bus_utilization_mc {
+	__be32 hw_chip_id;
+	__u8 reserved1[0xC];
+	struct cv_mc_counts mc[2];
+} __packed;
+
+struct cv_processor_config {
+	__be32 physical_processor_idx;
+	__be32 hw_node_id;
+	__be32 hw_card_id;
+	__be32 phys_module_id;
+	__be32 hw_chip_id;
+	__be32 hw_processor_id;
+	__be32 processor_id_register;
+
+#define CV_PS_NOT_INSTALLED 0x1
+#define CV_PS_GAURDED_OFF   0x2
+#define CV_PS_UNLICENCED    0x3
+#define CV_PS_SHARED        0x4
+#define CV_PS_BORROWED      0x5
+#define CV_PS_DEDICATED     0x6
+	__u8 processor_state;
+
+	__u8 reserved1[0x1];
+	__be16 owning_part_id;
+	__be32 processor_version;
+	__u8 reserved2[0x4];
+} __packed;
+
+struct cv_processor_frequency {
+	__be32 physical_processor_idx;
+	__be32 hw_processor_id;
+	__u8 reserved1[0x8];
+	__be32 nominal_freq_mhz;
+	__be32 current_freq_mhz;
+} __packed;
+
+struct cv_processor_core_utilization {
+	__be32 physical_processor_idx;
+	__be32 hw_processor_id;
+	__be64 cycles;
+	__be64 timebase_at_collection;
+	__be64 purr_cycles;
+	__be64 sum_of_cycles_across_threads;
+	__be64 instructions_completed;
+} __packed;
+
+struct cv_processor_core_power_mode {
+	__be16 partition_id;
+	__u8 reserved1[0x6];
+
+#define CV_PM_NONE		 0x0
+#define CV_PM_NOMINAL		 0x1
+#define CV_PM_DYNAMIC_MAX_PERF   0x2
+#define CV_PM_DYNAMIC_POWER_SAVE 0x3
+#define CV_PM_UNKNOWN		 0xF
+	__be16 power_mode;
+
+	__u8 reserved2[0x6];
+} __packed;
+
+struct cv_affinity_domain_information_by_virutal_processor {
+	__be16 partition_id;
+	__be16 virtual_processor_idx;
+	__u8 reserved1[0xC];
+	__be16 physical_processor_idx;
+	__be16 primary_affinity_domain_idx;
+	__be16 secondary_affinity_domain_idx;
+	__u8 reserved2[0x2];
+	__u8 reserved3[0x8];
+} __packed;
+
+struct cv_affinity_domain_info_by_domain {
+	__be16 primary_affinity_domain_idx;
+	__be16 secondary_affinity_domain_idx;
+	__be32 total_processor_units;
+	__be32 free_dedicated_processor_units;
+	__be32 free_shared_processor_units;
+	__be32 total_memory_lmbs;
+	__be32 free_memory_lmbs;
+	__be32 num_partitions_in_domain;
+	__u8 reserved1[0x14];
+} __packed;
+
+struct cv_affinity_domain_info_by_partition {
+	__be16 partition_id;
+	__u8 reserved1[0x6];
+	__be16 assignment_order;
+
+#define CV_PPS_UNKNOWN			      0x00
+#define CV_PPS_CONTAIN_IN_PRIMARY_DOMAIN      0x01
+#define CV_PPS_CONTAIN_IN_SECONDARY_DOMAIN    0x02
+#define CV_PPS_SPREAD_ACROSS_SECONDAY_DOMAINS 0x03
+#define CV_PPS_WHEREEVER		      0x04
+#define CV_PPS_SCRAMBLE			      0x05
+	__u8 partition_placement_spread;
+
+	__u8 parition_affinity_score;
+	__be16 num_affinity_domain_elements;
+	__be16 affinity_domain_element_size;
+	__u8 domain_elements[];
+} __packed;
+
+struct cv_affinity_domain_elem {
+	__be16 primary_affinity_domain_idx;
+	__be16 secondary_affinity_domain_idx;
+	__be32 dedicated_processor_units_allocated;
+	__be32 dedicated_memory_allocated_reserved_1;
+	__be32 dedicated_memory_allocated_reserved_2;
+	__be32 dedicated_memory_allocated_16Gb_pages;
+	__u8 reserved[0x8];
+} __packed;
+
+/* Also available via `of_get_flat_dt_prop(node, "ibm,lmb-size", &l)` */
+struct cv_physical_memory_info {
+	__be64 lmb_size_in_bytes;
+	__u8 reserved1[0x18];
+} __packed;
+
+struct cv_processor_bus_topology {
+	__be32 hw_chip_id;
+	__be32 hw_node_id;
+	__be32 fabric_chip_id;
+	__u8 reserved1[0x4];
+
+#define CV_IM_A_LINK_ACTIVE (1 << 0)
+#define CV_IM_B_LINK_ACTIVE (1 << 1)
+#define CV_IM_C_LINK_ACTIVE (1 << 2)
+/* Bits 3-5 are reserved */
+#define CV_IM_ABC_LINK_WIDTH_MASK ((1 << 6) | (1 << 7))
+#define CV_IM_ABC_LINK_WIDTH_SHIFT 6
+#define CV_IM_ABC_LINK_WIDTH_8B 0x0
+#define CV_IM_ABC_LINK_WIDTH_4B 0x1
+
+#define CV_IM_W_LINK_ACTIVE (1 << 8)
+#define CV_IM_X_LINK_ACTIVE (1 << 9)
+#define CV_IM_Y_LINK_ACTIVE (1 << 10)
+#define CV_IM_Z_LINK_ACTIVE (1 << 11)
+/* Bits 12-13 are reserved */
+
+#define CV_IM_WXYZ_LINK_WIDTH_MASK ((1 << 14) | (1 << 15))
+#define CV_IM_WXYZ_LINK_WIDTH_SHIFT 14
+#define CV_IM_WXYZ_LINK_WIDTH_8B 0x0
+#define CV_IM_WXYZ_LINK_WIDTH_4B 0x1
+
+#define CV_IM_GX0_CONFIGURED (1 << 16)
+#define CV_IM_GX1_CONFIGURED (1 << 17)
+/* Bits 18-23 are reserved */
+#define CV_IM_MC0_CONFIGURED (1 << 24)
+#define CV_IM_MC1_CONFIGURED (1 << 25)
+/* Bits 26-31 are reserved */
+
+	__be32 info_mask;
+
+	__u8 hw_node_id_connected_to_a_link;
+	__u8 hw_node_id_connected_to_b_link;
+
+	__u8 reserved2[0x2];
+
+	__u8 fabric_chip_id_connected_to_w_link;
+	__u8 fabric_chip_id_connected_to_x_link;
+	__u8 fabric_chip_id_connected_to_y_link;
+	__u8 fabric_chip_id_connected_to_z_link;
+
+	__u8 reserved3[0x4];
+} __packed;
+
+struct cv_partition_hypervisor_queuing_times {
+	__be16 partition_id;
+	__u8 reserved1[0x6];
+	__be64 time_waiting_for_entitlement; /* in timebase cycles */
+	__be64 times_waited_for_entitlement;
+	__be64 time_waiting_for_physical_processor; /* in timebase cycles */
+	__be64 times_waited_for_physical_processor;
+	__be64 dispatches_on_home_processor_core;
+	__be64 dispatches_on_home_primary_affinity_domain;
+	__be64 dispatches_on_home_secondary_affinity_domain;
+	__be64 dispatches_off_home_secondary_affinity_domain;
+	__be64 dispatches_on_dedicated_processor_donating_cycles;
+} __packed;
+
+struct cv_system_hypervisor_times {
+	__be64 phyp_time_spent_to_dispatch_virtual_processors;
+	__be64 phyp_time_spent_processing_virtual_processor_timers;
+	__be64 phyp_time_spent_managing_partitions_over_entitlement;
+	__be64 time_spent_on_system_managment;
+} __packed;
+
+/* LAB */
+
+struct cv_set_mmcrh {
+	/* Only HPMC bits (40:46, 48:54) used, all others ignored
+	 * -1 = default (0x00000000_003C1200)
+	 */
+	__be64 mmcrh_value_to_set;
+};
+
+struct cv_get_hpmcx {
+	__be32 hw_processor_id;
+	__u8 reserved1[0x4];
+	__be64 mmcrh_current;
+	__be64 time_since_mmcrh_was_set;
+	__be64 hpmc1_since_current_mmcrh;
+	__be64 hpmc2_since_current_mmcrh;
+	__be64 hpmc3_since_current_mmcrh;
+	__be64 hpmc3_current;
+	__be64 hpmc4_since_current_mmcrh;
+	__be64 hpmc4_current;
+};
+
+union h_gpci_cvs {
+	/* GA */
+	struct cv_dispatch_timebase_by_processor dispatch_timebase_by_processor;
+	struct cv_timebase_by_partition timebase_by_partition;
+	struct cv_cycles_per_partition cycles_per_partition;
+	struct cv_system_performance_capabilities system_performance_capabilities;
+	struct cv_processor_bus_utilization_abc processor_bus_utilization_abc;
+	struct cv_processor_bus_utilization_wxyz processor_bus_utilization_wxyz;
+
+	/* EXPANDED */
+	struct cv_gx_cycles gx_cycles;
+	struct cv_gx_cycles_io gx_cycles_io;
+	struct cv_processor_bus_utilization_gx processor_bus_utilization_gx;
+	struct cv_mc_counts mc_counts;
+	struct cv_processor_bus_utilization_mc processor_bus_utilization_mc;
+	struct cv_processor_config processor_config;
+	struct cv_processor_frequency processor_frequency;
+	struct cv_processor_core_utilization processor_core_utilization;
+	struct cv_processor_core_power_mode processor_core_power_mode;
+	struct cv_affinity_domain_information_by_virutal_processor affinity_domain_information_by_virutal_processor;
+	struct cv_affinity_domain_info_by_domain affinity_domain_info_by_domain;
+	struct cv_affinity_domain_info_by_partition affinity_domain_info_by_partition;
+	struct cv_affinity_domain_elem affinity_domain_elem;
+	struct cv_physical_memory_info physical_memory_info;
+	struct cv_processor_bus_topology processor_bus_topology;
+	struct cv_partition_hypervisor_queuing_times partition_hypervisor_queuing_times;
+	struct cv_system_hypervisor_times system_hypervisor_times;
+
+	/* LAB */
+	struct cv_set_mmcrh set_mmcrh;
+	struct cv_get_hpmcx get_hpmcx;
+};
+
+#endif
-- 
1.8.5.2
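
For readers following the layout of the request/reply block, here is a
small userspace sketch of locating the i-th element of counter_value[]
from a raw reply buffer. It is an illustration only, not code from this
series: struct gpci_params is a hypothetical mirror of
struct hv_get_perf_counter_info_params, be16_at() and gpci_value() are
made-up helper names, the packed attribute is the GCC/Clang spelling,
and the sketch assumes counter_info_version >= 0x3 so that
cv_element_size has been filled in.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical userspace mirror of hv_get_perf_counter_info_params.
 * The multi-byte fields still hold big-endian data.
 */
struct gpci_params {
	uint32_t counter_request;
	uint32_t starting_index;
	uint16_t secondary_index;
	uint16_t returned_values;
	uint32_t detail_rc;
	uint16_t cv_element_size;
	uint8_t counter_info_version_in;
	uint8_t counter_info_version_out;
	uint8_t reserved[0xC];
	uint8_t counter_value[];
} __attribute__((packed));

/* Read a big-endian 16-bit field at byte offset off, one byte at a time. */
static uint16_t be16_at(const void *base, size_t off)
{
	const uint8_t *p = (const uint8_t *)base + off;

	return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Return a pointer to counter value i inside a reply of buf_len bytes,
 * or NULL if i is out of range or the element would not fit in the buffer.
 */
static const uint8_t *gpci_value(const struct gpci_params *p, size_t buf_len,
				 size_t i)
{
	size_t count = be16_at(p, offsetof(struct gpci_params, returned_values));
	size_t esize = be16_at(p, offsetof(struct gpci_params, cv_element_size));
	size_t hdr = offsetof(struct gpci_params, counter_value);

	if (buf_len < hdr || i >= count || esize == 0)
		return NULL;
	/* Only whole elements that fit after the header are addressable. */
	if ((buf_len - hdr) / esize <= i)
		return NULL;
	return p->counter_value + i * esize;
}
```

The returned pointer would then be interpreted as one of the cv_*
structs, chosen by the counter_request value that was sent.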


* [PATCH 2/8] perf core: export swevent hrtimer helpers
From: Cody P Schafer @ 2014-01-16 23:53 UTC (permalink / raw)
  To: Linux PPC
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo, Cody P Schafer
In-Reply-To: <1389916434-2288-1-git-send-email-cody@linux.vnet.ibm.com>

Export the swevent hrtimer helpers currently only used in events/core.c
to allow the addition of architecture specific sw-like pmus.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 include/linux/perf_event.h | 5 ++++-
 kernel/events/core.c       | 8 ++++----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 8646e33..c5bc71a 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -558,7 +558,10 @@ extern void perf_pmu_migrate_context(struct pmu *pmu,
 				int src_cpu, int dst_cpu);
 extern u64 perf_event_read_value(struct perf_event *event,
 				 u64 *enabled, u64 *running);
-
+extern void perf_swevent_init_hrtimer(struct perf_event *event);
+extern void perf_swevent_start_hrtimer(struct perf_event *event);
+extern void perf_swevent_cancel_hrtimer(struct perf_event *event);
+extern int perf_swevent_event_idx(struct perf_event *event);
 
 struct perf_sample_data {
 	u64				type;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f574401..d881d1e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5801,7 +5801,7 @@ static int perf_swevent_init(struct perf_event *event)
 	return 0;
 }
 
-static int perf_swevent_event_idx(struct perf_event *event)
+int perf_swevent_event_idx(struct perf_event *event)
 {
 	return 0;
 }
@@ -6030,7 +6030,7 @@ static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
 	return ret;
 }
 
-static void perf_swevent_start_hrtimer(struct perf_event *event)
+void perf_swevent_start_hrtimer(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	s64 period;
@@ -6052,7 +6052,7 @@ static void perf_swevent_start_hrtimer(struct perf_event *event)
 				HRTIMER_MODE_REL_PINNED, 0);
 }
 
-static void perf_swevent_cancel_hrtimer(struct perf_event *event)
+void perf_swevent_cancel_hrtimer(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 
@@ -6064,7 +6064,7 @@ static void perf_swevent_cancel_hrtimer(struct perf_event *event)
 	}
 }
 
-static void perf_swevent_init_hrtimer(struct perf_event *event)
+void perf_swevent_init_hrtimer(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 
-- 
1.8.5.2


* [PATCH 3/8] powerpc: add hvcalls for 24x7 and gpci (get performance counter info)
From: Cody P Schafer @ 2014-01-16 23:53 UTC (permalink / raw)
  To: Linux PPC
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo, Cody P Schafer
In-Reply-To: <1389916434-2288-1-git-send-email-cody@linux.vnet.ibm.com>

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/hvcall.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index d8b600b..48d6efa 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -269,11 +269,15 @@
 #define H_COP			0x304
 #define H_GET_MPP_X		0x314
 #define H_SET_MODE		0x31C
-#define MAX_HCALL_OPCODE	H_SET_MODE
+#define H_GET_24X7_CATALOG_PAGE 0xF078
+#define H_GET_24X7_DATA		0xF07C
+#define H_GET_PERF_COUNTER_INFO 0xF080
+#define MAX_HCALL_OPCODE	H_GET_PERF_COUNTER_INFO
 
 /* Platform specific hcalls, used by KVM */
 #define H_RTAS			0xf000
 
+
 #ifndef __ASSEMBLY__
 
 /**
-- 
1.8.5.2


* [PATCH 1/8] perf: add PMU_RANGE_ATTR() helper for use by sw-like pmus
From: Cody P Schafer @ 2014-01-16 23:53 UTC (permalink / raw)
  To: Linux PPC
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo, Cody P Schafer
In-Reply-To: <1389916434-2288-1-git-send-email-cody@linux.vnet.ibm.com>

Add PMU_RANGE_ATTR() and PMU_RANGE_RESV() (for reserved areas) which
generate functions to extract the relevant bits from
event->attr.config{,1,2} for use by sw-like pmus where the
'config{,1,2}' values don't map directly to hardware registers.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 include/linux/perf_event.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2e069d1..8646e33 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -870,4 +870,21 @@ _name##_show(struct device *dev,					\
 									\
 static struct device_attribute format_attr_##_name = __ATTR_RO(_name)
 
+#define PMU_RANGE_ATTR(name, attr_var, bit_start, bit_end)		\
+PMU_FORMAT_ATTR(name, #attr_var ":" #bit_start "-" #bit_end);		\
+PMU_RANGE_RESV(name, attr_var, bit_start, bit_end)
+
+#define PMU_RANGE_RESV(name, attr_var, bit_start, bit_end)		\
+static u64 event_get_##name##_max(void)					\
+{									\
+	int bits = (bit_end) - (bit_start) + 1;				\
+	return ((0x1ULL << (bits - 1ULL)) - 1ULL) |			\
+		(0xFULL << (bits - 4ULL));				\
+}									\
+static u64 event_get_##name(struct perf_event *event)			\
+{									\
+	return (event->attr.attr_var >> (bit_start)) &			\
+		event_get_##name##_max();				\
+}
+
 #endif /* _LINUX_PERF_EVENT_H */
-- 
1.8.5.2
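
A note on the mask arithmetic in event_get_##name##_max(): the OR of two
shifted terms yields the expected all-ones mask for widths of 4 to 64
bits, but for a range narrower than four bits the expression
`bits - 4ULL` wraps around to a huge unsigned value, and shifting by
that count is undefined in C. The following userspace sketch shows one
alternative formulation that covers widths 1 through 64 without ever
shifting by 64. The names range_max() and range_get() are hypothetical,
and callers are assumed to pass bit_start <= bit_end <= 63.

```c
#include <stdint.h>

/*
 * All-ones mask as wide as the inclusive bit range [bit_start, bit_end].
 * Build the low (bits - 1) ones first, then shift in the final one, so
 * the largest shift count used is 63.
 */
static uint64_t range_max(unsigned bit_start, unsigned bit_end)
{
	unsigned bits = bit_end - bit_start + 1;

	return (((UINT64_C(1) << (bits - 1)) - 1) << 1) | 1;
}

/* Extract the field occupying [bit_start, bit_end] from config. */
static uint64_t range_get(uint64_t config, unsigned bit_start,
			  unsigned bit_end)
{
	return (config >> bit_start) & range_max(bit_start, bit_end);
}
```

For example, range_get(0xABCD, 4, 11) shifts the value down to 0xABC and
masks it with 0xFF, giving 0xBC.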


* [PATCH 0/8] Add support for PowerPC Hypervisor supplied performance counters
From: Cody P Schafer @ 2014-01-16 23:53 UTC (permalink / raw)
  To: Linux PPC
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo, Cody P Schafer

These patches add basic pmus for 2 powerpc hypervisor interfaces to obtain
performance counters: gpci ("get performance counter info") and 24x7.

The counters supplied by these interfaces are continually counting and never
need to be (and cannot be) disabled or enabled. They additionally do not
generate any interrupts. This makes them in some regards similar to software
counters, and as a result their implementation shares some common code (which
an initial patch exposes) with the sw counters.

There is ongoing work to support transactions for each of these pmus.

Cody P Schafer (8):
  perf: add PMU_RANGE_ATTR() helper for use by sw-like pmus
  perf core: export swevent hrtimer helpers
  powerpc: add hvcalls for 24x7 and gpci (get performance counter info)
  powerpc: add hv_gpci interface header
  powerpc: add 24x7 interface header
  powerpc/perf: add support for the hv gpci (get performance counter
    info) interface
  powerpc/perf: add support for the hv 24x7 interface
  powerpc/perf: add kconfig option for hypervisor provided counters

 arch/powerpc/include/asm/hv_24x7.h     | 239 ++++++++++++++++
 arch/powerpc/include/asm/hv_gpci.h     | 490 +++++++++++++++++++++++++++++++++
 arch/powerpc/include/asm/hvcall.h      |   6 +-
 arch/powerpc/perf/Makefile             |   2 +
 arch/powerpc/perf/hv-24x7.c            | 354 ++++++++++++++++++++++++
 arch/powerpc/perf/hv-gpci.c            | 235 ++++++++++++++++
 arch/powerpc/platforms/Kconfig.cputype |   6 +
 include/linux/perf_event.h             |  22 +-
 kernel/events/core.c                   |   8 +-
 9 files changed, 1356 insertions(+), 6 deletions(-)
 create mode 100644 arch/powerpc/include/asm/hv_24x7.h
 create mode 100644 arch/powerpc/include/asm/hv_gpci.h
 create mode 100644 arch/powerpc/perf/hv-24x7.c
 create mode 100644 arch/powerpc/perf/hv-gpci.c

-- 
1.8.5.2


* Re: [PATCH RFC] powerpc/mpc85xx: add support for the kmp204x reference board
From: Scott Wood @ 2014-01-16 23:35 UTC (permalink / raw)
  To: Valentin Longchamp; +Cc: linuxppc-dev
In-Reply-To: <1389879525-27130-1-git-send-email-valentin.longchamp@keymile.com>

On Thu, 2014-01-16 at 14:38 +0100, Valentin Longchamp wrote:
> This patch introduces the support for Keymile's kmp204x reference
> design. This design is based on Freescale's P2040/P2041 SoC.
> 
> The peripherals used by this design are:
> - SPI NOR Flash as bootloader medium
> - NAND Flash with a ubi partition
> - 2 PCIe busses (hosts 1 and 3)
> - 3 FMAN Ethernet devices (FMAN1 DTSEC1/2/5)
> - 4 Local Bus windows, with one dedicated to the QRIO reset/power mgmt
>   FPGA
> - 2 HW I2C busses
> - last but not least, the mandatory serial port
> 
> The patch also adds a defconfig file for this reference design and a DTS
> file for the kmcoge4 board which is the first one based on this
> reference design.
> 
> To try to avoid code duplication, the support was added directly to the
> corenet_generic.c file.
> 
> Signed-off-by: Valentin Longchamp <valentin.longchamp@keymile.com>
> ---
>  arch/powerpc/boot/dts/kmcoge4.dts             | 165 ++++++++++++++++++
>  arch/powerpc/configs/85xx/kmp204x_defconfig   | 231 ++++++++++++++++++++++++++
>  arch/powerpc/platforms/85xx/Kconfig           |  14 ++
>  arch/powerpc/platforms/85xx/Makefile          |   1 +
>  arch/powerpc/platforms/85xx/corenet_generic.c |  52 ++++++
>  5 files changed, 463 insertions(+)
>  create mode 100644 arch/powerpc/boot/dts/kmcoge4.dts
>  create mode 100644 arch/powerpc/configs/85xx/kmp204x_defconfig
> 
> diff --git a/arch/powerpc/boot/dts/kmcoge4.dts b/arch/powerpc/boot/dts/kmcoge4.dts
> new file mode 100644
> index 0000000..c10df6d
> --- /dev/null
> +++ b/arch/powerpc/boot/dts/kmcoge4.dts
> @@ -0,0 +1,165 @@
> +/*
> + * Keymile kmcoge4 Device Tree Source, based on the P2041RDB DTS
> + *
> + * (C) Copyright 2014
> + * Valentin Longchamp, Keymile AG, valentin.longchamp@keymile.com
> + *
> + * Copyright 2011 Freescale Semiconductor Inc.
> + *
> + * This program is free software; you can redistribute  it and/or modify it
> + * under  the terms of  the GNU General  Public License as published by the
> + * Free Software Foundation;  either version 2 of the  License, or (at your
> + * option) any later version.
> + */
> +
> +/include/ "fsl/p2041si-pre.dtsi"
> +
> +/ {
> +	model = "keymile,kmcoge4";
> +	compatible = "keymile,kmp204x";

Don't put wildcards in compatible.

> +	soc: soc@ffe000000 {
> +		ranges = <0x00000000 0xf 0xfe000000 0x1000000>;
> +		reg = <0xf 0xfe000000 0 0x00001000>;
> +		spi@110000 {
> +			flash@0 {
> +				#address-cells = <1>;
> +				#size-cells = <1>;
> +				compatible = "spansion,s25fl256s1";
> +				reg = <0>;
> +				spi-max-frequency = <20000000>; /* input clock */
> +				partition@u-boot {
> +					label = "u-boot";
> +					reg = <0x00000000 0x00100000>;
> +					read-only;
> +				};
> +				partition@env {
> +					label = "env";
> +					reg = <0x00100000 0x00010000>;
> +				};
> +				partition@envred {
> +					label = "envred";
> +					reg = <0x00110000 0x00010000>;
> +				};
> +				partition@fman {
> +					label = "fman-ucode";
> +					reg = <0x00120000 0x00010000>;
> +					read-only;
> +				};
> +			};

I realize it's common practice, but it would be good to get away from
putting partition layouts in the dts file.  Alternatives include using
mtdparts on the command line, or having U-Boot put the partition info
into the dtb based on the mtdparts environment variable (there is
existing code for this).

> +			zl30343@1 {
> +				compatible = "gen,spidev";

Node names are supposed to be generic.  Compatibles are supposed to be
specific.

> +	lbc: localbus@ffe124000 {
> +		reg = <0xf 0xfe124000 0 0x1000>;
> +		ranges = <0 0 0xf 0xffa00000 0x00040000		/* LB 0 */
> +			  1 0 0xf 0xfb000000 0x00010000		/* LB 1 */
> +			  2 0 0xf 0xd0000000 0x10000000		/* LB 2 */
> +			  3 0 0xf 0xe0000000 0x10000000>;	/* LB 3 */
> +
> +		nand@0,0 {
> +			#address-cells = <1>;
> +			#size-cells = <1>;
> +			compatible = "fsl,elbc-fcm-nand";
> +			reg = <0 0 0x40000>;
> +
> +			partition@0 {
> +				label = "ubi0";
> +				reg = <0x0 0x8000000>;
> +			};
> +		};
> +	};

No nodes for those other chipselects?

> diff --git a/arch/powerpc/configs/85xx/kmp204x_defconfig b/arch/powerpc/configs/85xx/kmp204x_defconfig
> new file mode 100644
> index 0000000..3bbf4fa
> --- /dev/null
> +++ b/arch/powerpc/configs/85xx/kmp204x_defconfig

Why does this board need its own defconfig?

> diff --git a/arch/powerpc/platforms/85xx/corenet_generic.c b/arch/powerpc/platforms/85xx/corenet_generic.c
> index fbd871e..8e84e1c 100644
> --- a/arch/powerpc/platforms/85xx/corenet_generic.c
> +++ b/arch/powerpc/platforms/85xx/corenet_generic.c
> @@ -122,6 +122,7 @@ static const char * const hv_boards[] __initconst = {
>  	NULL
>  };
>  
> +#ifdef CONFIG_CORENET_GENERIC

corenet_generic.c without CONFIG_CORENET_GENERIC?

>  /*
>   * Called very early, device-tree isn't unflattened
>   */
> @@ -180,3 +181,54 @@ machine_arch_initcall(corenet_generic, corenet_gen_publish_devices);
>  #ifdef CONFIG_SWIOTLB
>  machine_arch_initcall(corenet_generic, swiotlb_setup_bus_notifier);
>  #endif
> +#endif
> +
> +#ifdef CONFIG_KMP204X
> +/*
> + * Called very early, device-tree isn't unflattened
> + */
> +static int __init kmp204x_generic_probe(void)
> +{
> +	unsigned long root = of_get_flat_dt_root();
> +
> +	return of_flat_dt_is_compatible(root, "keymile,kmp204x");
> +}
> +
> +
> +/*
> + * Setup the architecture
> + */
> +void __init kmp204x_gen_setup_arch(void)
> +{
> +	mpc85xx_smp_init();
> +
> +	swiotlb_detect_4g();
> +
> +	pr_info("%s platform from Keymile\n", ppc_md.name);
> +}
> +
> +define_machine(kmp204x) {
> +	.name			= "kmp204x",
> +	.probe			= kmp204x_generic_probe,
> +	.setup_arch		= kmp204x_gen_setup_arch,
> +	.init_IRQ		= corenet_gen_pic_init,
> +#ifdef CONFIG_PCI
> +	.pcibios_fixup_bus	= fsl_pcibios_fixup_bus,
> +#endif
> +	.get_irq		= mpic_get_coreint_irq,
> +	.restart		= fsl_rstcr_restart,
> +	.calibrate_decr		= generic_calibrate_decr,
> +	.progress		= udbg_progress,
> +#ifdef CONFIG_PPC64
> +	.power_save		= book3e_idle,
> +#else
> +	.power_save		= e500_idle,
> +#endif
> +};
> +
> +machine_arch_initcall(kmp204x, corenet_gen_publish_devices);
> +
> +#ifdef CONFIG_SWIOTLB
> +machine_arch_initcall(kmp204x, swiotlb_setup_bus_notifier);
> +#endif
> +#endif

The whole point of corenet_generic.c is to avoid duplicating all of this
stuff.

Can't you just use corenet_generic as-is other than adding the
compatible to boards[]?  If not, explain why and put it in a different
file.

-Scott
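
To illustrate the suggestion above, here is a userspace sketch of a
NULL-terminated compatible table and a probe-style lookup. It is not the
contents of corenet_generic.c: the entries shown, the string
"keymile,kmcoge4" as a board-specific compatible, and the helper
board_is_supported() are all assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative table; the real boards[] list may differ. */
static const char * const boards[] = {
	"fsl,P2041RDB",
	"keymile,kmcoge4",	/* hypothetical new entry, no wildcard */
	NULL
};

/* Return 1 if compat matches an entry in boards[], 0 otherwise. */
static int board_is_supported(const char *compat)
{
	size_t i;

	for (i = 0; boards[i]; i++)
		if (strcmp(boards[i], compat) == 0)
			return 1;
	return 0;
}
```

With a table like this, supporting a new board is a one-line addition,
and no second define_machine() block is needed.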


* Kernel stack overflows due to  "powerpc: Remove ksp_limit on ppc64" with v3.13-rc8 on ppc32 (P2020)
From: Guenter Roeck @ 2014-01-16 18:05 UTC (permalink / raw)
  To: linux-kernel, linuxppc-dev; +Cc: Linus Torvalds

Hi all,

I am getting kernel stack overflows with v3.13-rc8 on a system with P2020 CPU.
The kernel is patched for the target, but I don't think that is related.
Stack overflows are in different areas, but always in calls from __do_softirq.

Crashes happen reliably either during boot or if I put any kind of load
onto the system.

Example:

Kernel stack overflow in process eb3e5a00, r1=eb79df90
CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
task: eb3e5a00 ti: c0616000 task.ti: ef440000
NIP: c003a420 LR: c003a410 CTR: c0017518
REGS: eb79dee0 TRAP: 0901   Not tainted (3.13.0-rc8-juniper-00146-g19eca00)
MSR: 00029000 <CE,EE,ME>  CR: 24008444  XER: 00000000
GPR00: c003a410 eb79df90 eb3e5a00 00000000 eb05d900 00000001 65d87646 00000000
GPR08: 00000000 020b8000 00000000 00000000 44008442
NIP [c003a420] __do_softirq+0x94/0x1ec
LR [c003a410] __do_softirq+0x84/0x1ec
Call Trace:
[eb79df90] [c003a410] __do_softirq+0x84/0x1ec (unreliable)
[eb79dfe0] [c003a970] irq_exit+0xbc/0xc8
[eb79dff0] [c000cc1c] call_do_irq+0x24/0x3c
[ef441f20] [c00046a8] do_IRQ+0x8c/0xf8
[ef441f40] [c000e7f4] ret_from_except+0x0/0x18
--- Exception: 501 at 0xfcda524
    LR = 0x10024900
Instruction dump:
7c781b78 3b40000a 3a73b040 543c0024 3a800000 3b3913a0 7ef5bb78 48201bf9
5463103a 7d3b182e 7e89b92e 7c008146 <3ba00000> 7e7e9b78 48000014 57fff87f
Kernel panic - not syncing: kernel stack overflow
CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
Call Trace:
Rebooting in 180 seconds..

Reverting the following commit fixes the problem.

cbc9565ee8 "powerpc: Remove ksp_limit on ppc64"

Should I submit a patch reverting this commit, or is there a better way to fix
the problem on short notice (given that 3.13 is close)?

Thanks,
Guenter

