linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH 0/8] Preparatory rework for multi-PMU support.
@ 2014-10-21 13:11 Mark Rutland
  2014-10-21 13:11 ` [PATCH 1/8] arm: perf: factor out callchain code Mark Rutland
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Mark Rutland @ 2014-10-21 13:11 UTC (permalink / raw)
  To: linux-arm-kernel

This series performs initial preparation for big.LITTLE perf support. It
depends on the CCI rework posted previously [1], as the removal of
get_hw_events and the percpu rework are incompatible with the single
global set of events the CCI has.

As a result of this series:

* The callchain handling is split from the PMU driver, as with metag,
  powerpc, sh, and x86. This enables callchain handling for software
  events when the kernel is built without CONFIG_HW_PERF_EVENTS.

* Legacy MIDR-based PMU probing is factored into a table that a
  subsequent series will split into separate drivers. On its own this
  should not result in a behavioural change.

* The ARM PMU accounting structures are reorganised and absorbed by
  struct arm_pmu, which will make it possible to allocate and manage
  multiple PMUs in a later patch series (see the sketch below).
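
For reference, a rough sketch of where the accounting structures end up
by the end of the series (abridged; the full definitions are in the
individual patches):

  struct pmu_hw_events {
          struct perf_event       *events[ARMPMU_MAX_HWEVENTS];
          DECLARE_BITMAP(used_mask, ARMPMU_MAX_HWEVENTS);
          raw_spinlock_t          pmu_lock;
          struct arm_pmu          *percpu_pmu;    /* irq dev_id token (patch 6) */
  };

  struct arm_pmu {
          /* ... */
          struct pmu_hw_events    __percpu *hw_events;    /* patches 5-7 */
          struct notifier_block   hotplug_nb;             /* patch 8 */
  };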

Thanks,
Mark.

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-October/295509.html

Mark Rutland (8):
  arm: perf: factor out callchain code
  arm: perf: make PMU probing data-driven
  arm: perf: use IDR types for CPU PMUs
  arm: perf: limit size of accounting data
  arm: perf: kill get_hw_events()
  arm: perf: fold percpu_pmu into pmu_hw_events
  arm: perf: dynamically allocate cpu hardware data
  arm: perf: fold hotplug notifier into arm_pmu

 arch/arm/include/asm/perf_event.h   |   2 +-
 arch/arm/include/asm/pmu.h          |  36 +++++++-
 arch/arm/kernel/Makefile            |   2 +-
 arch/arm/kernel/perf_callchain.c    | 137 ++++++++++++++++++++++++++++++
 arch/arm/kernel/perf_event.c        | 162 +++++-------------------------------
 arch/arm/kernel/perf_event_cpu.c    | 162 ++++++++++++++++--------------------
 arch/arm/kernel/perf_event_v6.c     |  12 +--
 arch/arm/kernel/perf_event_v7.c     |  14 ++--
 arch/arm/kernel/perf_event_xscale.c |  20 ++---
 9 files changed, 288 insertions(+), 259 deletions(-)
 create mode 100644 arch/arm/kernel/perf_callchain.c

-- 
1.9.1


* [PATCH 1/8] arm: perf: factor out callchain code
  2014-10-21 13:11 [PATCH 0/8] Preparatory rework for multi-PMU support Mark Rutland
@ 2014-10-21 13:11 ` Mark Rutland
  2014-10-21 13:11 ` [PATCH 2/8] arm: perf: make PMU probing data-driven Mark Rutland
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Mark Rutland @ 2014-10-21 13:11 UTC (permalink / raw)
  To: linux-arm-kernel

The ARM callchain handling code is currently bundled with the ARM PMU
management code, despite the two having no dependency on each other.
This bundling has the unfortunate property of making callchain handling
depend on CONFIG_HW_PERF_EVENTS, even though the callchain handling
could be applied to software events in the absence of PMU hardware
support.

This patch separates the two, placing the callchain handling in
perf_callchain.c and making it depend on CONFIG_PERF_EVENTS rather than
CONFIG_HW_PERF_EVENTS, enabling callchain recording on kernels built
without hardware perf event support.
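
As a concrete example (hypothetical invocation; the workload name is
illustrative), a kernel built with CONFIG_PERF_EVENTS=y but without
CONFIG_HW_PERF_EVENTS can now record callchains for a software event:

  $ perf record -g -e cpu-clock -- ./my_workload
  $ perf report --stdio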

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/include/asm/perf_event.h |   2 +-
 arch/arm/kernel/Makefile          |   2 +-
 arch/arm/kernel/perf_callchain.c  | 137 ++++++++++++++++++++++++++++++++++++++
 arch/arm/kernel/perf_event.c      | 132 +-----------------------------------
 4 files changed, 140 insertions(+), 133 deletions(-)
 create mode 100644 arch/arm/kernel/perf_callchain.c

diff --git a/arch/arm/include/asm/perf_event.h b/arch/arm/include/asm/perf_event.h
index c3a8369..d9cf138 100644
--- a/arch/arm/include/asm/perf_event.h
+++ b/arch/arm/include/asm/perf_event.h
@@ -12,7 +12,7 @@
 #ifndef __ARM_PERF_EVENT_H__
 #define __ARM_PERF_EVENT_H__
 
-#ifdef CONFIG_HW_PERF_EVENTS
+#ifdef CONFIG_PERF_EVENTS
 struct pt_regs;
 extern unsigned long perf_instruction_pointer(struct pt_regs *regs);
 extern unsigned long perf_misc_flags(struct pt_regs *regs);
diff --git a/arch/arm/kernel/Makefile b/arch/arm/kernel/Makefile
index 38ddd9f..8dcbed5 100644
--- a/arch/arm/kernel/Makefile
+++ b/arch/arm/kernel/Makefile
@@ -82,7 +82,7 @@ obj-$(CONFIG_CPU_MOHAWK)	+= xscale-cp0.o
 obj-$(CONFIG_CPU_PJ4)		+= pj4-cp0.o
 obj-$(CONFIG_CPU_PJ4B)		+= pj4-cp0.o
 obj-$(CONFIG_IWMMXT)		+= iwmmxt.o
-obj-$(CONFIG_PERF_EVENTS)	+= perf_regs.o
+obj-$(CONFIG_PERF_EVENTS)	+= perf_regs.o perf_callchain.o
 obj-$(CONFIG_HW_PERF_EVENTS)	+= perf_event.o perf_event_cpu.o
 AFLAGS_iwmmxt.o			:= -Wa,-mcpu=iwmmxt
 obj-$(CONFIG_ARM_CPU_TOPOLOGY)  += topology.o
diff --git a/arch/arm/kernel/perf_callchain.c b/arch/arm/kernel/perf_callchain.c
new file mode 100644
index 0000000..ccadfc3
--- /dev/null
+++ b/arch/arm/kernel/perf_callchain.c
@@ -0,0 +1,137 @@
+/*
+ * ARM callchain support
+ *
+ * Copyright (C) 2009 picoChip Designs, Ltd., Jamie Iles
+ * Copyright (C) 2010 ARM Ltd., Will Deacon <will.deacon@arm.com>
+ *
+ * This code is based on the ARM OProfile backtrace code.
+ */
+#include <linux/perf_event.h>
+#include <linux/uaccess.h>
+
+#include <asm/stacktrace.h>
+
+/*
+ * The registers we're interested in are at the end of the variable
+ * length saved register structure. The fp points at the end of this
+ * structure so the address of this struct is:
+ * (struct frame_tail *)(xxx->fp)-1
+ *
+ * This code has been adapted from the ARM OProfile support.
+ */
+struct frame_tail {
+	struct frame_tail __user *fp;
+	unsigned long sp;
+	unsigned long lr;
+} __attribute__((packed));
+
+/*
+ * Get the return address for a single stackframe and return a pointer to the
+ * next frame tail.
+ */
+static struct frame_tail __user *
+user_backtrace(struct frame_tail __user *tail,
+	       struct perf_callchain_entry *entry)
+{
+	struct frame_tail buftail;
+	unsigned long err;
+
+	if (!access_ok(VERIFY_READ, tail, sizeof(buftail)))
+		return NULL;
+
+	pagefault_disable();
+	err = __copy_from_user_inatomic(&buftail, tail, sizeof(buftail));
+	pagefault_enable();
+
+	if (err)
+		return NULL;
+
+	perf_callchain_store(entry, buftail.lr);
+
+	/*
+	 * Frame pointers should strictly progress back up the stack
+	 * (towards higher addresses).
+	 */
+	if (tail + 1 >= buftail.fp)
+		return NULL;
+
+	return buftail.fp - 1;
+}
+
+void
+perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
+{
+	struct frame_tail __user *tail;
+
+	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+		/* We don't support guest os callchain now */
+		return;
+	}
+
+	perf_callchain_store(entry, regs->ARM_pc);
+
+	if (!current->mm)
+		return;
+
+	tail = (struct frame_tail __user *)regs->ARM_fp - 1;
+
+	while ((entry->nr < PERF_MAX_STACK_DEPTH) &&
+	       tail && !((unsigned long)tail & 0x3))
+		tail = user_backtrace(tail, entry);
+}
+
+/*
+ * Gets called by walk_stackframe() for every stackframe. This will be called
+ * whilst unwinding the stackframe and is like a subroutine return so we use
+ * the PC.
+ */
+static int
+callchain_trace(struct stackframe *fr,
+		void *data)
+{
+	struct perf_callchain_entry *entry = data;
+	perf_callchain_store(entry, fr->pc);
+	return 0;
+}
+
+void
+perf_callchain_kernel(struct perf_callchain_entry *entry, struct pt_regs *regs)
+{
+	struct stackframe fr;
+
+	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+		/* We don't support guest os callchain now */
+		return;
+	}
+
+	arm_get_current_stackframe(regs, &fr);
+	walk_stackframe(&fr, callchain_trace, entry);
+}
+
+unsigned long perf_instruction_pointer(struct pt_regs *regs)
+{
+	if (perf_guest_cbs && perf_guest_cbs->is_in_guest())
+		return perf_guest_cbs->get_guest_ip();
+
+	return instruction_pointer(regs);
+}
+
+unsigned long perf_misc_flags(struct pt_regs *regs)
+{
+	int misc = 0;
+
+	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
+		if (perf_guest_cbs->is_user_mode())
+			misc |= PERF_RECORD_MISC_GUEST_USER;
+		else
+			misc |= PERF_RECORD_MISC_GUEST_KERNEL;
+	} else {
+		if (user_mode(regs))
+			misc |= PERF_RECORD_MISC_USER;
+		else
+			misc |= PERF_RECORD_MISC_KERNEL;
+	}
+
+	return misc;
+}
+
diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index 266cba4..ae96b98 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -7,21 +7,18 @@
  * Copyright (C) 2010 ARM Ltd., Will Deacon <will.deacon@arm.com>
  *
  * This code is based on the sparc64 perf event code, which is in turn based
- * on the x86 code. Callchain code is based on the ARM OProfile backtrace
- * code.
+ * on the x86 code.
  */
 #define pr_fmt(fmt) "hw perfevents: " fmt
 
 #include <linux/kernel.h>
 #include <linux/platform_device.h>
 #include <linux/pm_runtime.h>
-#include <linux/uaccess.h>
 #include <linux/irq.h>
 #include <linux/irqdesc.h>
 
 #include <asm/irq_regs.h>
 #include <asm/pmu.h>
-#include <asm/stacktrace.h>
 
 static int
 armpmu_map_cache_event(const unsigned (*cache_map)
@@ -533,130 +530,3 @@ int armpmu_register(struct arm_pmu *armpmu, int type)
 	return perf_pmu_register(&armpmu->pmu, armpmu->name, type);
 }
 
-/*
- * Callchain handling code.
- */
-
-/*
- * The registers we're interested in are at the end of the variable
- * length saved register structure. The fp points at the end of this
- * structure so the address of this struct is:
- * (struct frame_tail *)(xxx->fp)-1
- *
- * This code has been adapted from the ARM OProfile support.
- */
-struct frame_tail {
-	struct frame_tail __user *fp;
-	unsigned long sp;
-	unsigned long lr;
-} __attribute__((packed));
-
-/*
- * Get the return address for a single stackframe and return a pointer to the
- * next frame tail.
- */
-static struct frame_tail __user *
-user_backtrace(struct frame_tail __user *tail,
-	       struct perf_callchain_entry *entry)
-{
-	struct frame_tail buftail;
-	unsigned long err;
-
-	if (!access_ok(VERIFY_READ, tail, sizeof(buftail)))
-		return NULL;
-
-	pagefault_disable();
-	err = __copy_from_user_inatomic(&buftail, tail, sizeof(buftail));
-	pagefault_enable();
-
-	if (err)
-		return NULL;
-
-	perf_callchain_store(entry, buftail.lr);
-
-	/*
-	 * Frame pointers should strictly progress back up the stack
-	 * (towards higher addresses).
-	 */
-	if (tail + 1 >= buftail.fp)
-		return NULL;
-
-	return buftail.fp - 1;
-}
-
-void
-perf_callchain_user(struct perf_callchain_entry *entry, struct pt_regs *regs)
-{
-	struct frame_tail __user *tail;
-
-	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
-		/* We don't support guest os callchain now */
-		return;
-	}
-
-	perf_callchain_store(entry, regs->ARM_pc);
-
-	if (!current->mm)
-		return;
-
-	tail = (struct frame_tail __user *)regs->ARM_fp - 1;
-
-	while ((entry->nr < PERF_MAX_STACK_DEPTH) &&
-	       tail && !((unsigned long)tail & 0x3))
-		tail = user_backtrace(tail, entry);
-}
-
-/*
- * Gets called by walk_stackframe() for every stackframe. This will be called
- * whilst unwinding the stackframe and is like a subroutine return so we use
- * the PC.
- */
-static int
-callchain_trace(struct stackframe *fr,
-		void *data)
-{
-	struct perf_callchain_entry *entry = data;
-	perf_callchain_store(entry, fr->pc);
-	return 0;
-}
-
-void
-perf_callchain_kernel(struct perf_callchain_entry *entry, struct pt_regs *regs)
-{
-	struct stackframe fr;
-
-	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
-		/* We don't support guest os callchain now */
-		return;
-	}
-
-	arm_get_current_stackframe(regs, &fr);
-	walk_stackframe(&fr, callchain_trace, entry);
-}
-
-unsigned long perf_instruction_pointer(struct pt_regs *regs)
-{
-	if (perf_guest_cbs && perf_guest_cbs->is_in_guest())
-		return perf_guest_cbs->get_guest_ip();
-
-	return instruction_pointer(regs);
-}
-
-unsigned long perf_misc_flags(struct pt_regs *regs)
-{
-	int misc = 0;
-
-	if (perf_guest_cbs && perf_guest_cbs->is_in_guest()) {
-		if (perf_guest_cbs->is_user_mode())
-			misc |= PERF_RECORD_MISC_GUEST_USER;
-		else
-			misc |= PERF_RECORD_MISC_GUEST_KERNEL;
-	} else {
-		if (user_mode(regs))
-			misc |= PERF_RECORD_MISC_USER;
-		else
-			misc |= PERF_RECORD_MISC_KERNEL;
-	}
-
-	return misc;
-}
-- 
1.9.1


* [PATCH 2/8] arm: perf: make PMU probing data-driven
  2014-10-21 13:11 [PATCH 0/8] Preparatory rework for multi-PMU support Mark Rutland
  2014-10-21 13:11 ` [PATCH 1/8] arm: perf: factor out callchain code Mark Rutland
@ 2014-10-21 13:11 ` Mark Rutland
  2014-10-21 21:25   ` Stephen Boyd
  2014-10-21 13:11 ` [PATCH 3/8] arm: perf: use IDR types for CPU PMUs Mark Rutland
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Mark Rutland @ 2014-10-21 13:11 UTC (permalink / raw)
  To: linux-arm-kernel

The current PMU probing logic consists of a single switch statement,
which means that the arm_pmu core in perf_event_cpu.c needs to know
about every CPU PMU variant supported by a driver using the arm_pmu
framework. This makes it rather difficult to decouple the drivers from
the (otherwise generic) probing code.

This patch refactors that switch statement into a table-driven lookup,
separating the generic logic from the CPU-specific knowledge (which now
lives in the table). Later
patches will split the table across the relevant PMU drivers, which can
pass their tables to the generic probing function.
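
For illustration (this snippet is not part of the patch itself), each
table entry reduces an old switch case to a masked MIDR comparison;
e.g. the Cortex-A8 entry matches exactly when:

  /* ARM_PMU_PROBE(ARM_CPU_PART_CORTEX_A8, armv7_a8_pmu_init) fires iff: */
  (read_cpuid_id() & ARM_CPU_PART_MASK) == ARM_CPU_PART_CORTEX_A8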

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/include/asm/pmu.h       | 23 ++++++++++++++++++
 arch/arm/kernel/perf_event_cpu.c | 52 +++++++++++++++-------------------------
 2 files changed, 42 insertions(+), 33 deletions(-)

diff --git a/arch/arm/include/asm/pmu.h b/arch/arm/include/asm/pmu.h
index 0b648c5..ff39290 100644
--- a/arch/arm/include/asm/pmu.h
+++ b/arch/arm/include/asm/pmu.h
@@ -15,6 +15,8 @@
 #include <linux/interrupt.h>
 #include <linux/perf_event.h>
 
+#include <asm/cputype.h>
+
 /*
  * struct arm_pmu_platdata - ARM PMU platform data
  *
@@ -127,6 +129,27 @@ int armpmu_map_event(struct perf_event *event,
 						[PERF_COUNT_HW_CACHE_RESULT_MAX],
 		     u32 raw_event_mask);
 
+struct pmu_probe_info {
+	unsigned int cpuid;
+	unsigned int mask;
+	int (*init)(struct arm_pmu *);
+};
+
+#define PMU_PROBE(_cpuid, _mask, _fn)	\
+{					\
+	.cpuid = (_cpuid),		\
+	.mask = (_mask),		\
+	.init = (_fn),			\
+}
+
+#define ARM_PMU_PROBE(_cpuid, _fn) \
+	PMU_PROBE(_cpuid, ARM_CPU_PART_MASK, _fn)
+
+#define ARM_PMU_XSCALE_MASK	((0xff << 24) | ARM_CPU_XSCALE_ARCH_MASK)
+
+#define XSCALE_PMU_PROBE(_version, _fn) \
+	PMU_PROBE(ARM_CPU_IMP_INTEL << 24 | _version, ARM_PMU_XSCALE_MASK, _fn)
+
 #endif /* CONFIG_HW_PERF_EVENTS */
 
 #endif /* __ARM_PMU_H__ */
diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
index 4bf4cce..79c1cf4 100644
--- a/arch/arm/kernel/perf_event_cpu.c
+++ b/arch/arm/kernel/perf_event_cpu.c
@@ -241,48 +241,34 @@ static struct platform_device_id cpu_pmu_plat_device_ids[] = {
 	{},
 };
 
+static struct pmu_probe_info pmu_probe_table[] = {
+	ARM_PMU_PROBE(ARM_CPU_PART_ARM1136, armv6_1136_pmu_init),
+	ARM_PMU_PROBE(ARM_CPU_PART_ARM1156, armv6_1156_pmu_init),
+	ARM_PMU_PROBE(ARM_CPU_PART_ARM1176, armv6_1176_pmu_init),
+	ARM_PMU_PROBE(ARM_CPU_PART_ARM11MPCORE, armv6mpcore_pmu_init),
+	ARM_PMU_PROBE(ARM_CPU_PART_CORTEX_A8, armv7_a8_pmu_init),
+	ARM_PMU_PROBE(ARM_CPU_PART_CORTEX_A9, armv7_a9_pmu_init),
+	XSCALE_PMU_PROBE(ARM_CPU_XSCALE_ARCH_V1, xscale1pmu_init),
+	XSCALE_PMU_PROBE(ARM_CPU_XSCALE_ARCH_V2, xscale2pmu_init),
+	{ /* sentinel value */ }
+};
+
 /*
  * CPU PMU identification and probing.
  */
 static int probe_current_pmu(struct arm_pmu *pmu)
 {
 	int cpu = get_cpu();
+	unsigned int cpuid = read_cpuid_id();
 	int ret = -ENODEV;
+	struct pmu_probe_info *info;
 
-	pr_info("probing PMU on CPU %d\n", cpu);
+	pr_info("probing PMU on CPU %d", cpu);
 
-	switch (read_cpuid_part()) {
-	/* ARM Ltd CPUs. */
-	case ARM_CPU_PART_ARM1136:
-		ret = armv6_1136_pmu_init(pmu);
-		break;
-	case ARM_CPU_PART_ARM1156:
-		ret = armv6_1156_pmu_init(pmu);
-		break;
-	case ARM_CPU_PART_ARM1176:
-		ret = armv6_1176_pmu_init(pmu);
-		break;
-	case ARM_CPU_PART_ARM11MPCORE:
-		ret = armv6mpcore_pmu_init(pmu);
-		break;
-	case ARM_CPU_PART_CORTEX_A8:
-		ret = armv7_a8_pmu_init(pmu);
-		break;
-	case ARM_CPU_PART_CORTEX_A9:
-		ret = armv7_a9_pmu_init(pmu);
-		break;
-
-	default:
-		if (read_cpuid_implementor() == ARM_CPU_IMP_INTEL) {
-			switch (xscale_cpu_arch_version()) {
-			case ARM_CPU_XSCALE_ARCH_V1:
-				ret = xscale1pmu_init(pmu);
-				break;
-			case ARM_CPU_XSCALE_ARCH_V2:
-				ret = xscale2pmu_init(pmu);
-				break;
-			}
-		}
+	for (info = pmu_probe_table; info->init != NULL; info++) {
+		if ((cpuid & info->mask) != info->cpuid)
+			continue;
+		ret = info->init(pmu);
 		break;
 	}
 
-- 
1.9.1


* [PATCH 3/8] arm: perf: use IDR types for CPU PMUs
  2014-10-21 13:11 [PATCH 0/8] Preparatory rework for multi-PMU support Mark Rutland
  2014-10-21 13:11 ` [PATCH 1/8] arm: perf: factor out callchain code Mark Rutland
  2014-10-21 13:11 ` [PATCH 2/8] arm: perf: make PMU probing data-driven Mark Rutland
@ 2014-10-21 13:11 ` Mark Rutland
  2014-10-21 21:25   ` Stephen Boyd
  2014-10-21 13:11 ` [PATCH 4/8] arm: perf: limit size of accounting data Mark Rutland
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Mark Rutland @ 2014-10-21 13:11 UTC (permalink / raw)
  To: linux-arm-kernel

For systems with heterogeneous CPUs (e.g. big.LITTLE systems) the PMUs
can be different in each cluster, and not all events can be migrated
between clusters. To allow userspace to deal with this, it must be
possible to address each PMU independently.

This patch changes PMUs to be registered with dynamic (IDR) types,
allowing them to be targeted individually. Each PMU's type can be found
in ${SYSFS_ROOT}/bus/event_source/devices/${PMU_NAME}/type.

From userspace, raw events can be targeted at a specific PMU:
$ perf stat -e ${PMU_NAME}/config=V,config1=V1,.../

Doing this does not break existing tools which use existing perf types:
when perf core can't find a PMU of matching type (in perf_init_event)
it'll iterate over the set of all PMUs. If a compatible PMU exists,
it'll be found eventually. If more than one compatible PMU exists, the
event will be handled by whichever PMU happens to be earlier in the pmus
list (which currently will be the last compatible PMU registered).
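
As a usage sketch (the PMU name and type number below are hypothetical;
the name comes from the driver and the type is allocated dynamically):

  $ cat /sys/bus/event_source/devices/armv7_cortex_a8/type
  7
  $ perf stat -e armv7_cortex_a8/config=0x11/ -a sleep 1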

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/kernel/perf_event.c     | 6 +++++-
 arch/arm/kernel/perf_event_cpu.c | 2 +-
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index ae96b98..f0bbd3d 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -77,8 +77,12 @@ armpmu_map_event(struct perf_event *event,
 		 u32 raw_event_mask)
 {
 	u64 config = event->attr.config;
+	int type = event->attr.type;
 
-	switch (event->attr.type) {
+	if (type >= PERF_TYPE_MAX && type == event->pmu->type)
+		return armpmu_map_raw_event(raw_event_mask, config);
+
+	switch (type) {
 	case PERF_TYPE_HARDWARE:
 		return armpmu_map_hw_event(event_map, config);
 	case PERF_TYPE_HW_CACHE:
diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
index 79c1cf4..ed7bb04f 100644
--- a/arch/arm/kernel/perf_event_cpu.c
+++ b/arch/arm/kernel/perf_event_cpu.c
@@ -311,7 +311,7 @@ static int cpu_pmu_device_probe(struct platform_device *pdev)
 	}
 
 	cpu_pmu_init(cpu_pmu);
-	ret = armpmu_register(cpu_pmu, PERF_TYPE_RAW);
+	ret = armpmu_register(cpu_pmu, -1);
 
 	if (!ret)
 		return 0;
-- 
1.9.1


* [PATCH 4/8] arm: perf: limit size of accounting data
  2014-10-21 13:11 [PATCH 0/8] Preparatory rework for multi-PMU support Mark Rutland
                   ` (2 preceding siblings ...)
  2014-10-21 13:11 ` [PATCH 3/8] arm: perf: use IDR types for CPU PMUs Mark Rutland
@ 2014-10-21 13:11 ` Mark Rutland
  2014-10-21 13:11 ` [PATCH 5/8] arm: perf: kill get_hw_events() Mark Rutland
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Mark Rutland @ 2014-10-21 13:11 UTC (permalink / raw)
  To: linux-arm-kernel

Commit 3fc2c83087 (ARM: perf: remove event limit from pmu_hw_events) got
rid of the upper limit on the number of events an arm_pmu could handle,
but introduced additional complexity and placed a burden on each PMU
driver to allocate accounting data somehow. So far this has not
generally been useful as the only users of arm_pmu are the CPU backend
and the CCI driver.

Now that the CCI driver plugs into the perf subsystem directly, we can
remove some of the complexities that get in the way of supporting
heterogeneous CPU PMUs.

This patch restores the original limits on pmu_hw_events fields such
that the pmu_hw_events data can be allocated as a contiguous block. This
will simplify dynamic pmu_hw_events allocation in later patches.
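
For reference, DECLARE_BITMAP() expands to an embedded array, so after
this patch pmu_hw_events is one self-contained object with no pointers
to fix up after allocation:

  /* DECLARE_BITMAP(used_mask, ARMPMU_MAX_HWEVENTS) is roughly: */
  unsigned long used_mask[BITS_TO_LONGS(ARMPMU_MAX_HWEVENTS)];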

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/include/asm/pmu.h       | 4 ++--
 arch/arm/kernel/perf_event.c     | 4 +---
 arch/arm/kernel/perf_event_cpu.c | 4 ----
 3 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/arch/arm/include/asm/pmu.h b/arch/arm/include/asm/pmu.h
index ff39290..3d7e30b 100644
--- a/arch/arm/include/asm/pmu.h
+++ b/arch/arm/include/asm/pmu.h
@@ -68,13 +68,13 @@ struct pmu_hw_events {
 	/*
 	 * The events that are active on the PMU for the given index.
 	 */
-	struct perf_event	**events;
+	struct perf_event	*events[ARMPMU_MAX_HWEVENTS];
 
 	/*
 	 * A 1 bit for an index indicates that the counter is being used for
 	 * an event. A 0 means that the counter can be used.
 	 */
-	unsigned long           *used_mask;
+	DECLARE_BITMAP(used_mask, ARMPMU_MAX_HWEVENTS);
 
 	/*
 	 * Hardware lock to serialize accesses to PMU registers. Needed for the
diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index f0bbd3d..01db868 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -275,14 +275,12 @@ validate_group(struct perf_event *event)
 {
 	struct perf_event *sibling, *leader = event->group_leader;
 	struct pmu_hw_events fake_pmu;
-	DECLARE_BITMAP(fake_used_mask, ARMPMU_MAX_HWEVENTS);
 
 	/*
 	 * Initialise the fake PMU. We only need to populate the
 	 * used_mask for the purposes of validation.
 	 */
-	memset(fake_used_mask, 0, sizeof(fake_used_mask));
-	fake_pmu.used_mask = fake_used_mask;
+	memset(&fake_pmu.used_mask, 0, sizeof(fake_pmu.used_mask));
 
 	if (!validate_event(&fake_pmu, leader))
 		return -EINVAL;
diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
index ed7bb04f..ff4a47f 100644
--- a/arch/arm/kernel/perf_event_cpu.c
+++ b/arch/arm/kernel/perf_event_cpu.c
@@ -36,8 +36,6 @@
 static struct arm_pmu *cpu_pmu;
 
 static DEFINE_PER_CPU(struct arm_pmu *, percpu_pmu);
-static DEFINE_PER_CPU(struct perf_event * [ARMPMU_MAX_HWEVENTS], hw_events);
-static DEFINE_PER_CPU(unsigned long [BITS_TO_LONGS(ARMPMU_MAX_HWEVENTS)], used_mask);
 static DEFINE_PER_CPU(struct pmu_hw_events, cpu_hw_events);
 
 /*
@@ -172,8 +170,6 @@ static void cpu_pmu_init(struct arm_pmu *cpu_pmu)
 	int cpu;
 	for_each_possible_cpu(cpu) {
 		struct pmu_hw_events *events = &per_cpu(cpu_hw_events, cpu);
-		events->events = per_cpu(hw_events, cpu);
-		events->used_mask = per_cpu(used_mask, cpu);
 		raw_spin_lock_init(&events->pmu_lock);
 		per_cpu(percpu_pmu, cpu) = cpu_pmu;
 	}
-- 
1.9.1


* [PATCH 5/8] arm: perf: kill get_hw_events()
  2014-10-21 13:11 [PATCH 0/8] Preparatory rework for multi-PMU support Mark Rutland
                   ` (3 preceding siblings ...)
  2014-10-21 13:11 ` [PATCH 4/8] arm: perf: limit size of accounting data Mark Rutland
@ 2014-10-21 13:11 ` Mark Rutland
  2014-10-21 13:11 ` [PATCH 6/8] arm: perf: fold percpu_pmu into pmu_hw_events Mark Rutland
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 19+ messages in thread
From: Mark Rutland @ 2014-10-21 13:11 UTC (permalink / raw)
  To: linux-arm-kernel

Now that the arm pmu code is limited to CPU PMUs, the get_hw_events()
function is superfluous, as we'll always have a set of per-cpu
pmu_hw_events structures.

This patch removes the get_hw_events() function, replacing it with
a percpu hw_events pointer. Uses of get_hw_events are updated to use
this_cpu_ptr.
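
Schematically (condensed from the hunks below), each former indirect
call becomes a plain percpu access on the current CPU:

  /* before */
  struct pmu_hw_events *events = cpu_pmu->get_hw_events();
  /* after */
  struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);

Accesses to another CPU's copy can use per_cpu_ptr(cpu_pmu->hw_events,
cpu) instead, as a later patch in this series does on the IRQ request
path.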

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/include/asm/pmu.h          |  2 +-
 arch/arm/kernel/perf_event.c        |  6 +++---
 arch/arm/kernel/perf_event_cpu.c    |  7 +------
 arch/arm/kernel/perf_event_v6.c     | 12 ++++++------
 arch/arm/kernel/perf_event_v7.c     | 14 +++++++-------
 arch/arm/kernel/perf_event_xscale.c | 20 ++++++++++----------
 6 files changed, 28 insertions(+), 33 deletions(-)

diff --git a/arch/arm/include/asm/pmu.h b/arch/arm/include/asm/pmu.h
index 3d7e30b..f273dd2 100644
--- a/arch/arm/include/asm/pmu.h
+++ b/arch/arm/include/asm/pmu.h
@@ -109,7 +109,7 @@ struct arm_pmu {
 	struct mutex	reserve_mutex;
 	u64		max_period;
 	struct platform_device	*plat_device;
-	struct pmu_hw_events	*(*get_hw_events)(void);
+	struct pmu_hw_events	__percpu *hw_events;
 };
 
 #define to_arm_pmu(p) (container_of(p, struct arm_pmu, pmu))
diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index 01db868..6bfa229 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -201,7 +201,7 @@ static void
 armpmu_del(struct perf_event *event, int flags)
 {
 	struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
-	struct pmu_hw_events *hw_events = armpmu->get_hw_events();
+	struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events);
 	struct hw_perf_event *hwc = &event->hw;
 	int idx = hwc->idx;
 
@@ -218,7 +218,7 @@ static int
 armpmu_add(struct perf_event *event, int flags)
 {
 	struct arm_pmu *armpmu = to_arm_pmu(event->pmu);
-	struct pmu_hw_events *hw_events = armpmu->get_hw_events();
+	struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events);
 	struct hw_perf_event *hwc = &event->hw;
 	int idx;
 	int err = 0;
@@ -467,7 +467,7 @@ static int armpmu_event_init(struct perf_event *event)
 static void armpmu_enable(struct pmu *pmu)
 {
 	struct arm_pmu *armpmu = to_arm_pmu(pmu);
-	struct pmu_hw_events *hw_events = armpmu->get_hw_events();
+	struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events);
 	int enabled = bitmap_weight(hw_events->used_mask, armpmu->num_events);
 
 	if (enabled)
diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
index ff4a47f..13fab83 100644
--- a/arch/arm/kernel/perf_event_cpu.c
+++ b/arch/arm/kernel/perf_event_cpu.c
@@ -67,11 +67,6 @@ EXPORT_SYMBOL_GPL(perf_num_counters);
 #include "perf_event_v6.c"
 #include "perf_event_v7.c"
 
-static struct pmu_hw_events *cpu_pmu_get_cpu_events(void)
-{
-	return this_cpu_ptr(&cpu_hw_events);
-}
-
 static void cpu_pmu_enable_percpu_irq(void *data)
 {
 	int irq = *(int *)data;
@@ -174,7 +169,7 @@ static void cpu_pmu_init(struct arm_pmu *cpu_pmu)
 		per_cpu(percpu_pmu, cpu) = cpu_pmu;
 	}
 
-	cpu_pmu->get_hw_events	= cpu_pmu_get_cpu_events;
+	cpu_pmu->hw_events	= &cpu_hw_events;
 	cpu_pmu->request_irq	= cpu_pmu_request_irq;
 	cpu_pmu->free_irq	= cpu_pmu_free_irq;
 
diff --git a/arch/arm/kernel/perf_event_v6.c b/arch/arm/kernel/perf_event_v6.c
index abfeb04..f2ffd5c 100644
--- a/arch/arm/kernel/perf_event_v6.c
+++ b/arch/arm/kernel/perf_event_v6.c
@@ -262,7 +262,7 @@ static void armv6pmu_enable_event(struct perf_event *event)
 	unsigned long val, mask, evt, flags;
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
-	struct pmu_hw_events *events = cpu_pmu->get_hw_events();
+	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
 	int idx = hwc->idx;
 
 	if (ARMV6_CYCLE_COUNTER == idx) {
@@ -300,7 +300,7 @@ armv6pmu_handle_irq(int irq_num,
 	unsigned long pmcr = armv6_pmcr_read();
 	struct perf_sample_data data;
 	struct arm_pmu *cpu_pmu = (struct arm_pmu *)dev;
-	struct pmu_hw_events *cpuc = cpu_pmu->get_hw_events();
+	struct pmu_hw_events *cpuc = this_cpu_ptr(cpu_pmu->hw_events);
 	struct pt_regs *regs;
 	int idx;
 
@@ -356,7 +356,7 @@ armv6pmu_handle_irq(int irq_num,
 static void armv6pmu_start(struct arm_pmu *cpu_pmu)
 {
 	unsigned long flags, val;
-	struct pmu_hw_events *events = cpu_pmu->get_hw_events();
+	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
 
 	raw_spin_lock_irqsave(&events->pmu_lock, flags);
 	val = armv6_pmcr_read();
@@ -368,7 +368,7 @@ static void armv6pmu_start(struct arm_pmu *cpu_pmu)
 static void armv6pmu_stop(struct arm_pmu *cpu_pmu)
 {
 	unsigned long flags, val;
-	struct pmu_hw_events *events = cpu_pmu->get_hw_events();
+	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
 
 	raw_spin_lock_irqsave(&events->pmu_lock, flags);
 	val = armv6_pmcr_read();
@@ -409,7 +409,7 @@ static void armv6pmu_disable_event(struct perf_event *event)
 	unsigned long val, mask, evt, flags;
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
-	struct pmu_hw_events *events = cpu_pmu->get_hw_events();
+	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
 	int idx = hwc->idx;
 
 	if (ARMV6_CYCLE_COUNTER == idx) {
@@ -444,7 +444,7 @@ static void armv6mpcore_pmu_disable_event(struct perf_event *event)
 	unsigned long val, mask, flags, evt = 0;
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
-	struct pmu_hw_events *events = cpu_pmu->get_hw_events();
+	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
 	int idx = hwc->idx;
 
 	if (ARMV6_CYCLE_COUNTER == idx) {
diff --git a/arch/arm/kernel/perf_event_v7.c b/arch/arm/kernel/perf_event_v7.c
index 116758b..fa76b25 100644
--- a/arch/arm/kernel/perf_event_v7.c
+++ b/arch/arm/kernel/perf_event_v7.c
@@ -705,7 +705,7 @@ static void armv7pmu_enable_event(struct perf_event *event)
 	unsigned long flags;
 	struct hw_perf_event *hwc = &event->hw;
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
-	struct pmu_hw_events *events = cpu_pmu->get_hw_events();
+	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
 	int idx = hwc->idx;
 
 	if (!armv7_pmnc_counter_valid(cpu_pmu, idx)) {
@@ -751,7 +751,7 @@ static void armv7pmu_disable_event(struct perf_event *event)
 	unsigned long flags;
 	struct hw_perf_event *hwc = &event->hw;
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
-	struct pmu_hw_events *events = cpu_pmu->get_hw_events();
+	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
 	int idx = hwc->idx;
 
 	if (!armv7_pmnc_counter_valid(cpu_pmu, idx)) {
@@ -783,7 +783,7 @@ static irqreturn_t armv7pmu_handle_irq(int irq_num, void *dev)
 	u32 pmnc;
 	struct perf_sample_data data;
 	struct arm_pmu *cpu_pmu = (struct arm_pmu *)dev;
-	struct pmu_hw_events *cpuc = cpu_pmu->get_hw_events();
+	struct pmu_hw_events *cpuc = this_cpu_ptr(cpu_pmu->hw_events);
 	struct pt_regs *regs;
 	int idx;
 
@@ -843,7 +843,7 @@ static irqreturn_t armv7pmu_handle_irq(int irq_num, void *dev)
 static void armv7pmu_start(struct arm_pmu *cpu_pmu)
 {
 	unsigned long flags;
-	struct pmu_hw_events *events = cpu_pmu->get_hw_events();
+	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
 
 	raw_spin_lock_irqsave(&events->pmu_lock, flags);
 	/* Enable all counters */
@@ -854,7 +854,7 @@ static void armv7pmu_start(struct arm_pmu *cpu_pmu)
 static void armv7pmu_stop(struct arm_pmu *cpu_pmu)
 {
 	unsigned long flags;
-	struct pmu_hw_events *events = cpu_pmu->get_hw_events();
+	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
 
 	raw_spin_lock_irqsave(&events->pmu_lock, flags);
 	/* Disable all counters */
@@ -1287,7 +1287,7 @@ static void krait_pmu_disable_event(struct perf_event *event)
 	struct hw_perf_event *hwc = &event->hw;
 	int idx = hwc->idx;
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
-	struct pmu_hw_events *events = cpu_pmu->get_hw_events();
+	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
 
 	/* Disable counter and interrupt */
 	raw_spin_lock_irqsave(&events->pmu_lock, flags);
@@ -1313,7 +1313,7 @@ static void krait_pmu_enable_event(struct perf_event *event)
 	struct hw_perf_event *hwc = &event->hw;
 	int idx = hwc->idx;
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
-	struct pmu_hw_events *events = cpu_pmu->get_hw_events();
+	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
 
 	/*
 	 * Enable counter and interrupt, and set the counter to count
diff --git a/arch/arm/kernel/perf_event_xscale.c b/arch/arm/kernel/perf_event_xscale.c
index 08da0af..8af9f1f 100644
--- a/arch/arm/kernel/perf_event_xscale.c
+++ b/arch/arm/kernel/perf_event_xscale.c
@@ -138,7 +138,7 @@ xscale1pmu_handle_irq(int irq_num, void *dev)
 	unsigned long pmnc;
 	struct perf_sample_data data;
 	struct arm_pmu *cpu_pmu = (struct arm_pmu *)dev;
-	struct pmu_hw_events *cpuc = cpu_pmu->get_hw_events();
+	struct pmu_hw_events *cpuc = this_cpu_ptr(cpu_pmu->hw_events);
 	struct pt_regs *regs;
 	int idx;
 
@@ -198,7 +198,7 @@ static void xscale1pmu_enable_event(struct perf_event *event)
 	unsigned long val, mask, evt, flags;
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
-	struct pmu_hw_events *events = cpu_pmu->get_hw_events();
+	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
 	int idx = hwc->idx;
 
 	switch (idx) {
@@ -234,7 +234,7 @@ static void xscale1pmu_disable_event(struct perf_event *event)
 	unsigned long val, mask, evt, flags;
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
-	struct pmu_hw_events *events = cpu_pmu->get_hw_events();
+	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
 	int idx = hwc->idx;
 
 	switch (idx) {
@@ -287,7 +287,7 @@ xscale1pmu_get_event_idx(struct pmu_hw_events *cpuc,
 static void xscale1pmu_start(struct arm_pmu *cpu_pmu)
 {
 	unsigned long flags, val;
-	struct pmu_hw_events *events = cpu_pmu->get_hw_events();
+	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
 
 	raw_spin_lock_irqsave(&events->pmu_lock, flags);
 	val = xscale1pmu_read_pmnc();
@@ -299,7 +299,7 @@ static void xscale1pmu_start(struct arm_pmu *cpu_pmu)
 static void xscale1pmu_stop(struct arm_pmu *cpu_pmu)
 {
 	unsigned long flags, val;
-	struct pmu_hw_events *events = cpu_pmu->get_hw_events();
+	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
 
 	raw_spin_lock_irqsave(&events->pmu_lock, flags);
 	val = xscale1pmu_read_pmnc();
@@ -485,7 +485,7 @@ xscale2pmu_handle_irq(int irq_num, void *dev)
 	unsigned long pmnc, of_flags;
 	struct perf_sample_data data;
 	struct arm_pmu *cpu_pmu = (struct arm_pmu *)dev;
-	struct pmu_hw_events *cpuc = cpu_pmu->get_hw_events();
+	struct pmu_hw_events *cpuc = this_cpu_ptr(cpu_pmu->hw_events);
 	struct pt_regs *regs;
 	int idx;
 
@@ -539,7 +539,7 @@ static void xscale2pmu_enable_event(struct perf_event *event)
 	unsigned long flags, ien, evtsel;
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
-	struct pmu_hw_events *events = cpu_pmu->get_hw_events();
+	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
 	int idx = hwc->idx;
 
 	ien = xscale2pmu_read_int_enable();
@@ -585,7 +585,7 @@ static void xscale2pmu_disable_event(struct perf_event *event)
 	unsigned long flags, ien, evtsel, of_flags;
 	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 	struct hw_perf_event *hwc = &event->hw;
-	struct pmu_hw_events *events = cpu_pmu->get_hw_events();
+	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
 	int idx = hwc->idx;
 
 	ien = xscale2pmu_read_int_enable();
@@ -651,7 +651,7 @@ out:
 static void xscale2pmu_start(struct arm_pmu *cpu_pmu)
 {
 	unsigned long flags, val;
-	struct pmu_hw_events *events = cpu_pmu->get_hw_events();
+	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
 
 	raw_spin_lock_irqsave(&events->pmu_lock, flags);
 	val = xscale2pmu_read_pmnc() & ~XSCALE_PMU_CNT64;
@@ -663,7 +663,7 @@ static void xscale2pmu_start(struct arm_pmu *cpu_pmu)
 static void xscale2pmu_stop(struct arm_pmu *cpu_pmu)
 {
 	unsigned long flags, val;
-	struct pmu_hw_events *events = cpu_pmu->get_hw_events();
+	struct pmu_hw_events *events = this_cpu_ptr(cpu_pmu->hw_events);
 
 	raw_spin_lock_irqsave(&events->pmu_lock, flags);
 	val = xscale2pmu_read_pmnc();
-- 
1.9.1


* [PATCH 6/8] arm: perf: fold percpu_pmu into pmu_hw_events
  2014-10-21 13:11 [PATCH 0/8] Preparatory rework for multi-PMU support Mark Rutland
                   ` (4 preceding siblings ...)
  2014-10-21 13:11 ` [PATCH 5/8] arm: perf: kill get_hw_events() Mark Rutland
@ 2014-10-21 13:11 ` Mark Rutland
  2014-10-21 22:05   ` Stephen Boyd
  2014-10-21 13:11 ` [PATCH 7/8] arm: perf: dynamically allocate cpu hardware data Mark Rutland
  2014-10-21 13:11 ` [PATCH 8/8] arm: perf: fold hotplug notifier into arm_pmu Mark Rutland
  7 siblings, 1 reply; 19+ messages in thread
From: Mark Rutland @ 2014-10-21 13:11 UTC (permalink / raw)
  To: linux-arm-kernel

Currently the percpu_pmu pointers used as percpu_irq dev_id values are
defined separately from the other per-cpu accounting data, which makes
dynamically allocating the data (as will be required for systems with
heterogeneous CPUs) difficult.

This patch moves the percpu_pmu pointers into pmu_hw_events (which is
itself allocated per cpu), which will allow for easier dynamic
allocation. Both percpu and regular irqs are requested using percpu_pmu
pointers as tokens, freeing us from having to know whether an irq is
percpu within the handler, and thus avoiding a radix tree lookup on the
handler path.
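
Schematically (condensed from the hunks below), both request paths now
pass a pointer into the percpu area, so the handler's first dereference
yields the arm_pmu either way:

  /* percpu IRQ: the core hands the handler this CPU's copy of dev_id */
  request_percpu_irq(irq, handler, "arm-pmu", &hw_events->percpu_pmu);

  /* regular IRQ: pass CPU i's copy explicitly */
  request_irq(irq, handler, IRQF_NOBALANCING | IRQF_NO_THREAD, "arm-pmu",
              per_cpu_ptr(&hw_events->percpu_pmu, i));

  /* handler: dev points at a struct arm_pmu * in both cases */
  armpmu = *(void **)dev;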

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
---
 arch/arm/include/asm/pmu.h       |  6 ++++++
 arch/arm/kernel/perf_event.c     | 14 +++++++++-----
 arch/arm/kernel/perf_event_cpu.c | 14 ++++++++------
 3 files changed, 23 insertions(+), 11 deletions(-)

diff --git a/arch/arm/include/asm/pmu.h b/arch/arm/include/asm/pmu.h
index f273dd2..cc01498 100644
--- a/arch/arm/include/asm/pmu.h
+++ b/arch/arm/include/asm/pmu.h
@@ -81,6 +81,12 @@ struct pmu_hw_events {
 	 * read/modify/write sequences.
 	 */
 	raw_spinlock_t		pmu_lock;
+
+	/*
+	 * When using percpu IRQs, we need a percpu dev_id. Place it here as we
+	 * already have to allocate this struct per cpu.
+	 */
+	struct arm_pmu		*percpu_pmu;
 };
 
 struct arm_pmu {
diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
index 6bfa229..70ea8bd 100644
--- a/arch/arm/kernel/perf_event.c
+++ b/arch/arm/kernel/perf_event.c
@@ -304,17 +304,21 @@ static irqreturn_t armpmu_dispatch_irq(int irq, void *dev)
 	int ret;
 	u64 start_clock, finish_clock;
 
-	if (irq_is_percpu(irq))
-		dev = *(void **)dev;
-	armpmu = dev;
+	/*
+	 * we request the IRQ with a (possibly percpu) struct arm_pmu**, but
+	 * the handlers expect a struct arm_pmu*. The percpu_irq framework will
+	 * do any necessary shifting, we just need to perform the first
+	 * dereference.
+	 */
+	armpmu = *(void **)dev;
 	plat_device = armpmu->plat_device;
 	plat = dev_get_platdata(&plat_device->dev);
 
 	start_clock = sched_clock();
 	if (plat && plat->handle_irq)
-		ret = plat->handle_irq(irq, dev, armpmu->handle_irq);
+		ret = plat->handle_irq(irq, armpmu, armpmu->handle_irq);
 	else
-		ret = armpmu->handle_irq(irq, dev);
+		ret = armpmu->handle_irq(irq, armpmu);
 	finish_clock = sched_clock();
 
 	perf_sample_event_took(finish_clock - start_clock);
diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
index 13fab83..cd95388 100644
--- a/arch/arm/kernel/perf_event_cpu.c
+++ b/arch/arm/kernel/perf_event_cpu.c
@@ -35,7 +35,6 @@
 /* Set at runtime when we know what CPU type we are. */
 static struct arm_pmu *cpu_pmu;
 
-static DEFINE_PER_CPU(struct arm_pmu *, percpu_pmu);
 static DEFINE_PER_CPU(struct pmu_hw_events, cpu_hw_events);
 
 /*
@@ -85,20 +84,21 @@ static void cpu_pmu_free_irq(struct arm_pmu *cpu_pmu)
 {
 	int i, irq, irqs;
 	struct platform_device *pmu_device = cpu_pmu->plat_device;
+	struct pmu_hw_events __percpu *hw_events = cpu_pmu->hw_events;
 
 	irqs = min(pmu_device->num_resources, num_possible_cpus());
 
 	irq = platform_get_irq(pmu_device, 0);
 	if (irq >= 0 && irq_is_percpu(irq)) {
 		on_each_cpu(cpu_pmu_disable_percpu_irq, &irq, 1);
-		free_percpu_irq(irq, &percpu_pmu);
+		free_percpu_irq(irq, &hw_events->percpu_pmu);
 	} else {
 		for (i = 0; i < irqs; ++i) {
 			if (!cpumask_test_and_clear_cpu(i, &cpu_pmu->active_irqs))
 				continue;
 			irq = platform_get_irq(pmu_device, i);
 			if (irq >= 0)
-				free_irq(irq, cpu_pmu);
+				free_irq(irq, per_cpu_ptr(&hw_events->percpu_pmu, i));
 		}
 	}
 }
@@ -107,6 +107,7 @@ static int cpu_pmu_request_irq(struct arm_pmu *cpu_pmu, irq_handler_t handler)
 {
 	int i, err, irq, irqs;
 	struct platform_device *pmu_device = cpu_pmu->plat_device;
+	struct pmu_hw_events __percpu *hw_events = cpu_pmu->hw_events;
 
 	if (!pmu_device)
 		return -ENODEV;
@@ -119,7 +120,8 @@ static int cpu_pmu_request_irq(struct arm_pmu *cpu_pmu, irq_handler_t handler)
 
 	irq = platform_get_irq(pmu_device, 0);
 	if (irq >= 0 && irq_is_percpu(irq)) {
-		err = request_percpu_irq(irq, handler, "arm-pmu", &percpu_pmu);
+		err = request_percpu_irq(irq, handler, "arm-pmu",
+					 &hw_events->percpu_pmu);
 		if (err) {
 			pr_err("unable to request IRQ%d for ARM PMU counters\n",
 				irq);
@@ -146,7 +148,7 @@ static int cpu_pmu_request_irq(struct arm_pmu *cpu_pmu, irq_handler_t handler)
 
 			err = request_irq(irq, handler,
 					  IRQF_NOBALANCING | IRQF_NO_THREAD, "arm-pmu",
-					  cpu_pmu);
+					  per_cpu_ptr(&hw_events->percpu_pmu, i));
 			if (err) {
 				pr_err("unable to request IRQ%d for ARM PMU counters\n",
 					irq);
@@ -166,7 +168,7 @@ static void cpu_pmu_init(struct arm_pmu *cpu_pmu)
 	for_each_possible_cpu(cpu) {
 		struct pmu_hw_events *events = &per_cpu(cpu_hw_events, cpu);
 		raw_spin_lock_init(&events->pmu_lock);
-		per_cpu(percpu_pmu, cpu) = cpu_pmu;
+		events->percpu_pmu = cpu_pmu;
 	}
 
 	cpu_pmu->hw_events	= &cpu_hw_events;
-- 
1.9.1


* [PATCH 7/8] arm: perf: dynamically allocate cpu hardware data
  2014-10-21 13:11 [PATCH 0/8] Preparatory rework for multi-PMU support Mark Rutland
                   ` (5 preceding siblings ...)
  2014-10-21 13:11 ` [PATCH 6/8] arm: perf: fold percpu_pmu into pmu_hw_events Mark Rutland
@ 2014-10-21 13:11 ` Mark Rutland
  2014-10-21 21:24   ` Stephen Boyd
  2014-10-21 13:11 ` [PATCH 8/8] arm: perf: fold hotplug notifier into arm_pmu Mark Rutland
  7 siblings, 1 reply; 19+ messages in thread
From: Mark Rutland @ 2014-10-21 13:11 UTC (permalink / raw)
  To: linux-arm-kernel

To support multiple PMUs, each PMU will need its own accounting data.
As we don't know (in general) how many PMUs we'll have to support at
compile-time, we must allocate the data dynamically at runtime.
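
A minimal sketch of the allocation pattern (alloc_percpu() returns
zeroed per-cpu storage and is normally paired with free_percpu() on
teardown or failure paths; as the review below notes, this patch as
posted does not yet free it):

  struct pmu_hw_events __percpu *cpu_hw_events;

  cpu_hw_events = alloc_percpu(struct pmu_hw_events);
  if (!cpu_hw_events)
          return -ENOMEM;
  /* ... */
  free_percpu(cpu_hw_events);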

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/kernel/perf_event_cpu.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
index cd95388..f8a237d 100644
--- a/arch/arm/kernel/perf_event_cpu.c
+++ b/arch/arm/kernel/perf_event_cpu.c
@@ -35,8 +35,6 @@
 /* Set at runtime when we know what CPU type we are. */
 static struct arm_pmu *cpu_pmu;
 
-static DEFINE_PER_CPU(struct pmu_hw_events, cpu_hw_events);
-
 /*
  * Despite the names, these two functions are CPU-specific and are used
  * by the OProfile/perf code.
@@ -162,16 +160,22 @@ static int cpu_pmu_request_irq(struct arm_pmu *cpu_pmu, irq_handler_t handler)
 	return 0;
 }
 
-static void cpu_pmu_init(struct arm_pmu *cpu_pmu)
+static int cpu_pmu_init(struct arm_pmu *cpu_pmu)
 {
 	int cpu;
+	struct pmu_hw_events __percpu *cpu_hw_events;
+
+	cpu_hw_events = alloc_percpu(struct pmu_hw_events);
+	if (!cpu_hw_events)
+		return -ENOMEM;
+
 	for_each_possible_cpu(cpu) {
-		struct pmu_hw_events *events = &per_cpu(cpu_hw_events, cpu);
+		struct pmu_hw_events *events = per_cpu_ptr(cpu_hw_events, cpu);
 		raw_spin_lock_init(&events->pmu_lock);
 		events->percpu_pmu = cpu_pmu;
 	}
 
-	cpu_pmu->hw_events	= &cpu_hw_events;
+	cpu_pmu->hw_events	= cpu_hw_events;
 	cpu_pmu->request_irq	= cpu_pmu_request_irq;
 	cpu_pmu->free_irq	= cpu_pmu_free_irq;
 
@@ -182,6 +186,8 @@ static void cpu_pmu_init(struct arm_pmu *cpu_pmu)
 	/* If no interrupts available, set the corresponding capability flag */
 	if (!platform_get_irq(cpu_pmu->plat_device, 0))
 		cpu_pmu->pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT;
+
+	return 0;
 }
 
 /*
@@ -303,7 +309,10 @@ static int cpu_pmu_device_probe(struct platform_device *pdev)
 		goto out_free;
 	}
 
-	cpu_pmu_init(cpu_pmu);
+	ret = cpu_pmu_init(cpu_pmu);
+	if (ret)
+		goto out_free;
+
 	ret = armpmu_register(cpu_pmu, -1);
 
 	if (!ret)
-- 
1.9.1


* [PATCH 8/8] arm: perf: fold hotplug notifier into arm_pmu
  2014-10-21 13:11 [PATCH 0/8] Preparatory rework for multi-PMU support Mark Rutland
                   ` (6 preceding siblings ...)
  2014-10-21 13:11 ` [PATCH 7/8] arm: perf: dynamically allocate cpu hardware data Mark Rutland
@ 2014-10-21 13:11 ` Mark Rutland
  2014-10-21 22:18   ` Stephen Boyd
  7 siblings, 1 reply; 19+ messages in thread
From: Mark Rutland @ 2014-10-21 13:11 UTC (permalink / raw)
  To: linux-arm-kernel

Handling multiple PMUs using a single hotplug notifier requires a list
of PMUs to be maintained, with synchronisation in the probe, remove, and
notify paths. This is error-prone and makes the code much harder to
maintain.

Instead of using a single notifier, we can dynamically allocate a
notifier block per-PMU. The end result is the same, but the list of PMUs
is implicit in the hotplug notifier list rather than within a perf-local
data structure, which makes the code far easier to handle.
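
The key idiom (visible in the hunk below) is that the callback recovers
its owning PMU with container_of(), since the notifier_block is now
embedded in struct arm_pmu:

  struct arm_pmu *pmu = container_of(b, struct arm_pmu, hotplug_nb);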

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm/include/asm/pmu.h       |  1 +
 arch/arm/kernel/perf_event_cpu.c | 66 +++++++++++++++++++---------------------
 2 files changed, 33 insertions(+), 34 deletions(-)

diff --git a/arch/arm/include/asm/pmu.h b/arch/arm/include/asm/pmu.h
index cc01498..b1596bd 100644
--- a/arch/arm/include/asm/pmu.h
+++ b/arch/arm/include/asm/pmu.h
@@ -116,6 +116,7 @@ struct arm_pmu {
 	u64		max_period;
 	struct platform_device	*plat_device;
 	struct pmu_hw_events	__percpu *hw_events;
+	struct notifier_block	hotplug_nb;
 };
 
 #define to_arm_pmu(p) (container_of(p, struct arm_pmu, pmu))
diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
index f8a237d..3095ca1 100644
--- a/arch/arm/kernel/perf_event_cpu.c
+++ b/arch/arm/kernel/perf_event_cpu.c
@@ -160,8 +160,31 @@ static int cpu_pmu_request_irq(struct arm_pmu *cpu_pmu, irq_handler_t handler)
 	return 0;
 }
 
+/*
+ * PMU hardware loses all context when a CPU goes offline.
+ * When a CPU is hotplugged back in, since some hardware registers are
+ * UNKNOWN at reset, the PMU must be explicitly reset to avoid reading
+ * junk values out of them.
+ */
+static int cpu_pmu_notify(struct notifier_block *b, unsigned long action,
+			  void *hcpu)
+{
+	struct arm_pmu *pmu = container_of(b, struct arm_pmu, hotplug_nb);
+
+	if ((action & ~CPU_TASKS_FROZEN) != CPU_STARTING)
+		return NOTIFY_DONE;
+
+	if (pmu->reset)
+		pmu->reset(pmu);
+	else
+		return NOTIFY_DONE;
+
+	return NOTIFY_OK;
+}
+
 static int cpu_pmu_init(struct arm_pmu *cpu_pmu)
 {
+	int err;
 	int cpu;
 	struct pmu_hw_events __percpu *cpu_hw_events;
 
@@ -169,6 +192,11 @@ static int cpu_pmu_init(struct arm_pmu *cpu_pmu)
 	if (!cpu_hw_events)
 		return -ENOMEM;
 
+	cpu_pmu->hotplug_nb.notifier_call = cpu_pmu_notify;
+	err = register_cpu_notifier(&cpu_pmu->hotplug_nb);
+	if (err)
+		goto out_hw_events;
+
 	for_each_possible_cpu(cpu) {
 		struct pmu_hw_events *events = per_cpu_ptr(cpu_hw_events, cpu);
 		raw_spin_lock_init(&events->pmu_lock);
@@ -188,32 +216,12 @@ static int cpu_pmu_init(struct arm_pmu *cpu_pmu)
 		cpu_pmu->pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT;
 
 	return 0;
-}
 
-/*
- * PMU hardware loses all context when a CPU goes offline.
- * When a CPU is hotplugged back in, since some hardware registers are
- * UNKNOWN at reset, the PMU must be explicitly reset to avoid reading
- * junk values out of them.
- */
-static int cpu_pmu_notify(struct notifier_block *b, unsigned long action,
-			  void *hcpu)
-{
-	if ((action & ~CPU_TASKS_FROZEN) != CPU_STARTING)
-		return NOTIFY_DONE;
-
-	if (cpu_pmu && cpu_pmu->reset)
-		cpu_pmu->reset(cpu_pmu);
-	else
-		return NOTIFY_DONE;
-
-	return NOTIFY_OK;
+out_hw_events:
+	free_percpu(cpu_hw_events);
+	return err;
 }
 
-static struct notifier_block cpu_pmu_hotplug_notifier = {
-	.notifier_call = cpu_pmu_notify,
-};
-
 /*
  * PMU platform driver and devicetree bindings.
  */
@@ -336,16 +344,6 @@ static struct platform_driver cpu_pmu_driver = {
 
 static int __init register_pmu_driver(void)
 {
-	int err;
-
-	err = register_cpu_notifier(&cpu_pmu_hotplug_notifier);
-	if (err)
-		return err;
-
-	err = platform_driver_register(&cpu_pmu_driver);
-	if (err)
-		unregister_cpu_notifier(&cpu_pmu_hotplug_notifier);
-
-	return err;
+	return platform_driver_register(&cpu_pmu_driver);
 }
 device_initcall(register_pmu_driver);
-- 
1.9.1


* [PATCH 7/8] arm: perf: dynamically allocate cpu hardware data
  2014-10-21 13:11 ` [PATCH 7/8] arm: perf: dynamically allocate cpu hardware data Mark Rutland
@ 2014-10-21 21:24   ` Stephen Boyd
  2014-10-22 11:06     ` Mark Rutland
  0 siblings, 1 reply; 19+ messages in thread
From: Stephen Boyd @ 2014-10-21 21:24 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/21/2014 06:11 AM, Mark Rutland wrote:
> @@ -162,16 +160,22 @@ static int cpu_pmu_request_irq(struct arm_pmu *cpu_pmu, irq_handler_t handler)
>  	return 0;
>  }
>  
> -static void cpu_pmu_init(struct arm_pmu *cpu_pmu)
> +static int cpu_pmu_init(struct arm_pmu *cpu_pmu)
>  {
>  	int cpu;
> +	struct pmu_hw_events __percpu *cpu_hw_events;
> +
> +	cpu_hw_events = alloc_percpu(struct pmu_hw_events);

Shouldn't we free this somewhere?

> +	if (!cpu_hw_events)
> +		return -ENOMEM;
> +
>  	for_each_possible_cpu(cpu) {
> -		struct pmu_hw_events *events = &per_cpu(cpu_hw_events, cpu);
> +		struct pmu_hw_events *events = per_cpu_ptr(cpu_hw_events, cpu);
>  		raw_spin_lock_init(&events->pmu_lock);
>  		events->percpu_pmu = cpu_pmu;
>  	}
>  
> -	cpu_pmu->hw_events	= &cpu_hw_events;
> +	cpu_pmu->hw_events	= cpu_hw_events;
>  	cpu_pmu->request_irq	= cpu_pmu_request_irq;
>  	cpu_pmu->free_irq	= cpu_pmu_free_irq;
>  
> @@ -303,7 +309,10 @@ static int cpu_pmu_device_probe(struct platform_device *pdev)
>  		goto out_free;
>  	}
>  
> -	cpu_pmu_init(cpu_pmu);
> +	ret = cpu_pmu_init(cpu_pmu);
> +	if (ret)
> +		goto out_free;
> +
>  	ret = armpmu_register(cpu_pmu, -1);
>  
>  	if (!ret)

Especially if this fails?

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


* [PATCH 2/8] arm: perf: make PMU probing data-driven
  2014-10-21 13:11 ` [PATCH 2/8] arm: perf: make PMU probing data-driven Mark Rutland
@ 2014-10-21 21:25   ` Stephen Boyd
  2014-10-22  9:50     ` Mark Rutland
  0 siblings, 1 reply; 19+ messages in thread
From: Stephen Boyd @ 2014-10-21 21:25 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/21/2014 06:11 AM, Mark Rutland wrote:
> diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
> index 4bf4cce..79c1cf4 100644
> --- a/arch/arm/kernel/perf_event_cpu.c
> +++ b/arch/arm/kernel/perf_event_cpu.c
> @@ -241,48 +241,34 @@ static struct platform_device_id cpu_pmu_plat_device_ids[] = {
>  	{},
>  };
>  
> +static struct pmu_probe_info pmu_probe_table[] = {

const?

> +	ARM_PMU_PROBE(ARM_CPU_PART_ARM1136, armv6_1136_pmu_init),
> +	ARM_PMU_PROBE(ARM_CPU_PART_ARM1156, armv6_1156_pmu_init),
> +	ARM_PMU_PROBE(ARM_CPU_PART_ARM1176, armv6_1176_pmu_init),
> +	ARM_PMU_PROBE(ARM_CPU_PART_ARM11MPCORE, armv6mpcore_pmu_init),
> +	ARM_PMU_PROBE(ARM_CPU_PART_CORTEX_A8, armv7_a8_pmu_init),
> +	ARM_PMU_PROBE(ARM_CPU_PART_CORTEX_A9, armv7_a9_pmu_init),
> +	XSCALE_PMU_PROBE(ARM_CPU_XSCALE_ARCH_V1, xscale1pmu_init),
> +	XSCALE_PMU_PROBE(ARM_CPU_XSCALE_ARCH_V2, xscale2pmu_init),
> +	{ /* sentinel value */ }
> +};
> +
>  /*
>   * CPU PMU identification and probing.
>   */
>  static int probe_current_pmu(struct arm_pmu *pmu)
>  {
>  	int cpu = get_cpu();
> +	unsigned int cpuid = read_cpuid_id();
>  	int ret = -ENODEV;
> +	struct pmu_probe_info *info;
>  
> -	pr_info("probing PMU on CPU %d\n", cpu);
> +	pr_info("probing PMU on CPU %d", cpu);

Why drop the newline? Looks like all the other prints in this file are
missing the newline.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


* [PATCH 3/8] arm: perf: use IDR types for CPU PMUs
  2014-10-21 13:11 ` [PATCH 3/8] arm: perf: use IDR types for CPU PMUs Mark Rutland
@ 2014-10-21 21:25   ` Stephen Boyd
  2014-10-22 10:06     ` Mark Rutland
  0 siblings, 1 reply; 19+ messages in thread
From: Stephen Boyd @ 2014-10-21 21:25 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/21/2014 06:11 AM, Mark Rutland wrote:
>  arch/arm/kernel/perf_event.c     | 6 +++++-
>  arch/arm/kernel/perf_event_cpu.c | 2 +-
>  2 files changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
> index ae96b98..f0bbd3d 100644
> --- a/arch/arm/kernel/perf_event.c
> +++ b/arch/arm/kernel/perf_event.c
> @@ -77,8 +77,12 @@ armpmu_map_event(struct perf_event *event,
>  		 u32 raw_event_mask)
>  {
>  	u64 config = event->attr.config;
> +	int type = event->attr.type;

Can we use u32 here to match the userspace ABI and avoid any signed vs.
unsigned oddness?

>  
> -	switch (event->attr.type) {
> +	if (type >= PERF_TYPE_MAX && type == event->pmu->type)
> +		return armpmu_map_raw_event(raw_event_mask, config);
> +
> +	switch (type) {
>  	case PERF_TYPE_HARDWARE:
>  		return armpmu_map_hw_event(event_map, config);
>  	case PERF_TYPE_HW_CACHE:
>

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


* [PATCH 6/8] arm: perf: fold percpu_pmu into pmu_hw_events
  2014-10-21 13:11 ` [PATCH 6/8] arm: perf: fold percpu_pmu into pmu_hw_events Mark Rutland
@ 2014-10-21 22:05   ` Stephen Boyd
  2014-10-22 10:10     ` Mark Rutland
  0 siblings, 1 reply; 19+ messages in thread
From: Stephen Boyd @ 2014-10-21 22:05 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/21/2014 06:11 AM, Mark Rutland wrote:
> Currently the percpu_pmu pointers used as percpu_irq dev_id values are
> defined separately from the other per-cpu accounting data, which make
> dynamically allocating the data (as will be required for systems with
> heterogeneous CPUs) difficult.
>
> This patch moves the percpu_pmu pointers into pmu_hw_events (which is
> itself allocated per cpu), which will allow for easier dynamic
> allocation. Both percpu and regular irqs are requested using percpu_pmu
> pointers as tokens, freeing us from having to know whether an irq is
> percpu within the handler, and thus avoiding a radix tree lookup on the
> handler path.
>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Reviewed-by: Will Deacon <will.deacon@arm.com>
> Cc: Stephen Boyd <sboyd@codeaurora.org>

Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Tested-by: Stephen Boyd <sboyd@codeaurora.org>

Works on Krait with percpu interrupts.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


* [PATCH 8/8] arm: perf: fold hotplug notifier into arm_pmu
  2014-10-21 13:11 ` [PATCH 8/8] arm: perf: fold hotplug notifier into arm_pmu Mark Rutland
@ 2014-10-21 22:18   ` Stephen Boyd
  0 siblings, 0 replies; 19+ messages in thread
From: Stephen Boyd @ 2014-10-21 22:18 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/21/2014 06:11 AM, Mark Rutland wrote:
> @@ -169,6 +192,11 @@ static int cpu_pmu_init(struct arm_pmu *cpu_pmu)
>  	if (!cpu_hw_events)
>  		return -ENOMEM;
>  
> +	cpu_pmu->hotplug_nb.notifier_call = cpu_pmu_notify;
> +	err = register_cpu_notifier(&cpu_pmu->hotplug_nb);

Does this need to be unregistered if armpmu_register() fails?
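
If it does, the unwind would be something like this sketch (placement
is illustrative; the exact hook depends on the next spin of the
series):

	ret = armpmu_register(cpu_pmu, -1);
	if (ret) {
		unregister_cpu_notifier(&cpu_pmu->hotplug_nb);
		goto out_free;
	}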

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 2/8] arm: perf: make PMU probing data-driven
  2014-10-21 21:25   ` Stephen Boyd
@ 2014-10-22  9:50     ` Mark Rutland
  0 siblings, 0 replies; 19+ messages in thread
From: Mark Rutland @ 2014-10-22  9:50 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Oct 21, 2014 at 10:25:02PM +0100, Stephen Boyd wrote:
> On 10/21/2014 06:11 AM, Mark Rutland wrote:
> > diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
> > index 4bf4cce..79c1cf4 100644
> > --- a/arch/arm/kernel/perf_event_cpu.c
> > +++ b/arch/arm/kernel/perf_event_cpu.c
> > @@ -241,48 +241,34 @@ static struct platform_device_id cpu_pmu_plat_device_ids[] = {
> >  	{},
> >  };
> >  
> > +static struct pmu_probe_info pmu_probe_table[] = {
> 
> const?

Sure. Done.

> > +	ARM_PMU_PROBE(ARM_CPU_PART_ARM1136, armv6_1136_pmu_init),
> > +	ARM_PMU_PROBE(ARM_CPU_PART_ARM1156, armv6_1156_pmu_init),
> > +	ARM_PMU_PROBE(ARM_CPU_PART_ARM1176, armv6_1176_pmu_init),
> > +	ARM_PMU_PROBE(ARM_CPU_PART_ARM11MPCORE, armv6mpcore_pmu_init),
> > +	ARM_PMU_PROBE(ARM_CPU_PART_CORTEX_A8, armv7_a8_pmu_init),
> > +	ARM_PMU_PROBE(ARM_CPU_PART_CORTEX_A9, armv7_a9_pmu_init),
> > +	XSCALE_PMU_PROBE(ARM_CPU_XSCALE_ARCH_V1, xscale1pmu_init),
> > +	XSCALE_PMU_PROBE(ARM_CPU_XSCALE_ARCH_V2, xscale2pmu_init),
> > +	{ /* sentinel value */ }
> > +};
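
For context, probe_current_pmu() below walks this table and matches the
MIDR read via read_cpuid_id() against each entry -- roughly the
following, with field names as in the series' struct pmu_probe_info
(a sketch, not the exact posted hunk):

	for (info = pmu_probe_table; info->init != NULL; info++) {
		if ((cpuid & info->mask) != info->cpuid)
			continue;
		ret = info->init(pmu);
		break;
	}
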
> > +
> >  /*
> >   * CPU PMU identification and probing.
> >   */
> >  static int probe_current_pmu(struct arm_pmu *pmu)
> >  {
> >  	int cpu = get_cpu();
> > +	unsigned int cpuid = read_cpuid_id();
> >  	int ret = -ENODEV;
> > +	struct pmu_probe_info *info;
> >  
> > -	pr_info("probing PMU on CPU %d\n", cpu);
> > +	pr_info("probing PMU on CPU %d", cpu);
> 
> Why drop the newline?

Unintentional -- at one point I'd changed the print while debugging and
must have messed up trying to restore it to its original state. I've
restored the newline locally.

> Looks like all the other prints in this file are missing the newline.

Hmm. All the other pr_info instances are missing newlines, but the
pr_err and pr_warning instances have them. It looks like we get lucky:
the next print carries a log prefix, which causes the existing text to
be flushed.

I'll put together a cleanup patch introducing the relevant newlines.
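
That cleanup is mechanical; each affected print takes a one-character
diff of this shape (illustrative):

	-	pr_info("probing PMU on CPU %d", cpu);
	+	pr_info("probing PMU on CPU %d\n", cpu);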

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 3/8] arm: perf: use IDR types for CPU PMUs
  2014-10-21 21:25   ` Stephen Boyd
@ 2014-10-22 10:06     ` Mark Rutland
  2014-10-27 20:29       ` Stephen Boyd
  0 siblings, 1 reply; 19+ messages in thread
From: Mark Rutland @ 2014-10-22 10:06 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Oct 21, 2014 at 10:25:18PM +0100, Stephen Boyd wrote:
> On 10/21/2014 06:11 AM, Mark Rutland wrote:
> >  arch/arm/kernel/perf_event.c     | 6 +++++-
> >  arch/arm/kernel/perf_event_cpu.c | 2 +-
> >  2 files changed, 6 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
> > index ae96b98..f0bbd3d 100644
> > --- a/arch/arm/kernel/perf_event.c
> > +++ b/arch/arm/kernel/perf_event.c
> > @@ -77,8 +77,12 @@ armpmu_map_event(struct perf_event *event,
> >  		 u32 raw_event_mask)
> >  {
> >  	u64 config = event->attr.config;
> > +	int type = event->attr.type;
> 
> Can we use u32 here to match the userspace ABI and avoid any signed vs.
> unsigned oddness?

I'd used int to match the definition of struct pmu::type (and elsewhere
in the perf core code, e.g. the int type parameter to
perf_pmu_register).

> >  
> > -	switch (event->attr.type) {
> > +	if (type >= PERF_TYPE_MAX && type == event->pmu->type)

I'll get rid of the check against PERF_TYPE_MAX here -- it's redundant
given we're about to check equivalence with event->pmu->type anyway.

With that removed we only check for equivalence between the
userspace-provided type and any kernelspace type fields (which should
all be in the range [0, INT_MAX]), rather than making greater/less-than
comparisons. So I think we should be ok.
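
Concretely, with the redundant comparison gone the hunk reduces to (a
sketch of the respin, not the posted diff):

	if (type == event->pmu->type)
		return armpmu_map_raw_event(raw_event_mask, config);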

Does that make sense?

Thanks,
Mark.

> > +		return armpmu_map_raw_event(raw_event_mask, config);
> > +
> > +	switch (type) {
> >  	case PERF_TYPE_HARDWARE:
> >  		return armpmu_map_hw_event(event_map, config);
> >  	case PERF_TYPE_HW_CACHE:
> >
> 
> -- 
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 6/8] arm: perf: fold percpu_pmu into pmu_hw_events
  2014-10-21 22:05   ` Stephen Boyd
@ 2014-10-22 10:10     ` Mark Rutland
  0 siblings, 0 replies; 19+ messages in thread
From: Mark Rutland @ 2014-10-22 10:10 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Oct 21, 2014 at 11:05:07PM +0100, Stephen Boyd wrote:
> On 10/21/2014 06:11 AM, Mark Rutland wrote:
> > Currently the percpu_pmu pointers used as percpu_irq dev_id values are
> > defined separately from the other per-cpu accounting data, which makes
> > dynamically allocating the data (as will be required for systems with
> > heterogeneous CPUs) difficult.
> >
> > This patch moves the percpu_pmu pointers into pmu_hw_events (which is
> > itself allocated per cpu), which will allow for easier dynamic
> > allocation. Both percpu and regular irqs are requested using percpu_pmu
> > pointers as tokens, freeing us from having to know whether an irq is
> > percpu within the handler, and thus avoiding a radix tree lookup on the
> > handler path.
> >
> > Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> > Reviewed-by: Will Deacon <will.deacon@arm.com>
> > Cc: Stephen Boyd <sboyd@codeaurora.org>
> 
> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
> Tested-by: Stephen Boyd <sboyd@codeaurora.org>
> 
> Works on Krait with percpu interrupts.

Thanks for testing!

Mark.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 7/8] arm: perf: dynamically allocate cpu hardware data
  2014-10-21 21:24   ` Stephen Boyd
@ 2014-10-22 11:06     ` Mark Rutland
  0 siblings, 0 replies; 19+ messages in thread
From: Mark Rutland @ 2014-10-22 11:06 UTC (permalink / raw)
  To: linux-arm-kernel

On Tue, Oct 21, 2014 at 10:24:54PM +0100, Stephen Boyd wrote:
> On 10/21/2014 06:11 AM, Mark Rutland wrote:
> > @@ -162,16 +160,22 @@ static int cpu_pmu_request_irq(struct arm_pmu *cpu_pmu, irq_handler_t handler)
> >  	return 0;
> >  }
> >  
> > -static void cpu_pmu_init(struct arm_pmu *cpu_pmu)
> > +static int cpu_pmu_init(struct arm_pmu *cpu_pmu)
> >  {
> >  	int cpu;
> > +	struct pmu_hw_events __percpu *cpu_hw_events;
> > +
> > +	cpu_hw_events = alloc_percpu(struct pmu_hw_events);
> 
> Shouldn't we free this somewhere?

My bad. We should clean this up if registration fails. As we don't
support unregistering PMUs, that should be the only case where we'll
need to free this.

I'll wrap that up in a new cpu_pmu_destroy function, which we can also
use to tear down the notifier introduced in the next patch.
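
Roughly (a sketch of the intent; the final form may differ):

	static void cpu_pmu_destroy(struct arm_pmu *cpu_pmu)
	{
		/* tears down the notifier added in the next patch */
		unregister_cpu_notifier(&cpu_pmu->hotplug_nb);
		free_percpu(cpu_pmu->hw_events);
	}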

Thanks,
Mark.

> 
> > +	if (!cpu_hw_events)
> > +		return -ENOMEM;
> > +
> >  	for_each_possible_cpu(cpu) {
> > -		struct pmu_hw_events *events = &per_cpu(cpu_hw_events, cpu);
> > +		struct pmu_hw_events *events = per_cpu_ptr(cpu_hw_events, cpu);
> >  		raw_spin_lock_init(&events->pmu_lock);
> >  		events->percpu_pmu = cpu_pmu;
> >  	}
> >  
> > -	cpu_pmu->hw_events	= &cpu_hw_events;
> > +	cpu_pmu->hw_events	= cpu_hw_events;
> >  	cpu_pmu->request_irq	= cpu_pmu_request_irq;
> >  	cpu_pmu->free_irq	= cpu_pmu_free_irq;
> >  
> > @@ -303,7 +309,10 @@ static int cpu_pmu_device_probe(struct platform_device *pdev)
> >  		goto out_free;
> >  	}
> >  
> > -	cpu_pmu_init(cpu_pmu);
> > +	ret = cpu_pmu_init(cpu_pmu);
> > +	if (ret)
> > +		goto out_free;
> > +
> >  	ret = armpmu_register(cpu_pmu, -1);
> >  
> >  	if (!ret)
> 
> Especially if this fails?
> 
> -- 
> Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
> a Linux Foundation Collaborative Project
> 
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 3/8] arm: perf: use IDR types for CPU PMUs
  2014-10-22 10:06     ` Mark Rutland
@ 2014-10-27 20:29       ` Stephen Boyd
  0 siblings, 0 replies; 19+ messages in thread
From: Stephen Boyd @ 2014-10-27 20:29 UTC (permalink / raw)
  To: linux-arm-kernel

On 10/22/2014 03:06 AM, Mark Rutland wrote:
> On Tue, Oct 21, 2014 at 10:25:18PM +0100, Stephen Boyd wrote:
>> On 10/21/2014 06:11 AM, Mark Rutland wrote:
>>>  arch/arm/kernel/perf_event.c     | 6 +++++-
>>>  arch/arm/kernel/perf_event_cpu.c | 2 +-
>>>  2 files changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/arm/kernel/perf_event.c b/arch/arm/kernel/perf_event.c
>>> index ae96b98..f0bbd3d 100644
>>> --- a/arch/arm/kernel/perf_event.c
>>> +++ b/arch/arm/kernel/perf_event.c
>>> @@ -77,8 +77,12 @@ armpmu_map_event(struct perf_event *event,
>>>  		 u32 raw_event_mask)
>>>  {
>>>  	u64 config = event->attr.config;
>>> +	int type = event->attr.type;
>> Can we use u32 here to match the userspace ABI and avoid any signed vs.
>> unsigned oddness?
> I'd used int to match the definition of struct pmu::type (and elsewhere
> in the perf core code, e.g. the int type parameter to
> perf_pmu_register).
>
>>>  
>>> -	switch (event->attr.type) {
>>> +	if (type >= PERF_TYPE_MAX && type == event->pmu->type)
> I'll get rid of the check against PERF_TYPE_MAX here -- it's redundant
> given we're about to check equivalence with event->pmu->type anyway.
>
> With that removed we only check for equivalence between the userspace
> provided type and any kernelspace type fields (which should all be in
> the range [0,INT_MAX]), rather than greater/less than comparisons. So I
> think we should be ok.
>
> Does that make sense?

Ok. I was mostly worried about the greater-than comparison, so if that
goes away I think we're good.

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2014-10-27 20:29 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-10-21 13:11 [PATCH 0/8] Preparatory rework for multi-PMU support Mark Rutland
2014-10-21 13:11 ` [PATCH 1/8] arm: perf: factor out callchain code Mark Rutland
2014-10-21 13:11 ` [PATCH 2/8] arm: perf: make PMU probing data-driven Mark Rutland
2014-10-21 21:25   ` Stephen Boyd
2014-10-22  9:50     ` Mark Rutland
2014-10-21 13:11 ` [PATCH 3/8] arm: perf: use IDR types for CPU PMUs Mark Rutland
2014-10-21 21:25   ` Stephen Boyd
2014-10-22 10:06     ` Mark Rutland
2014-10-27 20:29       ` Stephen Boyd
2014-10-21 13:11 ` [PATCH 4/8] arm: perf: limit size of accounting data Mark Rutland
2014-10-21 13:11 ` [PATCH 5/8] arm: perf: kill get_hw_events() Mark Rutland
2014-10-21 13:11 ` [PATCH 6/8] arm: perf: fold percpu_pmu into pmu_hw_events Mark Rutland
2014-10-21 22:05   ` Stephen Boyd
2014-10-22 10:10     ` Mark Rutland
2014-10-21 13:11 ` [PATCH 7/8] arm: perf: dynamically allocate cpu hardware data Mark Rutland
2014-10-21 21:24   ` Stephen Boyd
2014-10-22 11:06     ` Mark Rutland
2014-10-21 13:11 ` [PATCH 8/8] arm: perf: fold hotplug notifier into arm_pmu Mark Rutland
2014-10-21 22:18   ` Stephen Boyd
