[PATCH v2 0/6] perf: Support for ARM DynamIQ Shared Unit PMU

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

* [PATCH v2 0/6] perf: Support for ARM DynamIQ Shared Unit PMU
@ 2017-07-24 10:29 Suzuki K Poulose
  2017-07-24 10:29 ` [PATCH v2 1/6] perf: Export perf_event_update_userpage Suzuki K Poulose
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Suzuki K Poulose @ 2017-07-24 10:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, will.deacon, mark.rutland, peterz, Suzuki K Poulose

This series adds support for the PMU in ARM DynamIQ Shared Unit (DSU).
The DSU integrates one or more cores with an L3 memory system, control
logic, and external interfaces to form a multicore cluster. The PMU
allows counting the various events related to L3, SCU etc, using 32bit
independent counters along with providing a 64bit cycle counter.

The PMU can only be accessed via CPU system registers, which are common
to the cores connected to the same DSU. The PMU registers follow the
semantics of the ARMv8 PMU, mostly, with the exception that
the counters record the cluster wide events.

Tested on a Fast model with DSU. The driver only supports ARM64 at the
moment. It can be extended to support ARM32 by providing register
accessors like we do in arch/arm64/include/arm_dsu_pmu.h.

The firmware should setup appropriate bits in the ACTLR_EL3/EL2 to
allow EL1 access to the PMU registers.

Series applies on v4.13-rc2 and is also available at:

  git://linux-arm.org/linux-skp.git 4.13/dsu-v2

Changes since V1:
 - Use the new of_device_node_get_cpu() helper for Coresight
 - Rebased to 4.13-rc2

Suzuki K Poulose (6):
  perf: Export perf_event_update_userpage
  of: Add helper for mapping device node to logical CPU number
  coresight: of: Use of_device_node_get_cpu helper
  irqchip: gic-v3: Use of_device_node_get_cpu helper
  dt-bindings: Document devicetree binding for ARM DSU PMU
  perf: ARM DynamIQ Shared Unit PMU support

 .../devicetree/bindings/arm/arm-dsu-pmu.txt        |  28 +
 arch/arm64/include/asm/arm_dsu_pmu.h               | 124 +++
 drivers/hwtracing/coresight/of_coresight.c         |  20 +-
 drivers/irqchip/irq-gic-v3.c                       |  30 +-
 drivers/of/base.c                                  |  26 +
 drivers/perf/Kconfig                               |   9 +
 drivers/perf/Makefile                              |   1 +
 drivers/perf/arm_dsu_pmu.c                         | 877 +++++++++++++++++++++
 include/linux/of_device.h                          |   7 +
 kernel/events/core.c                               |   1 +
 10 files changed, 1082 insertions(+), 41 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/arm/arm-dsu-pmu.txt
 create mode 100644 arch/arm64/include/asm/arm_dsu_pmu.h
 create mode 100644 drivers/perf/arm_dsu_pmu.c

-- 
2.7.5

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v2 1/6] perf: Export perf_event_update_userpage
  2017-07-24 10:29 [PATCH v2 0/6] perf: Support for ARM DynamIQ Shared Unit PMU Suzuki K Poulose
@ 2017-07-24 10:29 ` Suzuki K Poulose
  2017-07-24 10:29 ` [PATCH v2 2/6] of: Add helper for mapping device node to logical CPU number Suzuki K Poulose
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 17+ messages in thread
From: Suzuki K Poulose @ 2017-07-24 10:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, will.deacon, mark.rutland, peterz, Suzuki K Poulose

Export perf_event_update_userpage() so that PMU driver using them,
can be built as modules.

Cc: Peter Zilstra <peterz@infradead.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 kernel/events/core.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 426c2ff..21aad7a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4946,6 +4946,7 @@ void perf_event_update_userpage(struct perf_event *event)
 unlock:
 	rcu_read_unlock();
 }
+EXPORT_SYMBOL_GPL(perf_event_update_userpage);
 
 static int perf_mmap_fault(struct vm_fault *vmf)
 {
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 2/6] of: Add helper for mapping device node to logical CPU number
  2017-07-24 10:29 [PATCH v2 0/6] perf: Support for ARM DynamIQ Shared Unit PMU Suzuki K Poulose
  2017-07-24 10:29 ` [PATCH v2 1/6] perf: Export perf_event_update_userpage Suzuki K Poulose
@ 2017-07-24 10:29 ` Suzuki K Poulose
  2017-07-24 13:14   ` Marc Zyngier
  2017-07-24 16:42   ` Sudeep Holla
  2017-07-24 10:29 ` [PATCH v2 3/6] coresight: of: Use of_device_node_get_cpu helper Suzuki K Poulose
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 17+ messages in thread
From: Suzuki K Poulose @ 2017-07-24 10:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, will.deacon, mark.rutland, peterz, Suzuki K Poulose,
	Rob Herring

Add a helper to map a device node to a logical CPU number to avoid
duplication. Currently this is open coded in different places (e.g
gic-v3, coresight). The helper tries to map device node to a "possible"
logical CPU id, which may not be online yet. It is the responsibility
of the user to make sure that the CPU is online. The helper uses
of_get_cpu_node() which uses arch specific backends to match the phyiscal
ids.

Cc: Rob Herring <robh+dt@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/of/base.c         | 26 ++++++++++++++++++++++++++
 include/linux/of_device.h |  7 +++++++
 2 files changed, 33 insertions(+)

diff --git a/drivers/of/base.c b/drivers/of/base.c
index 686628d..639af23 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -420,6 +420,32 @@ struct device_node *of_get_cpu_node(int cpu, unsigned int *thread)
 EXPORT_SYMBOL(of_get_cpu_node);
 
 /**
+ * of_device_node_get_cpu: Get the logical CPU number for a given device_node
+ *
+ * @cpu_node: Pointer to the device_node for CPU.
+ *
+ * Returns the logical CPU number of the given CPU device_node.
+ * Returns >= nr_cpu_ids if CPU is not found.
+ */
+int of_device_node_get_cpu(struct device_node *cpu_node)
+{
+	int cpu, thread;
+	bool found = false;
+	struct device_node *np;
+
+	for_each_possible_cpu(cpu) {
+		np = of_get_cpu_node(cpu, &thread);
+		found = (cpu_node == np);
+		of_node_put(np);
+		if (found)
+			break;
+	}
+
+	return cpu;
+}
+EXPORT_SYMBOL(of_device_node_get_cpu);
+
+/**
  * __of_device_is_compatible() - Check if the node matches given constraints
  * @device: pointer to node
  * @compat: required compatible string, NULL or "" for any match
diff --git a/include/linux/of_device.h b/include/linux/of_device.h
index b4ad8b4..00a4ba9 100644
--- a/include/linux/of_device.h
+++ b/include/linux/of_device.h
@@ -40,6 +40,8 @@ extern int of_device_request_module(struct device *dev);
 extern void of_device_uevent(struct device *dev, struct kobj_uevent_env *env);
 extern int of_device_uevent_modalias(struct device *dev, struct kobj_uevent_env *env);
 
+extern int of_device_node_get_cpu(struct device_node *np);
+
 static inline void of_device_node_put(struct device *dev)
 {
 	of_node_put(dev->of_node);
@@ -110,6 +112,11 @@ static inline int of_dma_configure(struct device *dev, struct device_node *np)
 }
 static inline void of_dma_deconfigure(struct device *dev)
 {}
+
+static inline int of_device_node_get_cpu(struct device_node *np)
+{
+	return nr_cpu_ids;
+}
 #endif /* CONFIG_OF */
 
 #endif /* _LINUX_OF_DEVICE_H */
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 3/6] coresight: of: Use of_device_node_get_cpu helper
  2017-07-24 10:29 [PATCH v2 0/6] perf: Support for ARM DynamIQ Shared Unit PMU Suzuki K Poulose
  2017-07-24 10:29 ` [PATCH v2 1/6] perf: Export perf_event_update_userpage Suzuki K Poulose
  2017-07-24 10:29 ` [PATCH v2 2/6] of: Add helper for mapping device node to logical CPU number Suzuki K Poulose
@ 2017-07-24 10:29 ` Suzuki K Poulose
  2017-07-25 16:14   ` Mathieu Poirier
  2017-07-24 10:29 ` [PATCH v2 4/6] irqchip: gic-v3: " Suzuki K Poulose
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 17+ messages in thread
From: Suzuki K Poulose @ 2017-07-24 10:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, will.deacon, mark.rutland, peterz, Suzuki K Poulose,
	Mathieu Poirier, Leo Yan

Reuse the new generic helper, of_device_node_get_cpu() to map a
given CPU phandle to a logical CPU number.

Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Leo Yan <leo.yan@linaro.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/hwtracing/coresight/of_coresight.c | 20 ++++++--------------
 1 file changed, 6 insertions(+), 14 deletions(-)

diff --git a/drivers/hwtracing/coresight/of_coresight.c b/drivers/hwtracing/coresight/of_coresight.c
index a187941..42ce9f8 100644
--- a/drivers/hwtracing/coresight/of_coresight.c
+++ b/drivers/hwtracing/coresight/of_coresight.c
@@ -16,6 +16,7 @@
 #include <linux/clk.h>
 #include <linux/of.h>
 #include <linux/of_address.h>
+#include <linux/of_device.h>
 #include <linux/of_graph.h>
 #include <linux/of_platform.h>
 #include <linux/platform_device.h>
@@ -104,26 +105,17 @@ static int of_coresight_alloc_memory(struct device *dev,
 int of_coresight_get_cpu(const struct device_node *node)
 {
 	int cpu;
-	bool found;
-	struct device_node *dn, *np;
+	struct device_node *dn;
 
 	dn = of_parse_phandle(node, "cpu", 0);
-
 	/* Affinity defaults to CPU0 */
 	if (!dn)
 		return 0;
-
-	for_each_possible_cpu(cpu) {
-		np = of_cpu_device_node_get(cpu);
-		found = (dn == np);
-		of_node_put(np);
-		if (found)
-			break;
-	}
-	of_node_put(dn);
-
+	cpu = of_device_node_get_cpu(dn);
 	/* Affinity to CPU0 if no cpu nodes are found */
-	return found ? cpu : 0;
+	if (cpu >= nr_cpu_ids)
+		return 0;
+	return cpu;
 }
 EXPORT_SYMBOL_GPL(of_coresight_get_cpu);
 
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 4/6] irqchip: gic-v3: Use of_device_node_get_cpu helper
  2017-07-24 10:29 [PATCH v2 0/6] perf: Support for ARM DynamIQ Shared Unit PMU Suzuki K Poulose
                   ` (2 preceding siblings ...)
  2017-07-24 10:29 ` [PATCH v2 3/6] coresight: of: Use of_device_node_get_cpu helper Suzuki K Poulose
@ 2017-07-24 10:29 ` Suzuki K Poulose
  2017-07-24 13:15   ` Marc Zyngier
  2017-07-24 10:29 ` [PATCH v2 5/6] dt-bindings: Document devicetree binding for ARM DSU PMU Suzuki K Poulose
  2017-07-24 10:29 ` [PATCH v2 6/6] perf: ARM DynamIQ Shared Unit PMU support Suzuki K Poulose
  5 siblings, 1 reply; 17+ messages in thread
From: Suzuki K Poulose @ 2017-07-24 10:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, will.deacon, mark.rutland, peterz, Suzuki K Poulose,
	Marc Zyngier

Use the new generic helper of_device_node_get_cpu() instead
of using our own version to map a device node to logical CPU
number.

Cc: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 drivers/irqchip/irq-gic-v3.c | 30 +++---------------------------
 1 file changed, 3 insertions(+), 27 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
index dbffb7a..b7ce12b 100644
--- a/drivers/irqchip/irq-gic-v3.c
+++ b/drivers/irqchip/irq-gic-v3.c
@@ -25,6 +25,7 @@
 #include <linux/irqdomain.h>
 #include <linux/of.h>
 #include <linux/of_address.h>
+#include <linux/of_device.h>
 #include <linux/of_irq.h>
 #include <linux/percpu.h>
 #include <linux/slab.h>
@@ -978,31 +979,6 @@ static int __init gic_validate_dist_version(void __iomem *dist_base)
 	return 0;
 }
 
-static int get_cpu_number(struct device_node *dn)
-{
-	const __be32 *cell;
-	u64 hwid;
-	int i;
-
-	cell = of_get_property(dn, "reg", NULL);
-	if (!cell)
-		return -1;
-
-	hwid = of_read_number(cell, of_n_addr_cells(dn));
-
-	/*
-	 * Non affinity bits must be set to 0 in the DT
-	 */
-	if (hwid & ~MPIDR_HWID_BITMASK)
-		return -1;
-
-	for (i = 0; i < num_possible_cpus(); i++)
-		if (cpu_logical_map(i) == hwid)
-			return i;
-
-	return -1;
-}
-
 /* Create all possible partitions at boot time */
 static void __init gic_populate_ppi_partitions(struct device_node *gic_node)
 {
@@ -1053,8 +1029,8 @@ static void __init gic_populate_ppi_partitions(struct device_node *gic_node)
 			if (WARN_ON(!cpu_node))
 				continue;
 
-			cpu = get_cpu_number(cpu_node);
-			if (WARN_ON(cpu == -1))
+			cpu = of_device_node_get_cpu(cpu_node);
+			if (WARN_ON(cpu >= nr_cpu_ids))
 				continue;
 
 			pr_cont("%s[%d] ", cpu_node->full_name, cpu);
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 5/6] dt-bindings: Document devicetree binding for ARM DSU PMU
  2017-07-24 10:29 [PATCH v2 0/6] perf: Support for ARM DynamIQ Shared Unit PMU Suzuki K Poulose
                   ` (3 preceding siblings ...)
  2017-07-24 10:29 ` [PATCH v2 4/6] irqchip: gic-v3: " Suzuki K Poulose
@ 2017-07-24 10:29 ` Suzuki K Poulose
  2017-07-24 10:29 ` [PATCH v2 6/6] perf: ARM DynamIQ Shared Unit PMU support Suzuki K Poulose
  5 siblings, 0 replies; 17+ messages in thread
From: Suzuki K Poulose @ 2017-07-24 10:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, will.deacon, mark.rutland, peterz, Suzuki K Poulose,
	Rob Herring

This patch documents the devicetree bindings for ARM DSU PMU.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Rob Herring <robh@kernel.org>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 .../devicetree/bindings/arm/arm-dsu-pmu.txt        | 28 ++++++++++++++++++++++
 1 file changed, 28 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/arm/arm-dsu-pmu.txt

diff --git a/Documentation/devicetree/bindings/arm/arm-dsu-pmu.txt b/Documentation/devicetree/bindings/arm/arm-dsu-pmu.txt
new file mode 100644
index 0000000..1395b4a
--- /dev/null
+++ b/Documentation/devicetree/bindings/arm/arm-dsu-pmu.txt
@@ -0,0 +1,28 @@
+* ARM DynamIQ Shared Unit (DSU) Performance Monitor Unit (PMU)
+
+ARM DyanmIQ Shared Unit (DSU) integrates one or more CPU cores
+with a shared L3 memory system, control logic and external interfaces to
+form a multicore cluster. The PMU enables to gather various statistics on
+the operations of the DSU. The PMU provides independent 32bit counters that
+can count any of the supported events, along with a 64bit cycle counter.
+The PMU is accessed via CPU system registers and has no MMIO component.
+
+** DSU PMU required properties:
+
+- compatible	: should be one of :
+
+		"arm,dsu-pmu"
+
+- interrupts	: Exactly 1 SPI must be listed.
+
+- cpus		: List of phandles for the CPUs connected to this DSU instance.
+
+
+** Example:
+
+dsu_pmu@0 {
+	compatible = "arm,dsu-pmu";
+	interrupts = <GIC_SPI 02 IRQ_TYPE_LEVEL_HIGH>;
+	cpus = <&cpu_0>, <&cpu_1>;
+};
+
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* [PATCH v2 6/6] perf: ARM DynamIQ Shared Unit PMU support
  2017-07-24 10:29 [PATCH v2 0/6] perf: Support for ARM DynamIQ Shared Unit PMU Suzuki K Poulose
                   ` (4 preceding siblings ...)
  2017-07-24 10:29 ` [PATCH v2 5/6] dt-bindings: Document devicetree binding for ARM DSU PMU Suzuki K Poulose
@ 2017-07-24 10:29 ` Suzuki K Poulose
  2017-07-24 14:50   ` Jonathan Cameron
  5 siblings, 1 reply; 17+ messages in thread
From: Suzuki K Poulose @ 2017-07-24 10:29 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: linux-kernel, will.deacon, mark.rutland, peterz, Suzuki K Poulose

Add support for the Cluster PMU part of the ARM DynamIQ Shared Unit (DSU).
The DSU integrates one or more cores with an L3 memory system, control
logic, and external interfaces to form a multicore cluster. The PMU
allows counting the various events related to L3, SCU etc, along with
providing a cycle counter.

The PMU can be accessed via system registers, which are common
to the cores in the same cluster. The PMU registers follow the
semantics of the ARMv8 PMU, mostly, with the exception that
the counters record the cluster wide events.

This driver is mostly based on the ARMv8 and CCI PMU drivers.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
---
 arch/arm64/include/asm/arm_dsu_pmu.h | 124 +++++
 drivers/perf/Kconfig                 |   9 +
 drivers/perf/Makefile                |   1 +
 drivers/perf/arm_dsu_pmu.c           | 877 +++++++++++++++++++++++++++++++++++
 4 files changed, 1011 insertions(+)
 create mode 100644 arch/arm64/include/asm/arm_dsu_pmu.h
 create mode 100644 drivers/perf/arm_dsu_pmu.c

diff --git a/arch/arm64/include/asm/arm_dsu_pmu.h b/arch/arm64/include/asm/arm_dsu_pmu.h
new file mode 100644
index 0000000..e7276fd
--- /dev/null
+++ b/arch/arm64/include/asm/arm_dsu_pmu.h
@@ -0,0 +1,124 @@
+/*
+ * ARM DynamIQ Shared Unit (DSU) PMU Low level register access routines.
+ *
+ * Copyright (C) ARM Limited, 2017.
+ *
+ * Author: Suzuki K Poulose <suzuki.poulose@arm.com>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2, as published by the Free Software Foundation.
+ */
+
+#include <asm/sysreg.h>
+
+
+#define CLUSTERPMCR_EL1			sys_reg(3, 0, 15, 5, 0)
+#define CLUSTERPMCNTENSET_EL1		sys_reg(3, 0, 15, 5, 1)
+#define CLUSTERPMCNTENCLR_EL1		sys_reg(3, 0, 15, 5, 2)
+#define CLUSTERPMOVSSET_EL1		sys_reg(3, 0, 15, 5, 3)
+#define CLUSTERPMOVSCLR_EL1		sys_reg(3, 0, 15, 5, 4)
+#define CLUSTERPMSELR_EL1		sys_reg(3, 0, 15, 5, 5)
+#define CLUSTERPMINTENSET_EL1		sys_reg(3, 0, 15, 5, 6)
+#define CLUSTERPMINTENCLR_EL1		sys_reg(3, 0, 15, 5, 7)
+#define CLUSTERPMCCNTR_EL1		sys_reg(3, 0, 15, 6, 0)
+#define CLUSTERPMXEVTYPER_EL1		sys_reg(3, 0, 15, 6, 1)
+#define CLUSTERPMXEVCNTR_EL1		sys_reg(3, 0, 15, 6, 2)
+#define CLUSTERPMMDCR_EL1		sys_reg(3, 0, 15, 6, 3)
+#define CLUSTERPMCEID0_EL1		sys_reg(3, 0, 15, 6, 4)
+#define CLUSTERPMCEID1_EL1		sys_reg(3, 0, 15, 6, 5)
+
+static inline u32 __dsu_pmu_read_pmcr(void)
+{
+	return read_sysreg_s(CLUSTERPMCR_EL1);
+}
+
+static inline void __dsu_pmu_write_pmcr(u32 val)
+{
+	write_sysreg_s(val, CLUSTERPMCR_EL1);
+	isb();
+}
+
+static inline u32 __dsu_pmu_get_pmovsclr(void)
+{
+	u32 val = read_sysreg_s(CLUSTERPMOVSCLR_EL1);
+	/* Clear the bit */
+	write_sysreg_s(val, CLUSTERPMOVSCLR_EL1);
+	isb();
+	return val;
+}
+
+static inline void __dsu_pmu_select_counter(int counter)
+{
+	write_sysreg_s(counter, CLUSTERPMSELR_EL1);
+	isb();
+}
+
+static inline u64 __dsu_pmu_read_counter(int counter)
+{
+	__dsu_pmu_select_counter(counter);
+	return read_sysreg_s(CLUSTERPMXEVCNTR_EL1);
+}
+
+static inline void __dsu_pmu_write_counter(int counter, u64 val)
+{
+	__dsu_pmu_select_counter(counter);
+	write_sysreg_s(val, CLUSTERPMXEVCNTR_EL1);
+	isb();
+}
+
+static inline void __dsu_pmu_set_event(int counter, u32 event)
+{
+	__dsu_pmu_select_counter(counter);
+	write_sysreg_s(event, CLUSTERPMXEVTYPER_EL1);
+	isb();
+}
+
+static inline u64 __dsu_pmu_read_pmccntr(void)
+{
+	return read_sysreg_s(CLUSTERPMCCNTR_EL1);
+}
+
+static inline void __dsu_pmu_write_pmccntr(u64 val)
+{
+	write_sysreg_s(val, CLUSTERPMCCNTR_EL1);
+	isb();
+}
+
+static inline void __dsu_pmu_disable_counter(int counter)
+{
+	write_sysreg_s(BIT(counter), CLUSTERPMCNTENCLR_EL1);
+	isb();
+}
+
+static inline void __dsu_pmu_enable_counter(int counter)
+{
+	write_sysreg_s(BIT(counter), CLUSTERPMCNTENSET_EL1);
+	isb();
+}
+
+static inline void __dsu_pmu_counter_interrupt_enable(int counter)
+{
+	write_sysreg_s(BIT(counter), CLUSTERPMINTENSET_EL1);
+	isb();
+}
+
+static inline void __dsu_pmu_counter_interrupt_disable(int counter)
+{
+	write_sysreg_s(BIT(counter), CLUSTERPMINTENCLR_EL1);
+	isb();
+}
+
+
+static inline u32 __dsu_pmu_read_pmceid(int n)
+{
+	switch (n) {
+	case 0:
+		return read_sysreg_s(CLUSTERPMCEID0_EL1);
+	case 1:
+		return read_sysreg_s(CLUSTERPMCEID1_EL1);
+	default:
+		BUILD_BUG();
+		return 0;
+	}
+}
diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
index e5197ff..ee3d7d1 100644
--- a/drivers/perf/Kconfig
+++ b/drivers/perf/Kconfig
@@ -17,6 +17,15 @@ config ARM_PMU_ACPI
 	depends on ARM_PMU && ACPI
 	def_bool y
 
+config ARM_DSU_PMU
+	tristate "ARM DynamIQ Shared Unit (DSU) PMU"
+	depends on ARM64 && PERF_EVENTS
+	  help
+	  Provides support for performance monitor unit in ARM DynamIQ Shared
+	  Unit (DSU). The DSU integrates one or more cores  with an L3 memory
+	  system, control logic. The PMU allows counting various events related
+	  to DSU.
+
 config QCOM_L2_PMU
 	bool "Qualcomm Technologies L2-cache PMU"
 	depends on ARCH_QCOM && ARM64 && ACPI
diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
index 6420bd4..0adb4f6 100644
--- a/drivers/perf/Makefile
+++ b/drivers/perf/Makefile
@@ -1,5 +1,6 @@
 obj-$(CONFIG_ARM_PMU) += arm_pmu.o arm_pmu_platform.o
 obj-$(CONFIG_ARM_PMU_ACPI) += arm_pmu_acpi.o
+obj-$(CONFIG_ARM_DSU_PMU) += arm_dsu_pmu.o
 obj-$(CONFIG_QCOM_L2_PMU)	+= qcom_l2_pmu.o
 obj-$(CONFIG_QCOM_L3_PMU) += qcom_l3_pmu.o
 obj-$(CONFIG_XGENE_PMU) += xgene_pmu.o
diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c
new file mode 100644
index 0000000..60b2c03
--- /dev/null
+++ b/drivers/perf/arm_dsu_pmu.c
@@ -0,0 +1,877 @@
+/*
+ * ARM DynamIQ Shared Unit (DSU) PMU driver
+ *
+ * Copyright (C) ARM Limited, 2017.
+ *
+ * Based on ARM CCI-PMU, ARMv8 PMU-v3 drivers.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ */
+
+#define PMUNAME		"arm_dsu"
+#define DRVNAME		PMUNAME "_pmu"
+#define pr_fmt(fmt)	DRVNAME ": " fmt
+
+#include <linux/device.h>
+#include <linux/interrupt.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/of_device.h>
+#include <linux/perf_event.h>
+#include <linux/platform_device.h>
+#include <linux/spinlock.h>
+
+#include <asm/arm_dsu_pmu.h>
+
+/* PMU event codes */
+#define DSU_PMU_EVT_CYCLES		0x11
+#define DSU_PMU_EVT_CHAIN		0x1e
+
+#define DSU_PMU_MAX_COMMON_EVENTS	0x40
+
+#define DSU_PMU_MAX_HW_CNTRS		32
+#define DSU_PMU_HW_COUNTER_MASK		(DSU_PMU_MAX_HW_CNTRS - 1)
+
+#define CLUSTERPMCR_E			BIT(0)
+#define CLUSTERPMCR_P			BIT(1)
+#define CLUSTERPMCR_C			BIT(2)
+#define CLUSTERPMCR_N_SHIFT		11
+#define CLUSTERPMCR_N_MASK		0x1f
+#define CLUSTERPMCR_IDCODE_SHIFT	16
+#define CLUSTERPMCR_IDCODE_MASK		0xff
+#define CLUSTERPMCR_IMP_SHIFT		24
+#define CLUSTERPMCR_IMP_MASK		0xff
+#define CLUSTERPMCR_RES_MASK		0x7e8
+#define CLUSTERPMCR_RES_VAL		0x40
+
+#define DSU_ACTIVE_CPU_MASK		0x0
+#define DSU_SUPPORTED_CPU_MASK		0x1
+
+/*
+ * We use the index of the counters as they appear in the counter
+ * bit maps in the PMU registers (e.g CLUSTERPMSELR).
+ * i.e,
+ *	counter 0	- Bit 0
+ *	counter 1	- Bit 1
+ *	...
+ *	Cycle counter	- Bit 31
+ */
+#define DSU_PMU_IDX_CYCLE_COUNTER	31
+
+/* All event counters are 32bit, with a 64bit Cycle counter */
+#define DSU_PMU_COUNTER_WIDTH(idx)	\
+	(((idx) == DSU_PMU_IDX_CYCLE_COUNTER) ? 64 : 32)
+
+#define DSU_PMU_COUNTER_MASK(idx)	\
+	GENMASK_ULL((DSU_PMU_COUNTER_WIDTH((idx)) - 1), 0)
+
+#define DSU_EXT_ATTR(_name, _func, _config)		\
+	(&((struct dev_ext_attribute[]) {				\
+		{							\
+			.attr = __ATTR(_name, 0444, _func, NULL),	\
+			.var = (void *)_config				\
+		}							\
+	})[0].attr.attr)
+
+#define DSU_EVENT_ATTR(_name, _config)		\
+	DSU_EXT_ATTR(_name, dsu_pmu_sysfs_event_show, (unsigned long)_config)
+
+#define DSU_FORMAT_ATTR(_name, _config)		\
+	DSU_EXT_ATTR(_name, dsu_pmu_sysfs_format_show, (char *)_config)
+
+#define DSU_CPUMASK_ATTR(_name, _config)	\
+	DSU_EXT_ATTR(_name, dsu_pmu_cpumask_show, (unsigned long)_config)
+
+struct dsu_hw_events {
+	DECLARE_BITMAP(used_mask, DSU_PMU_MAX_HW_CNTRS);
+	struct perf_event	*events[DSU_PMU_MAX_HW_CNTRS];
+};
+
+/*
+ * struct dsu_pmu	- DSU PMU descriptor
+ *
+ * @pmu_lock		: Protects accesses to DSU PMU register from multiple
+ *			  CPUs.
+ * @hw_events		: Holds the event counter state.
+ * @supported_cpus	: CPUs attached to the DSU.
+ * @active_cpu		: CPU to which the PMU is bound for accesses.
+ * @cphp_node		: Node for CPU hotplug notifier link.
+ * @num_counters	: Number of event counters implemented by the PMU,
+ *			  excluding the cycle counter.
+ * @irq			: Interrupt line for counter overflow.
+ * @cpmceid_bitmap	: Bitmap for the availability of architected common
+ *			  events ( event_code < 0x40).
+ */
+struct dsu_pmu {
+	struct pmu			pmu;
+	struct device			*dev;
+	raw_spinlock_t			pmu_lock;
+	struct dsu_hw_events		hw_events;
+	cpumask_t			supported_cpus;
+	cpumask_t			active_cpu;
+	struct hlist_node		cpuhp_node;
+	u8				num_counters;
+	int				irq;
+	DECLARE_BITMAP(cpmceid_bitmap, DSU_PMU_MAX_COMMON_EVENTS);
+};
+
+static unsigned long dsu_pmu_cpuhp_state;
+
+static inline struct dsu_pmu *to_dsu_pmu(struct pmu *pmu)
+{
+	return container_of(pmu, struct dsu_pmu, pmu);
+}
+
+static ssize_t dsu_pmu_sysfs_event_show(struct device *dev,
+					struct device_attribute *attr,
+					char *buf)
+{
+	struct dev_ext_attribute *eattr = container_of(attr,
+					struct dev_ext_attribute, attr);
+	return snprintf(buf, PAGE_SIZE, "event=0x%lx\n",
+					 (unsigned long)eattr->var);
+}
+
+static ssize_t dsu_pmu_sysfs_format_show(struct device *dev,
+					 struct device_attribute *attr,
+					 char *buf)
+{
+	struct dev_ext_attribute *eattr = container_of(attr,
+					struct dev_ext_attribute, attr);
+	return snprintf(buf, PAGE_SIZE, "%s\n", (char *)eattr->var);
+}
+
+static ssize_t dsu_pmu_cpumask_show(struct device *dev,
+				    struct device_attribute *attr,
+				    char *buf)
+{
+	struct pmu *pmu = dev_get_drvdata(dev);
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(pmu);
+	struct dev_ext_attribute *eattr = container_of(attr,
+					struct dev_ext_attribute, attr);
+	unsigned long mask_id = (unsigned long)eattr->var;
+	const cpumask_t *cpumask;
+
+	switch (mask_id) {
+	case DSU_ACTIVE_CPU_MASK:
+		cpumask = &dsu_pmu->active_cpu;
+		break;
+	case DSU_SUPPORTED_CPU_MASK:
+		cpumask = &dsu_pmu->supported_cpus;
+		break;
+	default:
+		return 0;
+	}
+	return cpumap_print_to_pagebuf(true, buf, cpumask);
+}
+
+static struct attribute *dsu_pmu_format_attrs[] = {
+	DSU_FORMAT_ATTR(event, "config:0-31"),
+	NULL,
+};
+
+static const struct attribute_group dsu_pmu_format_attr_group = {
+	.name = "format",
+	.attrs = dsu_pmu_format_attrs,
+};
+
+static struct attribute *dsu_pmu_event_attrs[] = {
+	DSU_EVENT_ATTR(cycles, 0x11),
+	DSU_EVENT_ATTR(bus_acecss, 0x19),
+	DSU_EVENT_ATTR(memory_error, 0x1a),
+	DSU_EVENT_ATTR(bus_cycles, 0x1d),
+	DSU_EVENT_ATTR(l3d_cache_allocate, 0x29),
+	DSU_EVENT_ATTR(l3d_cache_refill, 0x2a),
+	DSU_EVENT_ATTR(l3d_cache, 0x2b),
+	DSU_EVENT_ATTR(l3d_cache_wb, 0x2c),
+	DSU_EVENT_ATTR(bus_access_rd, 0x60),
+	DSU_EVENT_ATTR(bus_access_wr, 0x61),
+	DSU_EVENT_ATTR(bus_access_shared, 0x62),
+	DSU_EVENT_ATTR(bus_access_not_shared, 0x63),
+	DSU_EVENT_ATTR(bus_access_normal, 0x64),
+	DSU_EVENT_ATTR(bus_access_periph, 0x65),
+	DSU_EVENT_ATTR(l3d_cache_rd, 0xa0),
+	DSU_EVENT_ATTR(l3d_cache_wr, 0xa1),
+	DSU_EVENT_ATTR(l3d_cache_refill_rd, 0xa2),
+	DSU_EVENT_ATTR(l3d_cache_refill_wr, 0xa3),
+	DSU_EVENT_ATTR(acp_access, 0x119),
+	DSU_EVENT_ATTR(acp_cycles, 0x11d),
+	DSU_EVENT_ATTR(acp_access_rd, 0x160),
+	DSU_EVENT_ATTR(acp_access_wr, 0x161),
+	DSU_EVENT_ATTR(pp_access, 0x219),
+	DSU_EVENT_ATTR(pp_cycles, 0x21d),
+	DSU_EVENT_ATTR(pp_access_rd, 0x260),
+	DSU_EVENT_ATTR(pp_access_wr, 0x261),
+	DSU_EVENT_ATTR(scu_snp_access, 0xc0),
+	DSU_EVENT_ATTR(scu_snp_evict, 0xc1),
+	DSU_EVENT_ATTR(scu_snp_access_cpu, 0xc2),
+	DSU_EVENT_ATTR(scu_pftch_cpu_access, 0x500),
+	DSU_EVENT_ATTR(scu_pftch_cpu_miss, 0x501),
+	DSU_EVENT_ATTR(scu_pftch_cpu_hit, 0x502),
+	DSU_EVENT_ATTR(scu_pftch_cpu_match, 0x503),
+	DSU_EVENT_ATTR(scu_pftch_cpu_kill, 0x504),
+	DSU_EVENT_ATTR(scu_stash_icn_access, 0x510),
+	DSU_EVENT_ATTR(scu_stash_icn_miss, 0x511),
+	DSU_EVENT_ATTR(scu_stash_icn_hit, 0x512),
+	DSU_EVENT_ATTR(scu_stash_icn_match, 0x513),
+	DSU_EVENT_ATTR(scu_stash_icn_kill, 0x514),
+	DSU_EVENT_ATTR(scu_hzd_address, 0xd0),
+	NULL,
+};
+
+static bool dsu_pmu_event_supported(struct dsu_pmu *dsu_pmu, unsigned long evt)
+{
+	/*
+	 * DSU PMU provides a bit map for events with
+	 *   id < DSU_PMU_MAX_COMMON_EVENTS.
+	 * Events above the range are reported as supported, as
+	 * tracking the support needs per-chip tables and makes
+	 * it difficult to track. If an event is not supported,
+	 * it won't be counted.
+	 */
+	if (evt >= DSU_PMU_MAX_COMMON_EVENTS)
+		return true;
+	/* The PMU driver doesn't support chain mode */
+	if (evt == DSU_PMU_EVT_CHAIN)
+		return false;
+	return test_bit(evt, dsu_pmu->cpmceid_bitmap);
+}
+
+static umode_t
+dsu_pmu_event_attr_is_visible(struct kobject *kobj, struct attribute *attr,
+				int unused)
+{
+	struct pmu *pmu = dev_get_drvdata(kobj_to_dev(kobj));
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(pmu);
+	struct dev_ext_attribute *eattr = container_of(attr,
+					struct dev_ext_attribute, attr.attr);
+	unsigned long evt = (unsigned long)eattr->var;
+
+	if (dsu_pmu_event_supported(dsu_pmu, evt))
+		return attr->mode;
+	return 0;
+}
+
+static const struct attribute_group dsu_pmu_events_attr_group = {
+	.name = "events",
+	.attrs = dsu_pmu_event_attrs,
+	.is_visible = dsu_pmu_event_attr_is_visible,
+};
+
+static struct attribute *dsu_pmu_cpumask_attrs[] = {
+	DSU_CPUMASK_ATTR(cpumask, DSU_ACTIVE_CPU_MASK),
+	DSU_CPUMASK_ATTR(supported_cpus, DSU_SUPPORTED_CPU_MASK),
+	NULL,
+};
+
+static const struct attribute_group dsu_pmu_cpumask_attr_group = {
+	.attrs = dsu_pmu_cpumask_attrs,
+};
+
+static const struct attribute_group *dsu_pmu_attr_groups[] = {
+	&dsu_pmu_cpumask_attr_group,
+	&dsu_pmu_events_attr_group,
+	&dsu_pmu_format_attr_group,
+	NULL,
+};
+
+static int dsu_pmu_get_online_cpu(struct dsu_pmu *dsu_pmu)
+{
+	return cpumask_first_and(&dsu_pmu->supported_cpus, cpu_online_mask);
+}
+
+static int dsu_pmu_get_online_cpu_any_but(struct dsu_pmu *dsu_pmu, int cpu)
+{
+	struct cpumask online_supported;
+
+	cpumask_and(&online_supported,
+			 &dsu_pmu->supported_cpus, cpu_online_mask);
+	return cpumask_any_but(&online_supported, cpu);
+}
+
+static inline bool dsu_pmu_counter_valid(struct dsu_pmu *dsu_pmu, u32 idx)
+{
+	return (idx < dsu_pmu->num_counters) ||
+	       (idx == DSU_PMU_IDX_CYCLE_COUNTER);
+}
+
+static inline u64 dsu_pmu_read_counter(struct perf_event *event)
+{
+	u64 val = 0;
+	unsigned long flags;
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(event->pmu);
+	int idx = event->hw.idx;
+
+	if (WARN_ON(!cpumask_test_cpu(smp_processor_id(),
+				 &dsu_pmu->supported_cpus)))
+		return 0;
+
+	if (!dsu_pmu_counter_valid(dsu_pmu, idx)) {
+		dev_err(event->pmu->dev,
+			"Trying reading invalid counter %d\n", idx);
+		return 0;
+	}
+
+	raw_spin_lock_irqsave(&dsu_pmu->pmu_lock, flags);
+	if (idx == DSU_PMU_IDX_CYCLE_COUNTER)
+		val = __dsu_pmu_read_pmccntr();
+	else
+		val = __dsu_pmu_read_counter(idx);
+	raw_spin_unlock_irqrestore(&dsu_pmu->pmu_lock, flags);
+
+	return val;
+}
+
+static void dsu_pmu_write_counter(struct perf_event *event, u64 val)
+{
+	unsigned long flags;
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(event->pmu);
+	int idx = event->hw.idx;
+
+	if (WARN_ON(!cpumask_test_cpu(smp_processor_id(),
+			 &dsu_pmu->supported_cpus)))
+		return;
+
+	if (!dsu_pmu_counter_valid(dsu_pmu, idx)) {
+		dev_err(event->pmu->dev,
+			"writing to invalid counter %d\n", idx);
+		return;
+	}
+
+	val &= DSU_PMU_COUNTER_MASK(idx);
+	raw_spin_lock_irqsave(&dsu_pmu->pmu_lock, flags);
+	if (idx == DSU_PMU_IDX_CYCLE_COUNTER)
+		__dsu_pmu_write_pmccntr(val);
+	else
+		__dsu_pmu_write_counter(idx, val);
+	raw_spin_unlock_irqrestore(&dsu_pmu->pmu_lock, flags);
+}
+
+static int dsu_pmu_get_event_idx(struct dsu_hw_events *hw_events,
+				 struct perf_event *event)
+{
+	int idx;
+	unsigned long evtype = event->attr.config;
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(event->pmu);
+	unsigned long *used_mask = hw_events->used_mask;
+
+	if (evtype == DSU_PMU_EVT_CYCLES) {
+		if (test_and_set_bit(DSU_PMU_IDX_CYCLE_COUNTER, used_mask))
+			return -EAGAIN;
+		return DSU_PMU_IDX_CYCLE_COUNTER;
+	}
+
+	idx = find_next_zero_bit(used_mask, dsu_pmu->num_counters, 0);
+	if (idx >= dsu_pmu->num_counters)
+		return -EAGAIN;
+	set_bit(idx, hw_events->used_mask);
+	return idx;
+}
+
+static void dsu_pmu_enable_counter(struct dsu_pmu *dsu_pmu, int idx)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&dsu_pmu->pmu_lock, flags);
+	__dsu_pmu_counter_interrupt_enable(idx);
+	__dsu_pmu_enable_counter(idx);
+	raw_spin_unlock_irqrestore(&dsu_pmu->pmu_lock, flags);
+}
+
+static void dsu_pmu_disable_counter(struct dsu_pmu *dsu_pmu, int idx)
+{
+	unsigned long flags;
+
+	raw_spin_lock_irqsave(&dsu_pmu->pmu_lock, flags);
+	__dsu_pmu_disable_counter(idx);
+	__dsu_pmu_counter_interrupt_disable(idx);
+	raw_spin_unlock_irqrestore(&dsu_pmu->pmu_lock, flags);
+}
+
+static inline void dsu_pmu_set_event(struct dsu_pmu *dsu_pmu,
+					struct perf_event *event)
+{
+	int idx = event->hw.idx;
+	unsigned long flags;
+
+	if (!dsu_pmu_counter_valid(dsu_pmu, idx)) {
+		dev_err(event->pmu->dev,
+			"Trying to set invalid counter %d\n", idx);
+		return;
+	}
+
+	raw_spin_lock_irqsave(&dsu_pmu->pmu_lock, flags);
+	__dsu_pmu_set_event(idx, event->hw.config_base);
+	raw_spin_unlock_irqrestore(&dsu_pmu->pmu_lock, flags);
+}
+
+static void dsu_pmu_event_update(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	u64 delta, prev_count, new_count;
+
+	do {
+		/* We may also be called from the irq handler */
+		prev_count = local64_read(&hwc->prev_count);
+		new_count = dsu_pmu_read_counter(event);
+	} while (local64_cmpxchg(&hwc->prev_count, prev_count, new_count) !=
+			prev_count);
+	delta = (new_count - prev_count) & DSU_PMU_COUNTER_MASK(hwc->idx);
+	local64_add(delta, &event->count);
+}
+
+static void dsu_pmu_read(struct perf_event *event)
+{
+	dsu_pmu_event_update(event);
+}
+
+static inline u32 dsu_pmu_get_status(void)
+{
+	return __dsu_pmu_get_pmovsclr();
+}
+
+/**
+ * dsu_pmu_set_event_period: Set the period for the counter.
+ *
+ * All DSU PMU event counters, except the cycle counter are 32bit
+ * counters. To handle cases of extreme interrupt latency, we program
+ * the counter with half of the max count for the counters.
+ */
+static void dsu_pmu_set_event_period(struct perf_event *event)
+{
+	int idx = event->hw.idx;
+	u64 val = DSU_PMU_COUNTER_MASK(idx) >> 1;
+
+	local64_set(&event->hw.prev_count, val);
+	dsu_pmu_write_counter(event, val);
+}
+
+static irqreturn_t dsu_pmu_handle_irq(int irq_num, void *dev)
+{
+	int i;
+	bool handled = false;
+	struct dsu_pmu *dsu_pmu = dev;
+	struct dsu_hw_events *hw_events = &dsu_pmu->hw_events;
+	unsigned long overflow, workset;
+
+	overflow = (unsigned long)dsu_pmu_get_status();
+	bitmap_and(&workset, &overflow, hw_events->used_mask,
+		   DSU_PMU_MAX_HW_CNTRS);
+
+	if (!workset)
+		return IRQ_NONE;
+
+	for_each_set_bit(i, &workset, DSU_PMU_MAX_HW_CNTRS) {
+		struct perf_event *event = hw_events->events[i];
+
+		if (WARN_ON(!event))
+			continue;
+		dsu_pmu_event_update(event);
+		dsu_pmu_set_event_period(event);
+
+		handled = true;
+	}
+
+	return IRQ_RETVAL(handled);
+}
+
+static void dsu_pmu_start(struct perf_event *event, int pmu_flags)
+{
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(event->pmu);
+
+	/* We always reprogram the counter */
+	if (pmu_flags & PERF_EF_RELOAD)
+		WARN_ON(!(event->hw.state & PERF_HES_UPTODATE));
+	dsu_pmu_set_event_period(event);
+	if (event->hw.idx != DSU_PMU_IDX_CYCLE_COUNTER)
+		dsu_pmu_set_event(dsu_pmu, event);
+	event->hw.state = 0;
+	dsu_pmu_enable_counter(dsu_pmu, event->hw.idx);
+}
+
+static void dsu_pmu_stop(struct perf_event *event, int pmu_flags)
+{
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(event->pmu);
+
+	if (event->hw.state & PERF_HES_STOPPED)
+		return;
+	dsu_pmu_disable_counter(dsu_pmu, event->hw.idx);
+	dsu_pmu_event_update(event);
+	event->hw.state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
+}
+
+static int dsu_pmu_add(struct perf_event *event, int flags)
+{
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(event->pmu);
+	struct dsu_hw_events *hw_events = &dsu_pmu->hw_events;
+	struct hw_perf_event *hwc = &event->hw;
+	int idx;
+
+	if (!cpumask_test_cpu(smp_processor_id(), &dsu_pmu->supported_cpus))
+		return -ENOENT;
+
+	idx = dsu_pmu_get_event_idx(hw_events, event);
+	if (idx < 0)
+		return idx;
+
+	hwc->idx = idx;
+	hw_events->events[idx] = event;
+	hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
+
+	if (flags & PERF_EF_START)
+		dsu_pmu_start(event, PERF_EF_RELOAD);
+
+	perf_event_update_userpage(event);
+	return 0;
+}
+
+static void dsu_pmu_del(struct perf_event *event, int flags)
+{
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(event->pmu);
+	struct dsu_hw_events *hw_events = &dsu_pmu->hw_events;
+	struct hw_perf_event *hwc = &event->hw;
+	int idx = hwc->idx;
+
+	dsu_pmu_stop(event, PERF_EF_UPDATE);
+	hw_events->events[idx] = NULL;
+	clear_bit(idx, hw_events->used_mask);
+	perf_event_update_userpage(event);
+}
+
+static void dsu_pmu_enable(struct pmu *pmu)
+{
+	u32 pmcr;
+	unsigned long flags;
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(pmu);
+	int enabled = bitmap_weight(dsu_pmu->hw_events.used_mask,
+				    DSU_PMU_MAX_HW_CNTRS);
+
+	if (!enabled)
+		return;
+
+	raw_spin_lock_irqsave(&dsu_pmu->pmu_lock, flags);
+	pmcr = __dsu_pmu_read_pmcr();
+	pmcr |= CLUSTERPMCR_E;
+	__dsu_pmu_write_pmcr(pmcr);
+	raw_spin_unlock_irqrestore(&dsu_pmu->pmu_lock, flags);
+}
+
+static void dsu_pmu_disable(struct pmu *pmu)
+{
+	u32 pmcr;
+	unsigned long flags;
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(pmu);
+
+	raw_spin_lock_irqsave(&dsu_pmu->pmu_lock, flags);
+	pmcr = __dsu_pmu_read_pmcr();
+	pmcr &= ~CLUSTERPMCR_E;
+	__dsu_pmu_write_pmcr(pmcr);
+	raw_spin_unlock_irqrestore(&dsu_pmu->pmu_lock, flags);
+}
+
+static int dsu_pmu_validate_event(struct pmu *pmu,
+				  struct dsu_hw_events *hw_events,
+				  struct perf_event *event)
+{
+	if (is_software_event(event))
+		return 1;
+	/* Reject groups spanning multiple HW PMUs. */
+	if (event->pmu != pmu)
+		return 0;
+	if (event->state < PERF_EVENT_STATE_OFF)
+		return 1;
+	if (event->state == PERF_EVENT_STATE_OFF && !event->attr.enable_on_exec)
+		return 1;
+	return dsu_pmu_get_event_idx(hw_events, event) >= 0;
+}
+
+/*
+ * Make sure the group of events can be scheduled at once
+ * on the PMU.
+ */
+static int dsu_pmu_validate_group(struct perf_event *event)
+{
+	struct perf_event *sibling, *leader = event->group_leader;
+	struct dsu_hw_events fake_hw;
+
+	if (event->group_leader == event)
+		return 0;
+
+	memset(fake_hw.used_mask, 0, sizeof(fake_hw.used_mask));
+	if (!dsu_pmu_validate_event(event->pmu, &fake_hw, leader))
+		return -EINVAL;
+	list_for_each_entry(sibling, &leader->sibling_list, group_entry) {
+		if (!dsu_pmu_validate_event(event->pmu, &fake_hw, sibling))
+			return -EINVAL;
+	}
+	if (dsu_pmu_validate_event(event->pmu, &fake_hw, event))
+		return -EINVAL;
+	return 0;
+}
+
+static int dsu_pmu_event_init(struct perf_event *event)
+{
+	struct dsu_pmu *dsu_pmu = to_dsu_pmu(event->pmu);
+
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
+	if (!dsu_pmu_event_supported(dsu_pmu, event->attr.config))
+		return -EOPNOTSUPP;
+
+	/* We cannot support task bound events */
+	if (event->cpu < 0) {
+		dev_dbg(dsu_pmu->pmu.dev, "Can't support per-task counters\n");
+		return -EINVAL;
+	}
+
+	/* We don't support sampling */
+	if (is_sampling_event(event)) {
+		dev_dbg(dsu_pmu->pmu.dev, "Can't support sampling events\n");
+		return -EOPNOTSUPP;
+	}
+
+	if (has_branch_stack(event) ||
+	    event->attr.exclude_user ||
+	    event->attr.exclude_kernel ||
+	    event->attr.exclude_hv ||
+	    event->attr.exclude_idle ||
+	    event->attr.exclude_host ||
+	    event->attr.exclude_guest) {
+		dev_dbg(dsu_pmu->pmu.dev, "Can't support filtering\n");
+		return -EINVAL;
+	}
+
+	if (dsu_pmu_validate_group(event))
+		return -EINVAL;
+
+	event->cpu = cpumask_first(&dsu_pmu->active_cpu);
+	if (event->cpu >= nr_cpu_ids)
+		return -EINVAL;
+
+	event->hw.config_base = event->attr.config;
+	return 0;
+}
+
+static struct dsu_pmu *dsu_pmu_alloc(struct platform_device *pdev)
+{
+	struct dsu_pmu *dsu_pmu;
+
+	dsu_pmu = devm_kzalloc(&pdev->dev, sizeof(*dsu_pmu), GFP_KERNEL);
+	if (!dsu_pmu)
+		return ERR_PTR(-ENOMEM);
+	raw_spin_lock_init(&dsu_pmu->pmu_lock);
+	return dsu_pmu;
+}
+
+/**
+ * dsu_pmu_dt_get_cpus: Get the list of CPUs in the cluster.
+ */
+static int dsu_pmu_dt_get_cpus(struct device_node *dev, cpumask_t *mask)
+{
+	int i = 0, n, cpu;
+	struct device_node *cpu_node;
+
+	n = of_count_phandle_with_args(dev, "cpus", NULL);
+	if (n <= 0)
+		goto out;
+	for (; i < n; i++) {
+		cpu_node = of_parse_phandle(dev, "cpus", i);
+		if (!cpu_node)
+			break;
+		cpu = of_device_node_get_cpu(cpu_node);
+		of_node_put(cpu_node);
+		if (cpu >= nr_cpu_ids)
+			break;
+		cpumask_set_cpu(cpu, mask);
+	}
+out:
+	return i > 0;
+}
+
+/*
+ * dsu_pmu_probe_pmu: Probe the PMU details on a CPU in the cluster.
+ */
+static void dsu_pmu_probe_pmu(void *data)
+{
+	struct dsu_pmu *dsu_pmu = data;
+	u64 cpmcr;
+	u32 cpmceid[2];
+
+	if (WARN_ON(!cpumask_test_cpu(smp_processor_id(),
+		&dsu_pmu->supported_cpus)))
+		return;
+	cpmcr = __dsu_pmu_read_pmcr();
+	dsu_pmu->num_counters = ((cpmcr >> CLUSTERPMCR_N_SHIFT) &
+					CLUSTERPMCR_N_MASK);
+	if (!dsu_pmu->num_counters)
+		return;
+	cpmceid[0] = __dsu_pmu_read_pmceid(0);
+	cpmceid[1] = __dsu_pmu_read_pmceid(1);
+	bitmap_from_u32array(dsu_pmu->cpmceid_bitmap,
+				DSU_PMU_MAX_COMMON_EVENTS,
+				cpmceid,
+				ARRAY_SIZE(cpmceid));
+}
+
+static void dsu_pmu_cleanup_dev(struct dsu_pmu *dsu_pmu)
+{
+	cpuhp_state_remove_instance(dsu_pmu_cpuhp_state, &dsu_pmu->cpuhp_node);
+	irq_set_affinity_hint(dsu_pmu->irq, NULL);
+}
+
+static int dsu_pmu_probe(struct platform_device *pdev)
+{
+	int irq, rc, cpu;
+	struct dsu_pmu *dsu_pmu;
+	char *name;
+
+	static atomic_t pmu_idx = ATOMIC_INIT(-1);
+
+
+	dsu_pmu = dsu_pmu_alloc(pdev);
+	if (!dsu_pmu_dt_get_cpus(pdev->dev.of_node, &dsu_pmu->supported_cpus)) {
+		dev_warn(&pdev->dev, "Failed to parse the CPUs\n");
+		return -EINVAL;
+	}
+
+	rc = smp_call_function_any(&dsu_pmu->supported_cpus,
+					dsu_pmu_probe_pmu,
+					dsu_pmu, 1);
+	if (rc)
+		return rc;
+	if (!dsu_pmu->num_counters)
+		return -ENODEV;
+	irq = platform_get_irq(pdev, 0);
+	if (irq < 0) {
+		dev_warn(&pdev->dev, "Failed to find IRQ\n");
+		return -EINVAL;
+	}
+
+	name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "%s_%d",
+				PMUNAME, atomic_inc_return(&pmu_idx));
+	rc = devm_request_irq(&pdev->dev, irq, dsu_pmu_handle_irq,
+					0, name, dsu_pmu);
+	if (rc) {
+		dev_warn(&pdev->dev, "Failed to request IRQ %d\n", irq);
+		return rc;
+	}
+
+	/*
+	 * Find one CPU in the DSU to handle the IRQs.
+	 * It is highly unlikely that we would fail
+	 * to find one, given the probing has succeeded.
+	 */
+	cpu = dsu_pmu_get_online_cpu(dsu_pmu);
+	if (cpu >= nr_cpu_ids)
+		return -ENODEV;
+	cpumask_set_cpu(cpu, &dsu_pmu->active_cpu);
+	rc = irq_set_affinity_hint(irq, &dsu_pmu->active_cpu);
+	if (rc) {
+		dev_warn(&pdev->dev, "Failed to force IRQ affinity for %d\n",
+					 irq);
+		return rc;
+	}
+
+	platform_set_drvdata(pdev, dsu_pmu);
+	rc = cpuhp_state_add_instance(dsu_pmu_cpuhp_state,
+						&dsu_pmu->cpuhp_node);
+	if (rc)
+		return rc;
+
+	dsu_pmu->irq = irq;
+	dsu_pmu->pmu = (struct pmu) {
+		.task_ctx_nr	= perf_invalid_context,
+
+		.pmu_enable	= dsu_pmu_enable,
+		.pmu_disable	= dsu_pmu_disable,
+		.event_init	= dsu_pmu_event_init,
+		.add		= dsu_pmu_add,
+		.del		= dsu_pmu_del,
+		.start		= dsu_pmu_start,
+		.stop		= dsu_pmu_stop,
+		.read		= dsu_pmu_read,
+
+		.attr_groups	= dsu_pmu_attr_groups,
+	};
+
+	rc = perf_pmu_register(&dsu_pmu->pmu, name, -1);
+
+	if (!rc)
+		dev_info(&pdev->dev, "Registered %s with %d event counters",
+				name, dsu_pmu->num_counters);
+	else
+		dsu_pmu_cleanup_dev(dsu_pmu);
+	return rc;
+}
+
+static int dsu_pmu_device_remove(struct platform_device *pdev)
+{
+	struct dsu_pmu *dsu_pmu = platform_get_drvdata(pdev);
+
+	dsu_pmu_cleanup_dev(dsu_pmu);
+	perf_pmu_unregister(&dsu_pmu->pmu);
+	return 0;
+}
+
+static const struct of_device_id dsu_pmu_of_match[] = {
+	{ .compatible = "arm,dsu-pmu", },
+	{},
+};
+
+static struct platform_driver dsu_pmu_driver = {
+	.driver = {
+		.name	= DRVNAME,
+		.of_match_table = of_match_ptr(dsu_pmu_of_match),
+	},
+	.probe = dsu_pmu_probe,
+	.remove = dsu_pmu_device_remove,
+};
+
+static int dsu_pmu_cpu_teardown(unsigned int cpu, struct hlist_node *node)
+{
+	int dst;
+	struct dsu_pmu *dsu_pmu = container_of(node,
+						struct dsu_pmu, cpuhp_node);
+
+	if (!cpumask_test_and_clear_cpu(cpu, &dsu_pmu->active_cpu))
+		return 0;
+
+	dst = dsu_pmu_get_online_cpu_any_but(dsu_pmu, cpu);
+	if (dst < nr_cpu_ids) {
+		cpumask_set_cpu(dst, &dsu_pmu->active_cpu);
+		perf_pmu_migrate_context(&dsu_pmu->pmu, cpu, dst);
+		irq_set_affinity_hint(dsu_pmu->irq, &dsu_pmu->active_cpu);
+	}
+
+	return 0;
+}
+
+static int __init dsu_pmu_init(void)
+{
+	int ret;
+
+	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
+					DRVNAME,
+					NULL,
+					dsu_pmu_cpu_teardown);
+	if (ret < 0)
+		return ret;
+	dsu_pmu_cpuhp_state = ret;
+	return platform_driver_register(&dsu_pmu_driver);
+}
+
+static void __exit dsu_pmu_exit(void)
+{
+	platform_driver_unregister(&dsu_pmu_driver);
+	cpuhp_remove_multi_state(dsu_pmu_cpuhp_state);
+}
+
+module_init(dsu_pmu_init);
+module_exit(dsu_pmu_exit);
+
+MODULE_DESCRIPTION("Perf driver for ARM DynamIQ Shared Unit");
+MODULE_AUTHOR("Suzuki K Poulose <suzuki.poulose@arm.com>");
+MODULE_LICENSE("GPL v2");
-- 
2.7.5

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/6] of: Add helper for mapping device node to logical CPU number
  2017-07-24 10:29 ` [PATCH v2 2/6] of: Add helper for mapping device node to logical CPU number Suzuki K Poulose
@ 2017-07-24 13:14   ` Marc Zyngier
  2017-07-24 15:15     ` Suzuki K Poulose
  2017-07-24 16:42   ` Sudeep Holla
  1 sibling, 1 reply; 17+ messages in thread
From: Marc Zyngier @ 2017-07-24 13:14 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel
  Cc: linux-kernel, will.deacon, mark.rutland, peterz, Rob Herring

On 24/07/17 11:29, Suzuki K Poulose wrote:
> Add a helper to map a device node to a logical CPU number to avoid
> duplication. Currently this is open coded in different places (e.g
> gic-v3, coresight). The helper tries to map device node to a "possible"
> logical CPU id, which may not be online yet. It is the responsibility
> of the user to make sure that the CPU is online. The helper uses
> of_get_cpu_node() which uses arch specific backends to match the phyiscal
> ids.
> 
> Cc: Rob Herring <robh+dt@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/of/base.c         | 26 ++++++++++++++++++++++++++
>  include/linux/of_device.h |  7 +++++++
>  2 files changed, 33 insertions(+)
> 
> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index 686628d..639af23 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -420,6 +420,32 @@ struct device_node *of_get_cpu_node(int cpu, unsigned int *thread)
>  EXPORT_SYMBOL(of_get_cpu_node);
>  
>  /**
> + * of_device_node_get_cpu: Get the logical CPU number for a given device_node
> + *
> + * @cpu_node: Pointer to the device_node for CPU.
> + *
> + * Returns the logical CPU number of the given CPU device_node.
> + * Returns >= nr_cpu_ids if CPU is not found.
> + */
> +int of_device_node_get_cpu(struct device_node *cpu_node)
> +{
> +	int cpu, thread;
> +	bool found = false;
> +	struct device_node *np;
> +
> +	for_each_possible_cpu(cpu) {
> +		np = of_get_cpu_node(cpu, &thread);

nit: Since you don't need the thread here, you can pass NULL instead.

> +		found = (cpu_node == np);
> +		of_node_put(np);
> +		if (found)
> +			break;
> +	}
> +
> +	return cpu;
> +}
> +EXPORT_SYMBOL(of_device_node_get_cpu);
> +
> +/**
>   * __of_device_is_compatible() - Check if the node matches given constraints
>   * @device: pointer to node
>   * @compat: required compatible string, NULL or "" for any match
> diff --git a/include/linux/of_device.h b/include/linux/of_device.h
> index b4ad8b4..00a4ba9 100644
> --- a/include/linux/of_device.h
> +++ b/include/linux/of_device.h
> @@ -40,6 +40,8 @@ extern int of_device_request_module(struct device *dev);
>  extern void of_device_uevent(struct device *dev, struct kobj_uevent_env *env);
>  extern int of_device_uevent_modalias(struct device *dev, struct kobj_uevent_env *env);
>  
> +extern int of_device_node_get_cpu(struct device_node *np);
> +
>  static inline void of_device_node_put(struct device *dev)
>  {
>  	of_node_put(dev->of_node);
> @@ -110,6 +112,11 @@ static inline int of_dma_configure(struct device *dev, struct device_node *np)
>  }
>  static inline void of_dma_deconfigure(struct device *dev)
>  {}
> +
> +static inline int of_device_node_get_cpu(struct device_node *np)
> +{
> +	return nr_cpu_ids;
> +}
>  #endif /* CONFIG_OF */
>  
>  #endif /* _LINUX_OF_DEVICE_H */
> 

FWIW:

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 4/6] irqchip: gic-v3: Use of_device_node_get_cpu helper
  2017-07-24 10:29 ` [PATCH v2 4/6] irqchip: gic-v3: " Suzuki K Poulose
@ 2017-07-24 13:15   ` Marc Zyngier
  0 siblings, 0 replies; 17+ messages in thread
From: Marc Zyngier @ 2017-07-24 13:15 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel
  Cc: linux-kernel, will.deacon, mark.rutland, peterz

On 24/07/17 11:29, Suzuki K Poulose wrote:
> Use the new generic helper of_device_node_get_cpu() instead
> of using our own version to map a device node to logical CPU
> number.
> 
> Cc: Marc Zyngier <marc.zyngier@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/irqchip/irq-gic-v3.c | 30 +++---------------------------
>  1 file changed, 3 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3.c b/drivers/irqchip/irq-gic-v3.c
> index dbffb7a..b7ce12b 100644
> --- a/drivers/irqchip/irq-gic-v3.c
> +++ b/drivers/irqchip/irq-gic-v3.c
> @@ -25,6 +25,7 @@
>  #include <linux/irqdomain.h>
>  #include <linux/of.h>
>  #include <linux/of_address.h>
> +#include <linux/of_device.h>
>  #include <linux/of_irq.h>
>  #include <linux/percpu.h>
>  #include <linux/slab.h>
> @@ -978,31 +979,6 @@ static int __init gic_validate_dist_version(void __iomem *dist_base)
>  	return 0;
>  }
>  
> -static int get_cpu_number(struct device_node *dn)
> -{
> -	const __be32 *cell;
> -	u64 hwid;
> -	int i;
> -
> -	cell = of_get_property(dn, "reg", NULL);
> -	if (!cell)
> -		return -1;
> -
> -	hwid = of_read_number(cell, of_n_addr_cells(dn));
> -
> -	/*
> -	 * Non affinity bits must be set to 0 in the DT
> -	 */
> -	if (hwid & ~MPIDR_HWID_BITMASK)
> -		return -1;
> -
> -	for (i = 0; i < num_possible_cpus(); i++)
> -		if (cpu_logical_map(i) == hwid)
> -			return i;
> -
> -	return -1;
> -}
> -
>  /* Create all possible partitions at boot time */
>  static void __init gic_populate_ppi_partitions(struct device_node *gic_node)
>  {
> @@ -1053,8 +1029,8 @@ static void __init gic_populate_ppi_partitions(struct device_node *gic_node)
>  			if (WARN_ON(!cpu_node))
>  				continue;
>  
> -			cpu = get_cpu_number(cpu_node);
> -			if (WARN_ON(cpu == -1))
> +			cpu = of_device_node_get_cpu(cpu_node);
> +			if (WARN_ON(cpu >= nr_cpu_ids))
>  				continue;
>  
>  			pr_cont("%s[%d] ", cpu_node->full_name, cpu);
> 

Acked-by: Marc Zyngier <marc.zyngier@arm.com>

	M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 6/6] perf: ARM DynamIQ Shared Unit PMU support
  2017-07-24 10:29 ` [PATCH v2 6/6] perf: ARM DynamIQ Shared Unit PMU support Suzuki K Poulose
@ 2017-07-24 14:50   ` Jonathan Cameron
  2017-07-24 15:44     ` Suzuki K Poulose
  0 siblings, 1 reply; 17+ messages in thread
From: Jonathan Cameron @ 2017-07-24 14:50 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, mark.rutland, peterz, will.deacon, linux-kernel

On Mon, 24 Jul 2017 11:29:21 +0100
Suzuki K Poulose <suzuki.poulose@arm.com> wrote:

> Add support for the Cluster PMU part of the ARM DynamIQ Shared Unit (DSU).
> The DSU integrates one or more cores with an L3 memory system, control
> logic, and external interfaces to form a multicore cluster. The PMU
> allows counting the various events related to L3, SCU etc, along with
> providing a cycle counter.
> 
> The PMU can be accessed via system registers, which are common
> to the cores in the same cluster. The PMU registers follow the
> semantics of the ARMv8 PMU, mostly, with the exception that
> the counters record the cluster wide events.
> 
> This driver is mostly based on the ARMv8 and CCI PMU drivers.
> 
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
A few quick comments.

Jonathan
> ---
>  arch/arm64/include/asm/arm_dsu_pmu.h | 124 +++++
>  drivers/perf/Kconfig                 |   9 +
>  drivers/perf/Makefile                |   1 +
>  drivers/perf/arm_dsu_pmu.c           | 877 +++++++++++++++++++++++++++++++++++
>  4 files changed, 1011 insertions(+)
>  create mode 100644 arch/arm64/include/asm/arm_dsu_pmu.h
>  create mode 100644 drivers/perf/arm_dsu_pmu.c
> 
> diff --git a/arch/arm64/include/asm/arm_dsu_pmu.h b/arch/arm64/include/asm/arm_dsu_pmu.h
> new file mode 100644
> index 0000000..e7276fd
> --- /dev/null
> +++ b/arch/arm64/include/asm/arm_dsu_pmu.h
> @@ -0,0 +1,124 @@
<snip>
> +static inline void __dsu_pmu_counter_interrupt_disable(int counter)
> +{
> +	write_sysreg_s(BIT(counter), CLUSTERPMINTENCLR_EL1);
> +	isb();
> +}
> +
> +
> +static inline u32 __dsu_pmu_read_pmceid(int n)
> +{
> +	switch (n) {
> +	case 0:
> +		return read_sysreg_s(CLUSTERPMCEID0_EL1);
> +	case 1:
> +		return read_sysreg_s(CLUSTERPMCEID1_EL1);
> +	default:
> +		BUILD_BUG();
> +		return 0;
> +	}
> +}
What is the benefit of having this lot in a header?  Is it to support future
additional drivers? If not, why not just push them down into the c code.

> diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
> index e5197ff..ee3d7d1 100644
> --- a/drivers/perf/Kconfig
> +++ b/drivers/perf/Kconfig
> @@ -17,6 +17,15 @@ config ARM_PMU_ACPI
>  	depends on ARM_PMU && ACPI
>  	def_bool y
>  
> +config ARM_DSU_PMU
> +	tristate "ARM DynamIQ Shared Unit (DSU) PMU"
> +	depends on ARM64 && PERF_EVENTS
> +	  help
> +	  Provides support for performance monitor unit in ARM DynamIQ Shared
> +	  Unit (DSU). The DSU integrates one or more cores  with an L3 memory
> +	  system, control logic. The PMU allows counting various events related
> +	  to DSU.
> +
>  config QCOM_L2_PMU
>  	bool "Qualcomm Technologies L2-cache PMU"
>  	depends on ARCH_QCOM && ARM64 && ACPI
> diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
> index 6420bd4..0adb4f6 100644
> --- a/drivers/perf/Makefile
> +++ b/drivers/perf/Makefile
> @@ -1,5 +1,6 @@
>  obj-$(CONFIG_ARM_PMU) += arm_pmu.o arm_pmu_platform.o
>  obj-$(CONFIG_ARM_PMU_ACPI) += arm_pmu_acpi.o
> +obj-$(CONFIG_ARM_DSU_PMU) += arm_dsu_pmu.o
>  obj-$(CONFIG_QCOM_L2_PMU)	+= qcom_l2_pmu.o
>  obj-$(CONFIG_QCOM_L3_PMU) += qcom_l3_pmu.o
>  obj-$(CONFIG_XGENE_PMU) += xgene_pmu.o
> diff --git a/drivers/perf/arm_dsu_pmu.c b/drivers/perf/arm_dsu_pmu.c
> new file mode 100644
> index 0000000..60b2c03
> --- /dev/null
> +++ b/drivers/perf/arm_dsu_pmu.c
<snip>
> +
> +/*
> + * Make sure the group of events can be scheduled at once
> + * on the PMU.
> + */
> +static int dsu_pmu_validate_group(struct perf_event *event)
> +{
> +	struct perf_event *sibling, *leader = event->group_leader;
> +	struct dsu_hw_events fake_hw;
> +
> +	if (event->group_leader == event)
> +		return 0;
> +
> +	memset(fake_hw.used_mask, 0, sizeof(fake_hw.used_mask));
> +	if (!dsu_pmu_validate_event(event->pmu, &fake_hw, leader))
> +		return -EINVAL;
> +	list_for_each_entry(sibling, &leader->sibling_list, group_entry) {
> +		if (!dsu_pmu_validate_event(event->pmu, &fake_hw, sibling))
> +			return -EINVAL;
> +	}
> +	if (dsu_pmu_validate_event(event->pmu, &fake_hw, event))
Perhaps a comment to say why in this case validate_event has the opposite
meaning to the others cases above? (not !dsu_pmu_validate_event())
> +		return -EINVAL;
> +	return 0;
> +}
> +
> +static int dsu_pmu_event_init(struct perf_event *event)
> +{
> +	struct dsu_pmu *dsu_pmu = to_dsu_pmu(event->pmu);
> +
> +	if (event->attr.type != event->pmu->type)
> +		return -ENOENT;
> +
> +	if (!dsu_pmu_event_supported(dsu_pmu, event->attr.config))
> +		return -EOPNOTSUPP;
> +
> +	/* We cannot support task bound events */
> +	if (event->cpu < 0) {
> +		dev_dbg(dsu_pmu->pmu.dev, "Can't support per-task counters\n");
> +		return -EINVAL;
> +	}
> +
> +	/* We don't support sampling */
> +	if (is_sampling_event(event)) {
> +		dev_dbg(dsu_pmu->pmu.dev, "Can't support sampling events\n");
> +		return -EOPNOTSUPP;
> +	}
> +
> +	if (has_branch_stack(event) ||
> +	    event->attr.exclude_user ||
> +	    event->attr.exclude_kernel ||
> +	    event->attr.exclude_hv ||
> +	    event->attr.exclude_idle ||
> +	    event->attr.exclude_host ||
> +	    event->attr.exclude_guest) {
> +		dev_dbg(dsu_pmu->pmu.dev, "Can't support filtering\n");
> +		return -EINVAL;
> +	}
> +
> +	if (dsu_pmu_validate_group(event))
> +		return -EINVAL;
> +
> +	event->cpu = cpumask_first(&dsu_pmu->active_cpu);
> +	if (event->cpu >= nr_cpu_ids)
> +		return -EINVAL;
> +
> +	event->hw.config_base = event->attr.config;
> +	return 0;
> +}
> +
> +static struct dsu_pmu *dsu_pmu_alloc(struct platform_device *pdev)
> +{
> +	struct dsu_pmu *dsu_pmu;
> +
> +	dsu_pmu = devm_kzalloc(&pdev->dev, sizeof(*dsu_pmu), GFP_KERNEL);
> +	if (!dsu_pmu)
> +		return ERR_PTR(-ENOMEM);
A blank line here would make it a little more readable
> +	raw_spin_lock_init(&dsu_pmu->pmu_lock);
And one here.
> +	return dsu_pmu;
> +}
> +
> +/**
> + * dsu_pmu_dt_get_cpus: Get the list of CPUs in the cluster.
> + */
> +static int dsu_pmu_dt_get_cpus(struct device_node *dev, cpumask_t *mask)
> +{
> +	int i = 0, n, cpu;
> +	struct device_node *cpu_node;
> +
> +	n = of_count_phandle_with_args(dev, "cpus", NULL);
> +	if (n <= 0)
> +		goto out;
> +	for (; i < n; i++) {
> +		cpu_node = of_parse_phandle(dev, "cpus", i);
> +		if (!cpu_node)
> +			break;
> +		cpu = of_device_node_get_cpu(cpu_node);
> +		of_node_put(cpu_node);
> +		if (cpu >= nr_cpu_ids)
> +			break;
It rather seems like this is an error we would not want to skip over.
> +		cpumask_set_cpu(cpu, mask);
> +	}
> +out:
> +	return i > 0;
Cleaner to actually return appropriate errors from within
this function and pass them all the way up.
> +}
> +
> +/*
> + * dsu_pmu_probe_pmu: Probe the PMU details on a CPU in the cluster.
> + */
> +static void dsu_pmu_probe_pmu(void *data)
> +{
> +	struct dsu_pmu *dsu_pmu = data;
> +	u64 cpmcr;
> +	u32 cpmceid[2];
> +
> +	if (WARN_ON(!cpumask_test_cpu(smp_processor_id(),
> +		&dsu_pmu->supported_cpus)))
> +		return;
> +	cpmcr = __dsu_pmu_read_pmcr();
> +	dsu_pmu->num_counters = ((cpmcr >> CLUSTERPMCR_N_SHIFT) &
> +					CLUSTERPMCR_N_MASK);
> +	if (!dsu_pmu->num_counters)
> +		return;
> +	cpmceid[0] = __dsu_pmu_read_pmceid(0);
> +	cpmceid[1] = __dsu_pmu_read_pmceid(1);
> +	bitmap_from_u32array(dsu_pmu->cpmceid_bitmap,
> +				DSU_PMU_MAX_COMMON_EVENTS,
> +				cpmceid,
> +				ARRAY_SIZE(cpmceid));
> +}
> +
> +static void dsu_pmu_cleanup_dev(struct dsu_pmu *dsu_pmu)
> +{
> +	cpuhp_state_remove_instance(dsu_pmu_cpuhp_state, &dsu_pmu->cpuhp_node);
> +	irq_set_affinity_hint(dsu_pmu->irq, NULL);
> +}
> +
> +static int dsu_pmu_probe(struct platform_device *pdev)
> +{
> +	int irq, rc, cpu;
> +	struct dsu_pmu *dsu_pmu;
> +	char *name;
> +
> +	static atomic_t pmu_idx = ATOMIC_INIT(-1);
> +
> +
One blank line only.
> +	dsu_pmu = dsu_pmu_alloc(pdev);
> +	if (!dsu_pmu_dt_get_cpus(pdev->dev.of_node, &dsu_pmu->supported_cpus)) {
> +		dev_warn(&pdev->dev, "Failed to parse the CPUs\n");
> +		return -EINVAL;
> +	}
> +
> +	rc = smp_call_function_any(&dsu_pmu->supported_cpus,
> +					dsu_pmu_probe_pmu,
> +					dsu_pmu, 1);
> +	if (rc)
> +		return rc;
> +	if (!dsu_pmu->num_counters)
> +		return -ENODEV;
> +	irq = platform_get_irq(pdev, 0);
> +	if (irq < 0) {
> +		dev_warn(&pdev->dev, "Failed to find IRQ\n");
> +		return -EINVAL;
> +	}
> +
> +	name = devm_kasprintf(&pdev->dev, GFP_KERNEL, "%s_%d",
> +				PMUNAME, atomic_inc_return(&pmu_idx));
> +	rc = devm_request_irq(&pdev->dev, irq, dsu_pmu_handle_irq,
> +					0, name, dsu_pmu);
> +	if (rc) {
> +		dev_warn(&pdev->dev, "Failed to request IRQ %d\n", irq);
> +		return rc;
> +	}
> +
> +	/*
> +	 * Find one CPU in the DSU to handle the IRQs.
> +	 * It is highly unlikely that we would fail
> +	 * to find one, given the probing has succeeded.
> +	 */
> +	cpu = dsu_pmu_get_online_cpu(dsu_pmu);
> +	if (cpu >= nr_cpu_ids)
> +		return -ENODEV;
> +	cpumask_set_cpu(cpu, &dsu_pmu->active_cpu);
> +	rc = irq_set_affinity_hint(irq, &dsu_pmu->active_cpu);
> +	if (rc) {
> +		dev_warn(&pdev->dev, "Failed to force IRQ affinity for %d\n",
> +					 irq);
> +		return rc;
> +	}
It is a little unusual that you have the above two elements inline
here, but have a function to unwind them.  Just makes it a little
harder to read and leads to missing things like...
> +
> +	platform_set_drvdata(pdev, dsu_pmu);
> +	rc = cpuhp_state_add_instance(dsu_pmu_cpuhp_state,
> +						&dsu_pmu->cpuhp_node);
> +	if (rc)
I believe irq_set_affinity_hit(dsu_pmu->irq, NULL) would make sense
here.
> +		return rc;
> +
> +	dsu_pmu->irq = irq;
> +	dsu_pmu->pmu = (struct pmu) {
> +		.task_ctx_nr	= perf_invalid_context,
> +
> +		.pmu_enable	= dsu_pmu_enable,
> +		.pmu_disable	= dsu_pmu_disable,
> +		.event_init	= dsu_pmu_event_init,
> +		.add		= dsu_pmu_add,
> +		.del		= dsu_pmu_del,
> +		.start		= dsu_pmu_start,
> +		.stop		= dsu_pmu_stop,
> +		.read		= dsu_pmu_read,
> +
> +		.attr_groups	= dsu_pmu_attr_groups,
> +	};
> +
> +	rc = perf_pmu_register(&dsu_pmu->pmu, name, -1);
> +
> +	if (!rc)
> +		dev_info(&pdev->dev, "Registered %s with %d event counters",
> +				name, dsu_pmu->num_counters);
> +	else
> +		dsu_pmu_cleanup_dev(dsu_pmu);
It is cleaner to have the error handled as the 'exceptional'
element.  Slightly more code, but easier to read.
i.e.

	if (rc) {
		dsu_pmu_cleanup_dev(dsu_pmu);
		return rc;
	}

	dev_info(...)

> +	return rc;
> +}
> +
> +static int dsu_pmu_device_remove(struct platform_device *pdev)
The difference in naming style between this and probe is a little
confusing.

Why not dsu_pmu_remove?
> +{
> +	struct dsu_pmu *dsu_pmu = platform_get_drvdata(pdev);
> +
> +	dsu_pmu_cleanup_dev(dsu_pmu);
> +	perf_pmu_unregister(&dsu_pmu->pmu);
The remove order should be the reverse of probe.
It just makes it more 'obviously' right and saves reviewer time.
If there is a reason not to do this, there should be a comment saying
why.

> +	return 0;
> +}
> +
> +static const struct of_device_id dsu_pmu_of_match[] = {
> +	{ .compatible = "arm,dsu-pmu", },
> +	{},
> +};
> +
> +static struct platform_driver dsu_pmu_driver = {
> +	.driver = {
> +		.name	= DRVNAME,
> +		.of_match_table = of_match_ptr(dsu_pmu_of_match),
> +	},
> +	.probe = dsu_pmu_probe,
> +	.remove = dsu_pmu_device_remove,
> +};
> +
> +static int dsu_pmu_cpu_teardown(unsigned int cpu, struct hlist_node *node)
> +{
> +	int dst;
> +	struct dsu_pmu *dsu_pmu = container_of(node,
> +						struct dsu_pmu, cpuhp_node);
> +
> +	if (!cpumask_test_and_clear_cpu(cpu, &dsu_pmu->active_cpu))
> +		return 0;
> +
> +	dst = dsu_pmu_get_online_cpu_any_but(dsu_pmu, cpu);
> +	if (dst < nr_cpu_ids) {
> +		cpumask_set_cpu(dst, &dsu_pmu->active_cpu);
> +		perf_pmu_migrate_context(&dsu_pmu->pmu, cpu, dst);
> +		irq_set_affinity_hint(dsu_pmu->irq, &dsu_pmu->active_cpu);
> +	}
> +
> +	return 0;
> +}
> +
> +static int __init dsu_pmu_init(void)
> +{
> +	int ret;
> +
> +	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
> +					DRVNAME,
> +					NULL,
> +					dsu_pmu_cpu_teardown);
> +	if (ret < 0)
> +		return ret;
> +	dsu_pmu_cpuhp_state = ret;
I'm just curious - what prevents this initialization being done in probe
rather than init?

> +	return platform_driver_register(&dsu_pmu_driver);
> +}
> +
> +static void __exit dsu_pmu_exit(void)
> +{
> +	platform_driver_unregister(&dsu_pmu_driver);
> +	cpuhp_remove_multi_state(dsu_pmu_cpuhp_state);
> +}
> +
> +module_init(dsu_pmu_init);
> +module_exit(dsu_pmu_exit);
> +
> +MODULE_DESCRIPTION("Perf driver for ARM DynamIQ Shared Unit");
> +MODULE_AUTHOR("Suzuki K Poulose <suzuki.poulose@arm.com>");
> +MODULE_LICENSE("GPL v2");

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/6] of: Add helper for mapping device node to logical CPU number
  2017-07-24 13:14   ` Marc Zyngier
@ 2017-07-24 15:15     ` Suzuki K Poulose
  0 siblings, 0 replies; 17+ messages in thread
From: Suzuki K Poulose @ 2017-07-24 15:15 UTC (permalink / raw)
  To: Marc Zyngier, linux-arm-kernel
  Cc: linux-kernel, will.deacon, mark.rutland, peterz, Rob Herring

Hi Marc,


On 24/07/17 14:14, Marc Zyngier wrote:
> On 24/07/17 11:29, Suzuki K Poulose wrote:
>> Add a helper to map a device node to a logical CPU number to avoid
>> duplication. Currently this is open coded in different places (e.g
>> gic-v3, coresight). The helper tries to map device node to a "possible"
>> logical CPU id, which may not be online yet. It is the responsibility
>> of the user to make sure that the CPU is online. The helper uses
>> of_get_cpu_node() which uses arch specific backends to match the phyiscal
>> ids.
>>
>> Cc: Rob Herring <robh+dt@kernel.org>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>>  drivers/of/base.c         | 26 ++++++++++++++++++++++++++
>>  include/linux/of_device.h |  7 +++++++
>>  2 files changed, 33 insertions(+)
>>
>> diff --git a/drivers/of/base.c b/drivers/of/base.c
>> index 686628d..639af23 100644
>> --- a/drivers/of/base.c
>> +++ b/drivers/of/base.c

>> +int of_device_node_get_cpu(struct device_node *cpu_node)
>> +{
>> +	int cpu, thread;
>> +	bool found = false;
>> +	struct device_node *np;
>> +
>> +	for_each_possible_cpu(cpu) {
>> +		np = of_get_cpu_node(cpu, &thread);
>
> nit: Since you don't need the thread here, you can pass NULL instead.
>

Sure, will do that in the next version.


>
> FWIW:
>
> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
>

Thanks for taking a look.

Suzuki

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 6/6] perf: ARM DynamIQ Shared Unit PMU support
  2017-07-24 14:50   ` Jonathan Cameron
@ 2017-07-24 15:44     ` Suzuki K Poulose
  2017-07-25  8:33       ` Jonathan Cameron
  0 siblings, 1 reply; 17+ messages in thread
From: Suzuki K Poulose @ 2017-07-24 15:44 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: linux-arm-kernel, mark.rutland, peterz, will.deacon, linux-kernel

Hi Jonathan,


On 24/07/17 15:50, Jonathan Cameron wrote:
> On Mon, 24 Jul 2017 11:29:21 +0100
> Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
>
>> Add support for the Cluster PMU part of the ARM DynamIQ Shared Unit (DSU).
>> The DSU integrates one or more cores with an L3 memory system, control
>> logic, and external interfaces to form a multicore cluster. The PMU
>> allows counting the various events related to L3, SCU etc, along with
>> providing a cycle counter.
>>
>> The PMU can be accessed via system registers, which are common
>> to the cores in the same cluster. The PMU registers follow the
>> semantics of the ARMv8 PMU, mostly, with the exception that
>> the counters record the cluster wide events.
>>
>> This driver is mostly based on the ARMv8 and CCI PMU drivers.
>>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Will Deacon <will.deacon@arm.com>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> A few quick comments.

Thanks for the detailed look. Comments in line. Btw, please could you leave a
blank line after the quoted text and before your comment (like what I have
don here) ? That way, it is way may much easier to find your comments.

>
> Jonathan
>> ---

>>
>> diff --git a/arch/arm64/include/asm/arm_dsu_pmu.h b/arch/arm64/include/asm/arm_dsu_pmu.h
>> new file mode 100644
>> index 0000000..e7276fd
>> --- /dev/null
>> +++ b/arch/arm64/include/asm/arm_dsu_pmu.h
>> @@ -0,0 +1,124 @@
> <snip>
>> +static inline void __dsu_pmu_counter_interrupt_disable(int counter)
>> +{
>> +	write_sysreg_s(BIT(counter), CLUSTERPMINTENCLR_EL1);
>> +	isb();
>> +}
>> +
>> +
>> +static inline u32 __dsu_pmu_read_pmceid(int n)
>> +{
>> +	switch (n) {
>> +	case 0:
>> +		return read_sysreg_s(CLUSTERPMCEID0_EL1);
>> +	case 1:
>> +		return read_sysreg_s(CLUSTERPMCEID1_EL1);
>> +	default:
>> +		BUILD_BUG();
>> +		return 0;
>> +	}
>> +}
> What is the benefit of having this lot in a header?  Is it to support future
> additional drivers? If not, why not just push them down into the c code.

As I mentioned in the cover letter, this is to keep the architecture specific code
separate so that we could easily add support for this on arm32 kernel.

>> --- /dev/null
>> +++ b/drivers/perf/arm_dsu_pmu.c
> <snip>
>> +
>> +/*
>> + * Make sure the group of events can be scheduled at once
>> + * on the PMU.
>> + */
>> +static int dsu_pmu_validate_group(struct perf_event *event)
>> +{
>> +	struct perf_event *sibling, *leader = event->group_leader;
>> +	struct dsu_hw_events fake_hw;
>> +
>> +	if (event->group_leader == event)
>> +		return 0;
>> +
>> +	memset(fake_hw.used_mask, 0, sizeof(fake_hw.used_mask));
>> +	if (!dsu_pmu_validate_event(event->pmu, &fake_hw, leader))
>> +		return -EINVAL;
>> +	list_for_each_entry(sibling, &leader->sibling_list, group_entry) {
>> +		if (!dsu_pmu_validate_event(event->pmu, &fake_hw, sibling))
>> +			return -EINVAL;
>> +	}
>> +	if (dsu_pmu_validate_event(event->pmu, &fake_hw, event))
> Perhaps a comment to say why in this case validate_event has the opposite
> meaning to the others cases above? (not !dsu_pmu_validate_event())

Ah! Thanks for spotting. Thats a mistake. It should be !dsu_pmu_validate_event().
I will fix it in the next version. We are making sure that the event can be
scheduled, with the other events in the group already added.

>> +
>> +static struct dsu_pmu *dsu_pmu_alloc(struct platform_device *pdev)
>> +{
>> +	struct dsu_pmu *dsu_pmu;
>> +
>> +	dsu_pmu = devm_kzalloc(&pdev->dev, sizeof(*dsu_pmu), GFP_KERNEL);
>> +	if (!dsu_pmu)
>> +		return ERR_PTR(-ENOMEM);
> A blank line here would make it a little more readable
>> +	raw_spin_lock_init(&dsu_pmu->pmu_lock);
> And one here.
>> +	return dsu_pmu;

It doesn't look that complex here, given it doesn't take the lock.
If it does help the reading, I could add it.

>> +}
>> +
>> +/**
>> + * dsu_pmu_dt_get_cpus: Get the list of CPUs in the cluster.
>> + */
>> +static int dsu_pmu_dt_get_cpus(struct device_node *dev, cpumask_t *mask)
>> +{
>> +	int i = 0, n, cpu;
>> +	struct device_node *cpu_node;
>> +
>> +	n = of_count_phandle_with_args(dev, "cpus", NULL);
>> +	if (n <= 0)
>> +		goto out;
>> +	for (; i < n; i++) {
>> +		cpu_node = of_parse_phandle(dev, "cpus", i);
>> +		if (!cpu_node)
>> +			break;
>> +		cpu = of_device_node_get_cpu(cpu_node);
>> +		of_node_put(cpu_node);
>> +		if (cpu >= nr_cpu_ids)
>> +			break;
> It rather seems like this is an error we would not want to skip over.

Ok. That makes sense to me. I can return -EINVAL here.

>> +		cpumask_set_cpu(cpu, mask);
>> +	}
>> +out:
>> +	return i > 0;
> Cleaner to actually return appropriate errors from within
> this function and pass them all the way up.

Sure, will do.

>> +static int dsu_pmu_probe(struct platform_device *pdev)
>> +{
>> +	int irq, rc, cpu;
>> +	struct dsu_pmu *dsu_pmu;
>> +	char *name;
>> +
>> +	static atomic_t pmu_idx = ATOMIC_INIT(-1);
>> +
>> +
> One blank line only.

Ok.

>> +	/*
>> +	 * Find one CPU in the DSU to handle the IRQs.
>> +	 * It is highly unlikely that we would fail
>> +	 * to find one, given the probing has succeeded.
>> +	 */
>> +	cpu = dsu_pmu_get_online_cpu(dsu_pmu);
>> +	if (cpu >= nr_cpu_ids)
>> +		return -ENODEV;
>> +	cpumask_set_cpu(cpu, &dsu_pmu->active_cpu);
>> +	rc = irq_set_affinity_hint(irq, &dsu_pmu->active_cpu);
>> +	if (rc) {
>> +		dev_warn(&pdev->dev, "Failed to force IRQ affinity for %d\n",
>> +					 irq);
>> +		return rc;
>> +	}

> It is a little unusual that you have the above two elements inline
> here, but have a function to unwind them.  Just makes it a little
> harder to read and leads to missing things like...

The unwinding was added as a function to reuse the code. The "setup" steps
undone by the unwind doesn't look separate from what we do in the probe,
hence didn't go for a separate function.

>> +
>> +	platform_set_drvdata(pdev, dsu_pmu);
>> +	rc = cpuhp_state_add_instance(dsu_pmu_cpuhp_state,
>> +						&dsu_pmu->cpuhp_node);
>> +	if (rc)
> I believe irq_set_affinity_hit(dsu_pmu->irq, NULL) would make sense
> here.

Yes, you're right. Otherwise we could hit a WARN_ON. I will rearrange
the probe code to a cleaner state.

>> +		return rc;
>> +
>> +	dsu_pmu->irq = irq;
>> +	dsu_pmu->pmu = (struct pmu) {
>> +		.task_ctx_nr	= perf_invalid_context,
>> +
>> +		.pmu_enable	= dsu_pmu_enable,
>> +		.pmu_disable	= dsu_pmu_disable,
>> +		.event_init	= dsu_pmu_event_init,
>> +		.add		= dsu_pmu_add,
>> +		.del		= dsu_pmu_del,
>> +		.start		= dsu_pmu_start,
>> +		.stop		= dsu_pmu_stop,
>> +		.read		= dsu_pmu_read,
>> +
>> +		.attr_groups	= dsu_pmu_attr_groups,
>> +	};
>> +
>> +	rc = perf_pmu_register(&dsu_pmu->pmu, name, -1);
>> +
>> +	if (!rc)
>> +		dev_info(&pdev->dev, "Registered %s with %d event counters",
>> +				name, dsu_pmu->num_counters);
>> +	else
>> +		dsu_pmu_cleanup_dev(dsu_pmu);
> It is cleaner to have the error handled as the 'exceptional'
> element.  Slightly more code, but easier to read.
> i.e.
>
> 	if (rc) {
> 		dsu_pmu_cleanup_dev(dsu_pmu);
> 		return rc;
> 	}
>
> 	dev_info(...)

Ok.

>
>> +	return rc;
>> +}
>> +
>> +static int dsu_pmu_device_remove(struct platform_device *pdev)
> The difference in naming style between this and probe is a little
> confusing.
>

Ok

> Why not dsu_pmu_remove?

Because it is callback for the platform device, which should eventually
remove the PMU and any other cleanups. I could rename the probe to match it,
i.e, dsu_pmu_device_probe().


>> +{
>> +	struct dsu_pmu *dsu_pmu = platform_get_drvdata(pdev);
>> +
>> +	dsu_pmu_cleanup_dev(dsu_pmu);
>> +	perf_pmu_unregister(&dsu_pmu->pmu);
> The remove order should be the reverse of probe.
> It just makes it more 'obviously' right and saves reviewer time.
> If there is a reason not to do this, there should be a comment saying
> why.
>

No, you're right. It should be in the reverse order, I will fix it.

>> +
>> +
>> +static int __init dsu_pmu_init(void)
>> +{
>> +	int ret;
>> +
>> +	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
>> +					DRVNAME,
>> +					NULL,
>> +					dsu_pmu_cpu_teardown);
>> +	if (ret < 0)
>> +		return ret;
>> +	dsu_pmu_cpuhp_state = ret;
> I'm just curious - what prevents this initialization being done in probe
> rather than init?
>

Because, you need to do that only one per system and rather than one per DSU.
There could be multiple DSUs connected via other links on a bigger platform.


Suzuki

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/6] of: Add helper for mapping device node to logical CPU number
  2017-07-24 10:29 ` [PATCH v2 2/6] of: Add helper for mapping device node to logical CPU number Suzuki K Poulose
  2017-07-24 13:14   ` Marc Zyngier
@ 2017-07-24 16:42   ` Sudeep Holla
  2017-07-25  9:27     ` Suzuki K Poulose
  1 sibling, 1 reply; 17+ messages in thread
From: Sudeep Holla @ 2017-07-24 16:42 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel
  Cc: Sudeep Holla, mark.rutland, peterz, will.deacon, linux-kernel,
	Rob Herring



On 24/07/17 11:29, Suzuki K Poulose wrote:
> Add a helper to map a device node to a logical CPU number to avoid
> duplication. Currently this is open coded in different places (e.g
> gic-v3, coresight). The helper tries to map device node to a "possible"
> logical CPU id, which may not be online yet. It is the responsibility
> of the user to make sure that the CPU is online. The helper uses
> of_get_cpu_node() which uses arch specific backends to match the phyiscal
> ids.
> 
> Cc: Rob Herring <robh+dt@kernel.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/of/base.c         | 26 ++++++++++++++++++++++++++
>  include/linux/of_device.h |  7 +++++++
>  2 files changed, 33 insertions(+)
> 
> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index 686628d..639af23 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -420,6 +420,32 @@ struct device_node *of_get_cpu_node(int cpu, unsigned int *thread)
>  EXPORT_SYMBOL(of_get_cpu_node);
>  
>  /**
> + * of_device_node_get_cpu: Get the logical CPU number for a given device_node
> + *
> + * @cpu_node: Pointer to the device_node for CPU.
> + *
> + * Returns the logical CPU number of the given CPU device_node.
> + * Returns >= nr_cpu_ids if CPU is not found.
> + */
> +int of_device_node_get_cpu(struct device_node *cpu_node)
> +{
> +	int cpu, thread;
> +	bool found = false;
> +	struct device_node *np;
> +
> +	for_each_possible_cpu(cpu) {
> +		np = of_get_cpu_node(cpu, &thread);

Ideally, we should be able to use of_cpu_device_node_get instead of
of_get_cpu_node which parses the device tree. Not sure if that's the
case here too.

Sorry for following up on this so late. I had a patch in my tree
for-a-while to address that. Just posted it[1] now seeing this patch now.

You can move to of_cpu_device_node_get if all users of this function are
called after CPU's are registered. Otherwise, you can keep it as is for
now and it can be changed once [1] lands.

-- 
Regards,
Sudeep

[1] https://marc.info/?l=devicetree&m=150090452130567&w=2

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 6/6] perf: ARM DynamIQ Shared Unit PMU support
  2017-07-24 15:44     ` Suzuki K Poulose
@ 2017-07-25  8:33       ` Jonathan Cameron
  0 siblings, 0 replies; 17+ messages in thread
From: Jonathan Cameron @ 2017-07-25  8:33 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel, mark.rutland, peterz, will.deacon, linux-kernel

On Mon, 24 Jul 2017 16:44:51 +0100
Suzuki K Poulose <Suzuki.Poulose@arm.com> wrote:

> Hi Jonathan,
> 
> 
> On 24/07/17 15:50, Jonathan Cameron wrote:
> > On Mon, 24 Jul 2017 11:29:21 +0100
> > Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
> >  
> >> Add support for the Cluster PMU part of the ARM DynamIQ Shared Unit (DSU).
> >> The DSU integrates one or more cores with an L3 memory system, control
> >> logic, and external interfaces to form a multicore cluster. The PMU
> >> allows counting the various events related to L3, SCU etc, along with
> >> providing a cycle counter.
> >>
> >> The PMU can be accessed via system registers, which are common
> >> to the cores in the same cluster. The PMU registers follow the
> >> semantics of the ARMv8 PMU, mostly, with the exception that
> >> the counters record the cluster wide events.
> >>
> >> This driver is mostly based on the ARMv8 and CCI PMU drivers.
> >>
> >> Cc: Mark Rutland <mark.rutland@arm.com>
> >> Cc: Will Deacon <will.deacon@arm.com>
> >> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>  
> > A few quick comments.  
> 
> Thanks for the detailed look. Comments in line. Btw, please could you leave a
> blank line after the quoted text and before your comment (like what I have
> don here) ? That way, it is way may much easier to find your comments.

Sure

> 
> >
> > Jonathan  
> >> ---  
> 
> >>
> >> diff --git a/arch/arm64/include/asm/arm_dsu_pmu.h b/arch/arm64/include/asm/arm_dsu_pmu.h
> >> new file mode 100644
> >> index 0000000..e7276fd
> >> --- /dev/null
> >> +++ b/arch/arm64/include/asm/arm_dsu_pmu.h
> >> @@ -0,0 +1,124 @@  
> > <snip>  
> >> +static inline void __dsu_pmu_counter_interrupt_disable(int counter)
> >> +{
> >> +	write_sysreg_s(BIT(counter), CLUSTERPMINTENCLR_EL1);
> >> +	isb();
> >> +}
> >> +
> >> +
> >> +static inline u32 __dsu_pmu_read_pmceid(int n)
> >> +{
> >> +	switch (n) {
> >> +	case 0:
> >> +		return read_sysreg_s(CLUSTERPMCEID0_EL1);
> >> +	case 1:
> >> +		return read_sysreg_s(CLUSTERPMCEID1_EL1);
> >> +	default:
> >> +		BUILD_BUG();
> >> +		return 0;
> >> +	}
> >> +}  
> > What is the benefit of having this lot in a header?  Is it to support future
> > additional drivers? If not, why not just push them down into the c code.  
> 
> As I mentioned in the cover letter, this is to keep the architecture specific code
> separate so that we could easily add support for this on arm32 kernel.
> 

Fair enough I suppose though I'd personally have done this as
a prepatch to series that adds arm32 support.

> >> --- /dev/null
> >> +++ b/drivers/perf/arm_dsu_pmu.c  
> > <snip>  
> >> +
> >> +/*
> >> + * Make sure the group of events can be scheduled at once
> >> + * on the PMU.
> >> + */
> >> +static int dsu_pmu_validate_group(struct perf_event *event)
> >> +{
> >> +	struct perf_event *sibling, *leader = event->group_leader;
> >> +	struct dsu_hw_events fake_hw;
> >> +
> >> +	if (event->group_leader == event)
> >> +		return 0;
> >> +
> >> +	memset(fake_hw.used_mask, 0, sizeof(fake_hw.used_mask));
> >> +	if (!dsu_pmu_validate_event(event->pmu, &fake_hw, leader))
> >> +		return -EINVAL;
> >> +	list_for_each_entry(sibling, &leader->sibling_list, group_entry) {
> >> +		if (!dsu_pmu_validate_event(event->pmu, &fake_hw, sibling))
> >> +			return -EINVAL;
> >> +	}
> >> +	if (dsu_pmu_validate_event(event->pmu, &fake_hw, event))  
> > Perhaps a comment to say why in this case validate_event has the opposite
> > meaning to the others cases above? (not !dsu_pmu_validate_event())  
> 
> Ah! Thanks for spotting. Thats a mistake. It should be !dsu_pmu_validate_event().
> I will fix it in the next version. We are making sure that the event can be
> scheduled, with the other events in the group already added.
> 
> >> +
> >> +static struct dsu_pmu *dsu_pmu_alloc(struct platform_device *pdev)
> >> +{
> >> +	struct dsu_pmu *dsu_pmu;
> >> +
> >> +	dsu_pmu = devm_kzalloc(&pdev->dev, sizeof(*dsu_pmu), GFP_KERNEL);
> >> +	if (!dsu_pmu)
> >> +		return ERR_PTR(-ENOMEM);  
> > A blank line here would make it a little more readable  
> >> +	raw_spin_lock_init(&dsu_pmu->pmu_lock);  
> > And one here.  
> >> +	return dsu_pmu;  
> 
> It doesn't look that complex here, given it doesn't take the lock.
> If it does help the reading, I could add it.
> 

It was indeed a really minor point!

> >> +}
> >> +
> >> +/**
> >> + * dsu_pmu_dt_get_cpus: Get the list of CPUs in the cluster.
> >> + */
> >> +static int dsu_pmu_dt_get_cpus(struct device_node *dev, cpumask_t *mask)
> >> +{
> >> +	int i = 0, n, cpu;
> >> +	struct device_node *cpu_node;
> >> +
> >> +	n = of_count_phandle_with_args(dev, "cpus", NULL);
> >> +	if (n <= 0)
> >> +		goto out;
> >> +	for (; i < n; i++) {
> >> +		cpu_node = of_parse_phandle(dev, "cpus", i);
> >> +		if (!cpu_node)
> >> +			break;
> >> +		cpu = of_device_node_get_cpu(cpu_node);
> >> +		of_node_put(cpu_node);
> >> +		if (cpu >= nr_cpu_ids)
> >> +			break;  
> > It rather seems like this is an error we would not want to skip over.  
> 
> Ok. That makes sense to me. I can return -EINVAL here.
> 
> >> +		cpumask_set_cpu(cpu, mask);
> >> +	}
> >> +out:
> >> +	return i > 0;  
> > Cleaner to actually return appropriate errors from within
> > this function and pass them all the way up.  
> 
> Sure, will do.
> 
> >> +static int dsu_pmu_probe(struct platform_device *pdev)
> >> +{
> >> +	int irq, rc, cpu;
> >> +	struct dsu_pmu *dsu_pmu;
> >> +	char *name;
> >> +
> >> +	static atomic_t pmu_idx = ATOMIC_INIT(-1);
> >> +
> >> +  
> > One blank line only.  
> 
> Ok.
> 
> >> +	/*
> >> +	 * Find one CPU in the DSU to handle the IRQs.
> >> +	 * It is highly unlikely that we would fail
> >> +	 * to find one, given the probing has succeeded.
> >> +	 */
> >> +	cpu = dsu_pmu_get_online_cpu(dsu_pmu);
> >> +	if (cpu >= nr_cpu_ids)
> >> +		return -ENODEV;
> >> +	cpumask_set_cpu(cpu, &dsu_pmu->active_cpu);
> >> +	rc = irq_set_affinity_hint(irq, &dsu_pmu->active_cpu);
> >> +	if (rc) {
> >> +		dev_warn(&pdev->dev, "Failed to force IRQ affinity for %d\n",
> >> +					 irq);
> >> +		return rc;
> >> +	}  
> 
> > It is a little unusual that you have the above two elements inline
> > here, but have a function to unwind them.  Just makes it a little
> > harder to read and leads to missing things like...  
> 
> The unwinding was added as a function to reuse the code. The "setup" steps
> undone by the unwind doesn't look separate from what we do in the probe,
> hence didn't go for a separate function.
> 
> >> +
> >> +	platform_set_drvdata(pdev, dsu_pmu);
> >> +	rc = cpuhp_state_add_instance(dsu_pmu_cpuhp_state,
> >> +						&dsu_pmu->cpuhp_node);
> >> +	if (rc)  
> > I believe irq_set_affinity_hit(dsu_pmu->irq, NULL) would make sense
> > here.  
> 
> Yes, you're right. Otherwise we could hit a WARN_ON. I will rearrange
> the probe code to a cleaner state.
>

Cool
 
> >> +		return rc;
> >> +
> >> +	dsu_pmu->irq = irq;
> >> +	dsu_pmu->pmu = (struct pmu) {
> >> +		.task_ctx_nr	= perf_invalid_context,
> >> +
> >> +		.pmu_enable	= dsu_pmu_enable,
> >> +		.pmu_disable	= dsu_pmu_disable,
> >> +		.event_init	= dsu_pmu_event_init,
> >> +		.add		= dsu_pmu_add,
> >> +		.del		= dsu_pmu_del,
> >> +		.start		= dsu_pmu_start,
> >> +		.stop		= dsu_pmu_stop,
> >> +		.read		= dsu_pmu_read,
> >> +
> >> +		.attr_groups	= dsu_pmu_attr_groups,
> >> +	};
> >> +
> >> +	rc = perf_pmu_register(&dsu_pmu->pmu, name, -1);
> >> +
> >> +	if (!rc)
> >> +		dev_info(&pdev->dev, "Registered %s with %d event counters",
> >> +				name, dsu_pmu->num_counters);
> >> +	else
> >> +		dsu_pmu_cleanup_dev(dsu_pmu);  
> > It is cleaner to have the error handled as the 'exceptional'
> > element.  Slightly more code, but easier to read.
> > i.e.
> >
> > 	if (rc) {
> > 		dsu_pmu_cleanup_dev(dsu_pmu);
> > 		return rc;
> > 	}
> >
> > 	dev_info(...)  
> 
> Ok.
> 
> >  
> >> +	return rc;
> >> +}
> >> +
> >> +static int dsu_pmu_device_remove(struct platform_device *pdev)  
> > The difference in naming style between this and probe is a little
> > confusing.
> >  
> 
> Ok
> 
> > Why not dsu_pmu_remove?  
> 
> Because it is callback for the platform device, which should eventually
> remove the PMU and any other cleanups. I could rename the probe to match it,
> i.e, dsu_pmu_device_probe().
> 

Just as good.

> 
> >> +{
> >> +	struct dsu_pmu *dsu_pmu = platform_get_drvdata(pdev);
> >> +
> >> +	dsu_pmu_cleanup_dev(dsu_pmu);
> >> +	perf_pmu_unregister(&dsu_pmu->pmu);  
> > The remove order should be the reverse of probe.
> > It just makes it more 'obviously' right and saves reviewer time.
> > If there is a reason not to do this, there should be a comment saying
> > why.
> >  
> 
> No, you're right. It should be in the reverse order, I will fix it.
> 
> >> +
> >> +
> >> +static int __init dsu_pmu_init(void)
> >> +{
> >> +	int ret;
> >> +
> >> +	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
> >> +					DRVNAME,
> >> +					NULL,
> >> +					dsu_pmu_cpu_teardown);
> >> +	if (ret < 0)
> >> +		return ret;
> >> +	dsu_pmu_cpuhp_state = ret;  
> > I'm just curious - what prevents this initialization being done in probe
> > rather than init?
> >  
> 
> Because, you need to do that only one per system and rather than one per DSU.
> There could be multiple DSUs connected via other links on a bigger platform.
> 
That makes sense.  Thanks for the explanation.

Jonathan
> 
> Suzuki

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/6] of: Add helper for mapping device node to logical CPU number
  2017-07-24 16:42   ` Sudeep Holla
@ 2017-07-25  9:27     ` Suzuki K Poulose
  2017-07-25  9:49       ` Sudeep Holla
  0 siblings, 1 reply; 17+ messages in thread
From: Suzuki K Poulose @ 2017-07-25  9:27 UTC (permalink / raw)
  To: Sudeep Holla, linux-arm-kernel
  Cc: mark.rutland, peterz, will.deacon, linux-kernel, Rob Herring

On 24/07/17 17:42, Sudeep Holla wrote:
>
>
> On 24/07/17 11:29, Suzuki K Poulose wrote:
>> Add a helper to map a device node to a logical CPU number to avoid
>> duplication. Currently this is open coded in different places (e.g
>> gic-v3, coresight). The helper tries to map device node to a "possible"
>> logical CPU id, which may not be online yet. It is the responsibility
>> of the user to make sure that the CPU is online. The helper uses
>> of_get_cpu_node() which uses arch specific backends to match the phyiscal
>> ids.
>>
>> Cc: Rob Herring <robh+dt@kernel.org>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>> ---
>>  drivers/of/base.c         | 26 ++++++++++++++++++++++++++
>>  include/linux/of_device.h |  7 +++++++
>>  2 files changed, 33 insertions(+)
>>
>> diff --git a/drivers/of/base.c b/drivers/of/base.c
>> index 686628d..639af23 100644
>> --- a/drivers/of/base.c
>> +++ b/drivers/of/base.c
>> @@ -420,6 +420,32 @@ struct device_node *of_get_cpu_node(int cpu, unsigned int *thread)
>>  EXPORT_SYMBOL(of_get_cpu_node);
>>
>>  /**
>> + * of_device_node_get_cpu: Get the logical CPU number for a given device_node
>> + *
>> + * @cpu_node: Pointer to the device_node for CPU.
>> + *
>> + * Returns the logical CPU number of the given CPU device_node.
>> + * Returns >= nr_cpu_ids if CPU is not found.
>> + */
>> +int of_device_node_get_cpu(struct device_node *cpu_node)
>> +{
>> +	int cpu, thread;
>> +	bool found = false;
>> +	struct device_node *np;
>> +
>> +	for_each_possible_cpu(cpu) {
>> +		np = of_get_cpu_node(cpu, &thread);
>
> Ideally, we should be able to use of_cpu_device_node_get instead of
> of_get_cpu_node which parses the device tree. Not sure if that's the
> case here too.
>
> Sorry for following up on this so late. I had a patch in my tree
> for-a-while to address that. Just posted it[1] now seeing this patch now.
>
> You can move to of_cpu_device_node_get if all users of this function are
> called after CPU's are registered. Otherwise, you can keep it as is for
> now and it can be changed once [1] lands.

As you mentioned offline, since GIC uses this, we need to stick to of_get_cpu_node.
In fact, I started off with the of_cpu_device_node_get() and switched to the former.

It would be really nice to have your patch, which streamlines the interface and we
could switch to that.

Cheers
Suzuki

>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 2/6] of: Add helper for mapping device node to logical CPU number
  2017-07-25  9:27     ` Suzuki K Poulose
@ 2017-07-25  9:49       ` Sudeep Holla
  0 siblings, 0 replies; 17+ messages in thread
From: Sudeep Holla @ 2017-07-25  9:49 UTC (permalink / raw)
  To: Suzuki K Poulose, linux-arm-kernel
  Cc: Sudeep Holla, mark.rutland, peterz, will.deacon, linux-kernel,
	Rob Herring



On 25/07/17 10:27, Suzuki K Poulose wrote:
> On 24/07/17 17:42, Sudeep Holla wrote:
>>
>>
>> On 24/07/17 11:29, Suzuki K Poulose wrote:
>>> Add a helper to map a device node to a logical CPU number to avoid
>>> duplication. Currently this is open coded in different places (e.g
>>> gic-v3, coresight). The helper tries to map device node to a "possible"
>>> logical CPU id, which may not be online yet. It is the responsibility
>>> of the user to make sure that the CPU is online. The helper uses
>>> of_get_cpu_node() which uses arch specific backends to match the
>>> phyiscal
>>> ids.
>>>
>>> Cc: Rob Herring <robh+dt@kernel.org>
>>> Cc: Mark Rutland <mark.rutland@arm.com>
>>> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
>>> ---
>>>  drivers/of/base.c         | 26 ++++++++++++++++++++++++++
>>>  include/linux/of_device.h |  7 +++++++
>>>  2 files changed, 33 insertions(+)
>>>
>>> diff --git a/drivers/of/base.c b/drivers/of/base.c
>>> index 686628d..639af23 100644
>>> --- a/drivers/of/base.c
>>> +++ b/drivers/of/base.c
>>> @@ -420,6 +420,32 @@ struct device_node *of_get_cpu_node(int cpu,
>>> unsigned int *thread)
>>>  EXPORT_SYMBOL(of_get_cpu_node);
>>>
>>>  /**
>>> + * of_device_node_get_cpu: Get the logical CPU number for a given
>>> device_node
>>> + *
>>> + * @cpu_node: Pointer to the device_node for CPU.
>>> + *
>>> + * Returns the logical CPU number of the given CPU device_node.
>>> + * Returns >= nr_cpu_ids if CPU is not found.
>>> + */
>>> +int of_device_node_get_cpu(struct device_node *cpu_node)
>>> +{
>>> +    int cpu, thread;
>>> +    bool found = false;
>>> +    struct device_node *np;
>>> +
>>> +    for_each_possible_cpu(cpu) {
>>> +        np = of_get_cpu_node(cpu, &thread);
>>
>> Ideally, we should be able to use of_cpu_device_node_get instead of
>> of_get_cpu_node which parses the device tree. Not sure if that's the
>> case here too.
>>
>> Sorry for following up on this so late. I had a patch in my tree
>> for-a-while to address that. Just posted it[1] now seeing this patch now.
>>
>> You can move to of_cpu_device_node_get if all users of this function are
>> called after CPU's are registered. Otherwise, you can keep it as is for
>> now and it can be changed once [1] lands.
> 
> As you mentioned offline, since GIC uses this, we need to stick to 
> of_get_cpu_node.

Agreed.

> In fact, I started off with the of_cpu_device_node_get() and switched
> to the former.
> 

Ah OK.

> It would be really nice to have your patch, which streamlines the 
> interface and we could switch to that.
> 

Yes, I will wait for that patch to land in the mainline, I can follow up
and remove the use of of_get_cpu_node wherever required.
-- 
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v2 3/6] coresight: of: Use of_device_node_get_cpu helper
  2017-07-24 10:29 ` [PATCH v2 3/6] coresight: of: Use of_device_node_get_cpu helper Suzuki K Poulose
@ 2017-07-25 16:14   ` Mathieu Poirier
  0 siblings, 0 replies; 17+ messages in thread
From: Mathieu Poirier @ 2017-07-25 16:14 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, Will Deacon, Mark Rutland,
	Peter Zijlstra, Leo Yan

On 24 July 2017 at 04:29, Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
> Reuse the new generic helper, of_device_node_get_cpu() to map a
> given CPU phandle to a logical CPU number.
>
> Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
> Cc: Leo Yan <leo.yan@linaro.org>
> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> ---
>  drivers/hwtracing/coresight/of_coresight.c | 20 ++++++--------------
>  1 file changed, 6 insertions(+), 14 deletions(-)
>
> diff --git a/drivers/hwtracing/coresight/of_coresight.c b/drivers/hwtracing/coresight/of_coresight.c
> index a187941..42ce9f8 100644
> --- a/drivers/hwtracing/coresight/of_coresight.c
> +++ b/drivers/hwtracing/coresight/of_coresight.c
> @@ -16,6 +16,7 @@
>  #include <linux/clk.h>
>  #include <linux/of.h>
>  #include <linux/of_address.h>
> +#include <linux/of_device.h>
>  #include <linux/of_graph.h>
>  #include <linux/of_platform.h>
>  #include <linux/platform_device.h>
> @@ -104,26 +105,17 @@ static int of_coresight_alloc_memory(struct device *dev,
>  int of_coresight_get_cpu(const struct device_node *node)
>  {
>         int cpu;
> -       bool found;
> -       struct device_node *dn, *np;
> +       struct device_node *dn;
>
>         dn = of_parse_phandle(node, "cpu", 0);
> -
>         /* Affinity defaults to CPU0 */
>         if (!dn)
>                 return 0;
> -
> -       for_each_possible_cpu(cpu) {
> -               np = of_cpu_device_node_get(cpu);
> -               found = (dn == np);
> -               of_node_put(np);
> -               if (found)
> -                       break;
> -       }
> -       of_node_put(dn);
> -
> +       cpu = of_device_node_get_cpu(dn);
>         /* Affinity to CPU0 if no cpu nodes are found */
> -       return found ? cpu : 0;
> +       if (cpu >= nr_cpu_ids)
> +               return 0;
> +       return cpu;
>  }
>  EXPORT_SYMBOL_GPL(of_coresight_get_cpu);
>

Acked-by: Mathieu Poirier <mathieu.poirier@linaro,org>

> --
> 2.7.5
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2017-07-25 16:14 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2017-07-24 10:29 [PATCH v2 0/6] perf: Support for ARM DynamIQ Shared Unit PMU Suzuki K Poulose
2017-07-24 10:29 ` [PATCH v2 1/6] perf: Export perf_event_update_userpage Suzuki K Poulose
2017-07-24 10:29 ` [PATCH v2 2/6] of: Add helper for mapping device node to logical CPU number Suzuki K Poulose
2017-07-24 13:14   ` Marc Zyngier
2017-07-24 15:15     ` Suzuki K Poulose
2017-07-24 16:42   ` Sudeep Holla
2017-07-25  9:27     ` Suzuki K Poulose
2017-07-25  9:49       ` Sudeep Holla
2017-07-24 10:29 ` [PATCH v2 3/6] coresight: of: Use of_device_node_get_cpu helper Suzuki K Poulose
2017-07-25 16:14   ` Mathieu Poirier
2017-07-24 10:29 ` [PATCH v2 4/6] irqchip: gic-v3: " Suzuki K Poulose
2017-07-24 13:15   ` Marc Zyngier
2017-07-24 10:29 ` [PATCH v2 5/6] dt-bindings: Document devicetree binding for ARM DSU PMU Suzuki K Poulose
2017-07-24 10:29 ` [PATCH v2 6/6] perf: ARM DynamIQ Shared Unit PMU support Suzuki K Poulose
2017-07-24 14:50   ` Jonathan Cameron
2017-07-24 15:44     ` Suzuki K Poulose
2017-07-25  8:33       ` Jonathan Cameron

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox