linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/3] perf/arm-ni: IRQ improvements
@ 2025-05-13 15:38 Robin Murphy
  2025-05-13 15:38 ` [PATCH 1/3] perf/arm-ni: Set initial IRQ affinity Robin Murphy
                   ` (4 more replies)
  0 siblings, 5 replies; 9+ messages in thread
From: Robin Murphy @ 2025-05-13 15:38 UTC (permalink / raw)
  To: will
  Cc: mark.rutland, linux-arm-kernel, linux-perf-users, allen.wang,
	peter.du, andy.xu

Hi all,

While the previous attempt[1] was missing some important aspects, it's
a valid problem to solve, so I figured I'd take a look myself. This
ended up turning into a slightly wider cleanup, but I think the end
result is nice enough.

As usual I don't have anything to test this on, so if there are any
silly mistakes left that the compiler couldn't catch please do let me
know.

Thanks,
Robin.

[1] https://lore.kernel.org/all/20250410114214.1599777-3-allen.wang@hj-micro.com/


Robin Murphy (2):
  perf/arm-ni: Set initial IRQ affinity
  perf/arm-ni: Consolidate CPU affinity handling

Shouping Wang (1):
  perf/arm-ni: Support sharing IRQs within an NI instance

 drivers/perf/arm-ni.c | 148 ++++++++++++++++++++++++------------------
 1 file changed, 84 insertions(+), 64 deletions(-)

-- 
2.39.2.101.g768bb238c484.dirty


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/3] perf/arm-ni: Set initial IRQ affinity
  2025-05-13 15:38 [PATCH 0/3] perf/arm-ni: IRQ improvements Robin Murphy
@ 2025-05-13 15:38 ` Robin Murphy
  2025-05-13 15:38 ` [PATCH 2/3] perf/arm-ni: Consolidate CPU affinity handling Robin Murphy
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Robin Murphy @ 2025-05-13 15:38 UTC (permalink / raw)
  To: will
  Cc: mark.rutland, linux-arm-kernel, linux-perf-users, allen.wang,
	peter.du, andy.xu, stable

While we do request our IRQs with the right flags to stop their affinity
changing unexpectedly, we forgot to actually set it to start with. Oops.

Cc: stable@vger.kernel.org
Fixes: 4d5a7680f2b4 ("perf: Add driver for Arm NI-700 interconnect PMU")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/perf/arm-ni.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/perf/arm-ni.c b/drivers/perf/arm-ni.c
index de7b6cce4d68..9396d243415f 100644
--- a/drivers/perf/arm-ni.c
+++ b/drivers/perf/arm-ni.c
@@ -544,6 +544,8 @@ static int arm_ni_init_cd(struct arm_ni *ni, struct arm_ni_node *node, u64 res_s
 		return err;
 
 	cd->cpu = cpumask_local_spread(0, dev_to_node(ni->dev));
+	irq_set_affinity(cd->irq, cpumask_of(cd->cpu));
+
 	cd->pmu = (struct pmu) {
 		.module = THIS_MODULE,
 		.parent = ni->dev,
-- 
2.39.2.101.g768bb238c484.dirty


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/3] perf/arm-ni: Consolidate CPU affinity handling
  2025-05-13 15:38 [PATCH 0/3] perf/arm-ni: IRQ improvements Robin Murphy
  2025-05-13 15:38 ` [PATCH 1/3] perf/arm-ni: Set initial IRQ affinity Robin Murphy
@ 2025-05-13 15:38 ` Robin Murphy
  2025-05-13 15:39 ` [PATCH 3/3] perf/arm-ni: Support sharing IRQs within an NI instance Robin Murphy
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 9+ messages in thread
From: Robin Murphy @ 2025-05-13 15:38 UTC (permalink / raw)
  To: will
  Cc: mark.rutland, linux-arm-kernel, linux-perf-users, allen.wang,
	peter.du, andy.xu

Since overflow interrupts from the individual PMUs are infrequent and
unlikely to coincide, and we make no attempt to balance them across
CPUs anyway, there's really not much point tracking a separate CPU
affinity per PMU. Move the CPU affinity and hotplug migration up to
the NI instance level.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/perf/arm-ni.c | 74 ++++++++++++++++++++-----------------------
 1 file changed, 34 insertions(+), 40 deletions(-)

diff --git a/drivers/perf/arm-ni.c b/drivers/perf/arm-ni.c
index 9396d243415f..168750e78fc4 100644
--- a/drivers/perf/arm-ni.c
+++ b/drivers/perf/arm-ni.c
@@ -104,8 +104,6 @@ struct arm_ni_cd {
 	u16 id;
 	int num_units;
 	int irq;
-	int cpu;
-	struct hlist_node cpuhp_node;
 	struct pmu pmu;
 	struct arm_ni_unit *units;
 	struct perf_event *evcnt[NI_NUM_COUNTERS];
@@ -117,13 +115,18 @@ struct arm_ni {
 	void __iomem *base;
 	enum ni_part part;
 	int id;
+	int cpu;
 	int num_cds;
+	struct hlist_node cpuhp_node;
 	struct arm_ni_cd cds[] __counted_by(num_cds);
 };
 
 #define cd_to_ni(cd) container_of((cd), struct arm_ni, cds[(cd)->id])
 #define pmu_to_cd(p) container_of((p), struct arm_ni_cd, pmu)
 
+#define ni_for_each_cd(n, c) \
+	for (struct arm_ni_cd *c = n->cds; c < n->cds + n->num_cds; c++) if (c->pmu_base)
+
 #define cd_for_each_unit(cd, u) \
 	for (struct arm_ni_unit *u = cd->units; u < cd->units + cd->num_units; u++)
 
@@ -218,9 +221,9 @@ static const struct attribute_group arm_ni_format_attrs_group = {
 static ssize_t arm_ni_cpumask_show(struct device *dev,
 				   struct device_attribute *attr, char *buf)
 {
-	struct arm_ni_cd *cd = pmu_to_cd(dev_get_drvdata(dev));
+	struct arm_ni *ni = cd_to_ni(pmu_to_cd(dev_get_drvdata(dev)));
 
-	return cpumap_print_to_pagebuf(true, buf, cpumask_of(cd->cpu));
+	return cpumap_print_to_pagebuf(true, buf, cpumask_of(ni->cpu));
 }
 
 static struct device_attribute arm_ni_cpumask_attr =
@@ -314,7 +317,7 @@ static int arm_ni_event_init(struct perf_event *event)
 	if (is_sampling_event(event))
 		return -EINVAL;
 
-	event->cpu = cd->cpu;
+	event->cpu = cd_to_ni(cd)->cpu;
 	if (NI_EVENT_TYPE(event) == NI_PMU)
 		return arm_ni_validate_group(event);
 
@@ -543,8 +546,7 @@ static int arm_ni_init_cd(struct arm_ni *ni, struct arm_ni_node *node, u64 res_s
 	if (err)
 		return err;
 
-	cd->cpu = cpumask_local_spread(0, dev_to_node(ni->dev));
-	irq_set_affinity(cd->irq, cpumask_of(cd->cpu));
+	irq_set_affinity(cd->irq, cpumask_of(ni->cpu));
 
 	cd->pmu = (struct pmu) {
 		.module = THIS_MODULE,
@@ -566,32 +568,19 @@ static int arm_ni_init_cd(struct arm_ni *ni, struct arm_ni_node *node, u64 res_s
 	if (!name)
 		return -ENOMEM;
 
-	err = cpuhp_state_add_instance_nocalls(arm_ni_hp_state, &cd->cpuhp_node);
-	if (err)
-		return err;
-
-	err = perf_pmu_register(&cd->pmu, name, -1);
-	if (err)
-		cpuhp_state_remove_instance_nocalls(arm_ni_hp_state, &cd->cpuhp_node);
-
-	return err;
+	return perf_pmu_register(&cd->pmu, name, -1);
 }
 
 static void arm_ni_remove(struct platform_device *pdev)
 {
 	struct arm_ni *ni = platform_get_drvdata(pdev);
 
-	for (int i = 0; i < ni->num_cds; i++) {
-		struct arm_ni_cd *cd = ni->cds + i;
-
-		if (!cd->pmu_base)
-			continue;
-
+	ni_for_each_cd(ni, cd) {
 		writel_relaxed(0, cd->pmu_base + NI_PMCR);
 		writel_relaxed(U32_MAX, cd->pmu_base + NI_PMINTENCLR);
 		perf_pmu_unregister(&cd->pmu);
-		cpuhp_state_remove_instance_nocalls(arm_ni_hp_state, &cd->cpuhp_node);
 	}
+	cpuhp_state_remove_instance_nocalls(arm_ni_hp_state, &ni->cpuhp_node);
 }
 
 static void arm_ni_probe_domain(void __iomem *base, struct arm_ni_node *node)
@@ -611,7 +600,7 @@ static int arm_ni_probe(struct platform_device *pdev)
 	struct resource *res;
 	void __iomem *base;
 	static atomic_t id;
-	int num_cds;
+	int ret, num_cds;
 	u32 reg, part;
 
 	/*
@@ -662,8 +651,13 @@ static int arm_ni_probe(struct platform_device *pdev)
 	ni->num_cds = num_cds;
 	ni->part = part;
 	ni->id = atomic_fetch_inc(&id);
+	ni->cpu = cpumask_local_spread(0, dev_to_node(ni->dev));
 	platform_set_drvdata(pdev, ni);
 
+	ret = cpuhp_state_add_instance_nocalls(arm_ni_hp_state, &ni->cpuhp_node);
+	if (ret)
+		return ret;
+
 	for (int v = 0; v < cfg.num_components; v++) {
 		reg = readl_relaxed(cfg.base + NI_CHILD_PTR(v));
 		arm_ni_probe_domain(base + reg, &vd);
@@ -671,8 +665,6 @@ static int arm_ni_probe(struct platform_device *pdev)
 			reg = readl_relaxed(vd.base + NI_CHILD_PTR(p));
 			arm_ni_probe_domain(base + reg, &pd);
 			for (int c = 0; c < pd.num_components; c++) {
-				int ret;
-
 				reg = readl_relaxed(pd.base + NI_CHILD_PTR(c));
 				arm_ni_probe_domain(base + reg, &cd);
 				ret = arm_ni_init_cd(ni, &cd, res->start);
@@ -714,42 +706,44 @@ static struct platform_driver arm_ni_driver = {
 	.remove = arm_ni_remove,
 };
 
-static void arm_ni_pmu_migrate(struct arm_ni_cd *cd, unsigned int cpu)
+static void arm_ni_pmu_migrate(struct arm_ni *ni, unsigned int cpu)
 {
-	perf_pmu_migrate_context(&cd->pmu, cd->cpu, cpu);
-	irq_set_affinity(cd->irq, cpumask_of(cpu));
-	cd->cpu = cpu;
+	ni_for_each_cd(ni, cd) {
+		perf_pmu_migrate_context(&cd->pmu, ni->cpu, cpu);
+		irq_set_affinity(cd->irq, cpumask_of(cpu));
+	}
+	ni->cpu = cpu;
 }
 
 static int arm_ni_pmu_online_cpu(unsigned int cpu, struct hlist_node *cpuhp_node)
 {
-	struct arm_ni_cd *cd;
+	struct arm_ni *ni;
 	int node;
 
-	cd = hlist_entry_safe(cpuhp_node, struct arm_ni_cd, cpuhp_node);
-	node = dev_to_node(cd_to_ni(cd)->dev);
-	if (cpu_to_node(cd->cpu) != node && cpu_to_node(cpu) == node)
-		arm_ni_pmu_migrate(cd, cpu);
+	ni = hlist_entry_safe(cpuhp_node, struct arm_ni, cpuhp_node);
+	node = dev_to_node(ni->dev);
+	if (cpu_to_node(ni->cpu) != node && cpu_to_node(cpu) == node)
+		arm_ni_pmu_migrate(ni, cpu);
 	return 0;
 }
 
 static int arm_ni_pmu_offline_cpu(unsigned int cpu, struct hlist_node *cpuhp_node)
 {
-	struct arm_ni_cd *cd;
+	struct arm_ni *ni;
 	unsigned int target;
 	int node;
 
-	cd = hlist_entry_safe(cpuhp_node, struct arm_ni_cd, cpuhp_node);
-	if (cpu != cd->cpu)
+	ni = hlist_entry_safe(cpuhp_node, struct arm_ni, cpuhp_node);
+	if (cpu != ni->cpu)
 		return 0;
 
-	node = dev_to_node(cd_to_ni(cd)->dev);
+	node = dev_to_node(ni->dev);
 	target = cpumask_any_and_but(cpumask_of_node(node), cpu_online_mask, cpu);
 	if (target >= nr_cpu_ids)
 		target = cpumask_any_but(cpu_online_mask, cpu);
 
 	if (target < nr_cpu_ids)
-		arm_ni_pmu_migrate(cd, target);
+		arm_ni_pmu_migrate(ni, target);
 	return 0;
 }
 
-- 
2.39.2.101.g768bb238c484.dirty


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 3/3] perf/arm-ni: Support sharing IRQs within an NI instance
  2025-05-13 15:38 [PATCH 0/3] perf/arm-ni: IRQ improvements Robin Murphy
  2025-05-13 15:38 ` [PATCH 1/3] perf/arm-ni: Set initial IRQ affinity Robin Murphy
  2025-05-13 15:38 ` [PATCH 2/3] perf/arm-ni: Consolidate CPU affinity handling Robin Murphy
@ 2025-05-13 15:39 ` Robin Murphy
  2025-05-19 12:29   ` Will Deacon
  2025-05-16 11:46 ` [PATCH 0/3] perf/arm-ni: IRQ improvements Shouping Wang
  2025-07-04 17:44 ` Will Deacon
  4 siblings, 1 reply; 9+ messages in thread
From: Robin Murphy @ 2025-05-13 15:39 UTC (permalink / raw)
  To: will
  Cc: mark.rutland, linux-arm-kernel, linux-perf-users, allen.wang,
	peter.du, andy.xu

From: Shouping Wang <allen.wang@hj-micro.com>

NI-700 has a distinct PMU interrupt output for each Clock Domain,
however some integrations may still combine these together externally.
The initial driver didn't attempt to support this, in anticipation of a
more general solution for IRQ sharing between system PMU instances, but
that's still a way off, so let's make this intermediate step for now to
at least allow sharing IRQs within an individual NI instance.

Now that CPU affinity and migration are cleaned up, it's fairly
straightforward to adopt similar logic to arm-cmn, to identify CDs with
a common interrupt and loop over them directly in the handler.

Signed-off-by: Shouping Wang <allen.wang@hj-micro.com>
[ rm: Rework for affinity handling, cosmetics, new commit message ]
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
---
 drivers/perf/arm-ni.c | 78 ++++++++++++++++++++++++++++---------------
 1 file changed, 51 insertions(+), 27 deletions(-)

diff --git a/drivers/perf/arm-ni.c b/drivers/perf/arm-ni.c
index 168750e78fc4..ca033c8785ff 100644
--- a/drivers/perf/arm-ni.c
+++ b/drivers/perf/arm-ni.c
@@ -102,6 +102,7 @@ struct arm_ni_unit {
 struct arm_ni_cd {
 	void __iomem *pmu_base;
 	u16 id;
+	s8 irq_friend;
 	int num_units;
 	int irq;
 	struct pmu pmu;
@@ -448,33 +449,37 @@ static irqreturn_t arm_ni_handle_irq(int irq, void *dev_id)
 {
 	struct arm_ni_cd *cd = dev_id;
 	irqreturn_t ret = IRQ_NONE;
-	u32 reg = readl_relaxed(cd->pmu_base + NI_PMOVSCLR);
 
-	if (reg & (1U << NI_CCNT_IDX)) {
-		ret = IRQ_HANDLED;
-		if (!(WARN_ON(!cd->ccnt))) {
-			arm_ni_event_read(cd->ccnt);
-			arm_ni_init_ccnt(cd);
+	for (;;) {
+		u32 reg = readl_relaxed(cd->pmu_base + NI_PMOVSCLR);
+
+		if (reg & (1U << NI_CCNT_IDX)) {
+			ret = IRQ_HANDLED;
+			if (!(WARN_ON(!cd->ccnt))) {
+				arm_ni_event_read(cd->ccnt);
+				arm_ni_init_ccnt(cd);
+			}
 		}
-	}
-	for (int i = 0; i < NI_NUM_COUNTERS; i++) {
-		if (!(reg & (1U << i)))
-			continue;
-		ret = IRQ_HANDLED;
-		if (!(WARN_ON(!cd->evcnt[i]))) {
-			arm_ni_event_read(cd->evcnt[i]);
-			arm_ni_init_evcnt(cd, i);
+		for (int i = 0; i < NI_NUM_COUNTERS; i++) {
+			if (!(reg & (1U << i)))
+				continue;
+			ret = IRQ_HANDLED;
+			if (!(WARN_ON(!cd->evcnt[i]))) {
+				arm_ni_event_read(cd->evcnt[i]);
+				arm_ni_init_evcnt(cd, i);
+			}
 		}
+		writel_relaxed(reg, cd->pmu_base + NI_PMOVSCLR);
+		if (!cd->irq_friend)
+			return ret;
+		cd += cd->irq_friend;
 	}
-	writel_relaxed(reg, cd->pmu_base + NI_PMOVSCLR);
-	return ret;
 }
 
 static int arm_ni_init_cd(struct arm_ni *ni, struct arm_ni_node *node, u64 res_start)
 {
 	struct arm_ni_cd *cd = ni->cds + node->id;
 	const char *name;
-	int err;
 
 	cd->id = node->id;
 	cd->num_units = node->num_components;
@@ -534,20 +539,11 @@ static int arm_ni_init_cd(struct arm_ni *ni, struct arm_ni_node *node, u64 res_s
 		       cd->pmu_base + NI_PMCR);
 	writel_relaxed(U32_MAX, cd->pmu_base + NI_PMCNTENCLR);
 	writel_relaxed(U32_MAX, cd->pmu_base + NI_PMOVSCLR);
-	writel_relaxed(U32_MAX, cd->pmu_base + NI_PMINTENSET);
 
 	cd->irq = platform_get_irq(to_platform_device(ni->dev), cd->id);
 	if (cd->irq < 0)
 		return cd->irq;
 
-	err = devm_request_irq(ni->dev, cd->irq, arm_ni_handle_irq,
-			       IRQF_NOBALANCING | IRQF_NO_THREAD,
-			       dev_name(ni->dev), cd);
-	if (err)
-		return err;
-
-	irq_set_affinity(cd->irq, cpumask_of(ni->cpu));
-
 	cd->pmu = (struct pmu) {
 		.module = THIS_MODULE,
 		.parent = ni->dev,
@@ -593,6 +589,30 @@ static void arm_ni_probe_domain(void __iomem *base, struct arm_ni_node *node)
 	node->num_components = readl_relaxed(base + NI_CHILD_NODE_INFO);
 }
 
+static int arm_ni_init_irqs(struct arm_ni *ni)
+{
+	int err;
+
+	ni_for_each_cd(ni, cd) {
+		for (struct arm_ni_cd *prev = cd; prev-- > ni->cds; ) {
+			if (prev->irq == cd->irq) {
+				prev->irq_friend = cd - prev;
+				goto enable_irq;
+			}
+		}
+		err = devm_request_irq(ni->dev, cd->irq, arm_ni_handle_irq,
+				       IRQF_NOBALANCING | IRQF_NO_THREAD,
+				       dev_name(ni->dev), cd);
+		if (err)
+			return err;
+
+		irq_set_affinity(cd->irq, cpumask_of(ni->cpu));
+enable_irq:
+		writel_relaxed(U32_MAX, cd->pmu_base + NI_PMINTENSET);
+	}
+	return 0;
+}
+
 static int arm_ni_probe(struct platform_device *pdev)
 {
 	struct arm_ni_node cfg, vd, pd, cd;
@@ -677,7 +697,11 @@ static int arm_ni_probe(struct platform_device *pdev)
 		}
 	}
 
-	return 0;
+	ret = arm_ni_init_irqs(ni);
+	if (ret)
+		arm_ni_remove(pdev);
+
+	return ret;
 }
 
 #ifdef CONFIG_OF
-- 
2.39.2.101.g768bb238c484.dirty


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] perf/arm-ni: IRQ improvements
  2025-05-13 15:38 [PATCH 0/3] perf/arm-ni: IRQ improvements Robin Murphy
                   ` (2 preceding siblings ...)
  2025-05-13 15:39 ` [PATCH 3/3] perf/arm-ni: Support sharing IRQs within an NI instance Robin Murphy
@ 2025-05-16 11:46 ` Shouping Wang
  2025-07-04 17:44 ` Will Deacon
  4 siblings, 0 replies; 9+ messages in thread
From: Shouping Wang @ 2025-05-16 11:46 UTC (permalink / raw)
  To: Robin Murphy, will
  Cc: mark.rutland, linux-arm-kernel, linux-perf-users, peter.du,
	andy.xu, allen.wang

Tested this patch set in the scenario of PMU shared interruption within
an individual NI instance, and no issues were found.

Tested-by: Shouping Wang <allen.wang@hj-micro.com>

On 5/13/2025 11:38 PM, Robin Murphy wrote:
> Hi all,
> 
> While the previous attempt[1] was missing some important aspects, it's
> a valid problem to solve, so I figured I'd take a look myself. This
> ended up turning into a slightly wider cleanup, but I think the end
> result is nice enough.
> 
> As usual I don't have anything to test this on, so if there are any
> silly mistakes left that the compiler couldn't catch please do let me
> know.
> 
> Thanks,
> Robin.
> 
> [1] https://lore.kernel.org/all/20250410114214.1599777-3-allen.wang@hj-micro.com/
> 
> 
> Robin Murphy (2):
>   perf/arm-ni: Set initial IRQ affinity
>   perf/arm-ni: Consolidate CPU affinity handling
> 
> Shouping Wang (1):
>   perf/arm-ni: Support sharing IRQs within an NI instance
> 
>  drivers/perf/arm-ni.c | 148 ++++++++++++++++++++++++------------------
>  1 file changed, 84 insertions(+), 64 deletions(-)
> 


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 3/3] perf/arm-ni: Support sharing IRQs within an NI instance
  2025-05-13 15:39 ` [PATCH 3/3] perf/arm-ni: Support sharing IRQs within an NI instance Robin Murphy
@ 2025-05-19 12:29   ` Will Deacon
  2025-05-19 14:57     ` Robin Murphy
  0 siblings, 1 reply; 9+ messages in thread
From: Will Deacon @ 2025-05-19 12:29 UTC (permalink / raw)
  To: Robin Murphy
  Cc: mark.rutland, linux-arm-kernel, linux-perf-users, allen.wang,
	peter.du, andy.xu

On Tue, May 13, 2025 at 04:39:00PM +0100, Robin Murphy wrote:
> From: Shouping Wang <allen.wang@hj-micro.com>
> 
> NI-700 has a distinct PMU interrupt output for each Clock Domain,
> however some integrations may still combine these together externally.
> The initial driver didn't attempt to support this, in anticipation of a
> more general solution for IRQ sharing between system PMU instances, but
> that's still a way off, so let's make this intermediate step for now to
> at least allow sharing IRQs within an individual NI instance.
> 
> Now that CPU affinity and migration are cleaned up, it's fairly
> straightforward to adopt similar logic to arm-cmn, to identify CDs with
> a common interrupt and loop over them directly in the handler.
> 
> Signed-off-by: Shouping Wang <allen.wang@hj-micro.com>
> [ rm: Rework for affinity handling, cosmetics, new commit message ]
> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> ---
>  drivers/perf/arm-ni.c | 78 ++++++++++++++++++++++++++++---------------
>  1 file changed, 51 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/perf/arm-ni.c b/drivers/perf/arm-ni.c
> index 168750e78fc4..ca033c8785ff 100644
> --- a/drivers/perf/arm-ni.c
> +++ b/drivers/perf/arm-ni.c
> @@ -102,6 +102,7 @@ struct arm_ni_unit {
>  struct arm_ni_cd {
>  	void __iomem *pmu_base;
>  	u16 id;
> +	s8 irq_friend;

Why not just store the pointer to the cd as opposed to a relative offset?

>  	int num_units;
>  	int irq;
>  	struct pmu pmu;
> @@ -448,33 +449,37 @@ static irqreturn_t arm_ni_handle_irq(int irq, void *dev_id)
>  {
>  	struct arm_ni_cd *cd = dev_id;
>  	irqreturn_t ret = IRQ_NONE;
> -	u32 reg = readl_relaxed(cd->pmu_base + NI_PMOVSCLR);
>  
> -	if (reg & (1U << NI_CCNT_IDX)) {
> -		ret = IRQ_HANDLED;
> -		if (!(WARN_ON(!cd->ccnt))) {
> -			arm_ni_event_read(cd->ccnt);
> -			arm_ni_init_ccnt(cd);
> +	for (;;) {

Could this be a do { ... } while (cd) loop instead?

> +		u32 reg = readl_relaxed(cd->pmu_base + NI_PMOVSCLR);
> +
> +		if (reg & (1U << NI_CCNT_IDX)) {
> +			ret = IRQ_HANDLED;
> +			if (!(WARN_ON(!cd->ccnt))) {
> +				arm_ni_event_read(cd->ccnt);
> +				arm_ni_init_ccnt(cd);
> +			}
>  		}
> -	}
> -	for (int i = 0; i < NI_NUM_COUNTERS; i++) {
> -		if (!(reg & (1U << i)))
> -			continue;
> -		ret = IRQ_HANDLED;
> -		if (!(WARN_ON(!cd->evcnt[i]))) {
> -			arm_ni_event_read(cd->evcnt[i]);
> -			arm_ni_init_evcnt(cd, i);
> +		for (int i = 0; i < NI_NUM_COUNTERS; i++) {
> +			if (!(reg & (1U << i)))
> +				continue;
> +			ret = IRQ_HANDLED;
> +			if (!(WARN_ON(!cd->evcnt[i]))) {
> +				arm_ni_event_read(cd->evcnt[i]);
> +				arm_ni_init_evcnt(cd, i);
> +			}
>  		}
> +		writel_relaxed(reg, cd->pmu_base + NI_PMOVSCLR);
> +		if (!cd->irq_friend)
> +			return ret;
> +		cd += cd->irq_friend;
>  	}
> -	writel_relaxed(reg, cd->pmu_base + NI_PMOVSCLR);
> -	return ret;
>  }
>  
>  static int arm_ni_init_cd(struct arm_ni *ni, struct arm_ni_node *node, u64 res_start)
>  {
>  	struct arm_ni_cd *cd = ni->cds + node->id;
>  	const char *name;
> -	int err;
>  
>  	cd->id = node->id;
>  	cd->num_units = node->num_components;
> @@ -534,20 +539,11 @@ static int arm_ni_init_cd(struct arm_ni *ni, struct arm_ni_node *node, u64 res_s
>  		       cd->pmu_base + NI_PMCR);
>  	writel_relaxed(U32_MAX, cd->pmu_base + NI_PMCNTENCLR);
>  	writel_relaxed(U32_MAX, cd->pmu_base + NI_PMOVSCLR);
> -	writel_relaxed(U32_MAX, cd->pmu_base + NI_PMINTENSET);
>  
>  	cd->irq = platform_get_irq(to_platform_device(ni->dev), cd->id);
>  	if (cd->irq < 0)
>  		return cd->irq;
>  
> -	err = devm_request_irq(ni->dev, cd->irq, arm_ni_handle_irq,
> -			       IRQF_NOBALANCING | IRQF_NO_THREAD,
> -			       dev_name(ni->dev), cd);
> -	if (err)
> -		return err;
> -
> -	irq_set_affinity(cd->irq, cpumask_of(ni->cpu));
> -
>  	cd->pmu = (struct pmu) {
>  		.module = THIS_MODULE,
>  		.parent = ni->dev,
> @@ -593,6 +589,30 @@ static void arm_ni_probe_domain(void __iomem *base, struct arm_ni_node *node)
>  	node->num_components = readl_relaxed(base + NI_CHILD_NODE_INFO);
>  }
>  
> +static int arm_ni_init_irqs(struct arm_ni *ni)
> +{
> +	int err;
> +
> +	ni_for_each_cd(ni, cd) {
> +		for (struct arm_ni_cd *prev = cd; prev-- > ni->cds; ) {
> +			if (prev->irq == cd->irq) {
> +				prev->irq_friend = cd - prev;

Can't this race with the read of `irq_friend` in the interrupt handler?

Similarly, how do you handle the race between the interrupt firing on
one CD and the friend CD being torn down on e.g. the remove path?

> +				goto enable_irq;
> +			}
> +		}
> +		err = devm_request_irq(ni->dev, cd->irq, arm_ni_handle_irq,
> +				       IRQF_NOBALANCING | IRQF_NO_THREAD,
> +				       dev_name(ni->dev), cd);

What's the reason not to use IRQF_SHARED and register multiple handlers?
I'm sure there is one, but capturing that in a comment would help to
justify the manual multiplexing.

Will

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 3/3] perf/arm-ni: Support sharing IRQs within an NI instance
  2025-05-19 12:29   ` Will Deacon
@ 2025-05-19 14:57     ` Robin Murphy
  2025-07-04 17:47       ` Will Deacon
  0 siblings, 1 reply; 9+ messages in thread
From: Robin Murphy @ 2025-05-19 14:57 UTC (permalink / raw)
  To: Will Deacon
  Cc: mark.rutland, linux-arm-kernel, linux-perf-users, allen.wang,
	peter.du, andy.xu

On 19/05/2025 1:29 pm, Will Deacon wrote:
> On Tue, May 13, 2025 at 04:39:00PM +0100, Robin Murphy wrote:
>> From: Shouping Wang <allen.wang@hj-micro.com>
>>
>> NI-700 has a distinct PMU interrupt output for each Clock Domain,
>> however some integrations may still combine these together externally.
>> The initial driver didn't attempt to support this, in anticipation of a
>> more general solution for IRQ sharing between system PMU instances, but
>> that's still a way off, so let's make this intermediate step for now to
>> at least allow sharing IRQs within an individual NI instance.
>>
>> Now that CPU affinity and migration are cleaned up, it's fairly
>> straightforward to adopt similar logic to arm-cmn, to identify CDs with
>> a common interrupt and loop over them directly in the handler.
>>
>> Signed-off-by: Shouping Wang <allen.wang@hj-micro.com>
>> [ rm: Rework for affinity handling, cosmetics, new commit message ]
>> Signed-off-by: Robin Murphy <robin.murphy@arm.com>
>> ---
>>   drivers/perf/arm-ni.c | 78 ++++++++++++++++++++++++++++---------------
>>   1 file changed, 51 insertions(+), 27 deletions(-)
>>
>> diff --git a/drivers/perf/arm-ni.c b/drivers/perf/arm-ni.c
>> index 168750e78fc4..ca033c8785ff 100644
>> --- a/drivers/perf/arm-ni.c
>> +++ b/drivers/perf/arm-ni.c
>> @@ -102,6 +102,7 @@ struct arm_ni_unit {
>>   struct arm_ni_cd {
>>   	void __iomem *pmu_base;
>>   	u16 id;
>> +	s8 irq_friend;
> 
> Why not just store the pointer to the cd as opposed to a relative offset?

Same reason as in arm-cmn - we know we're referencing another entry 
within the same array, so a full pointer costs 8 bytes of memory to be 
mostly redundant, while the offset fits in existing padding so costs 
nothing. In this case we could actually encode an absolute index just as 
easily as a relative offset since we have at most 32 CDs, but since you 
and Mark tend to prefer consistency across drivers where they do similar 
things, following the exact same pattern as arm-cmn feels like the 
clearest option (and in that existing case it is more significant, since 
arm-cmn's offset range is only +/-3, but somewhere in the middle of an 
array of many hundreds.)

>>   	int num_units;
>>   	int irq;
>>   	struct pmu pmu;
>> @@ -448,33 +449,37 @@ static irqreturn_t arm_ni_handle_irq(int irq, void *dev_id)
>>   {
>>   	struct arm_ni_cd *cd = dev_id;
>>   	irqreturn_t ret = IRQ_NONE;
>> -	u32 reg = readl_relaxed(cd->pmu_base + NI_PMOVSCLR);
>>   
>> -	if (reg & (1U << NI_CCNT_IDX)) {
>> -		ret = IRQ_HANDLED;
>> -		if (!(WARN_ON(!cd->ccnt))) {
>> -			arm_ni_event_read(cd->ccnt);
>> -			arm_ni_init_ccnt(cd);
>> +	for (;;) {
> 
> Could this be a do { ... } while (cd) loop instead?

I did try, but it still ends up looking just as clunky if not more so - 
I assume I must have gone through the same exercise 5 years ago for 
arm-cmn to have ended up this way :)

>> +		u32 reg = readl_relaxed(cd->pmu_base + NI_PMOVSCLR);
>> +
>> +		if (reg & (1U << NI_CCNT_IDX)) {
>> +			ret = IRQ_HANDLED;
>> +			if (!(WARN_ON(!cd->ccnt))) {
>> +				arm_ni_event_read(cd->ccnt);
>> +				arm_ni_init_ccnt(cd);
>> +			}
>>   		}
>> -	}
>> -	for (int i = 0; i < NI_NUM_COUNTERS; i++) {
>> -		if (!(reg & (1U << i)))
>> -			continue;
>> -		ret = IRQ_HANDLED;
>> -		if (!(WARN_ON(!cd->evcnt[i]))) {
>> -			arm_ni_event_read(cd->evcnt[i]);
>> -			arm_ni_init_evcnt(cd, i);
>> +		for (int i = 0; i < NI_NUM_COUNTERS; i++) {
>> +			if (!(reg & (1U << i)))
>> +				continue;
>> +			ret = IRQ_HANDLED;
>> +			if (!(WARN_ON(!cd->evcnt[i]))) {
>> +				arm_ni_event_read(cd->evcnt[i]);
>> +				arm_ni_init_evcnt(cd, i);
>> +			}
>>   		}
>> +		writel_relaxed(reg, cd->pmu_base + NI_PMOVSCLR);
>> +		if (!cd->irq_friend)
>> +			return ret;
>> +		cd += cd->irq_friend;
>>   	}
>> -	writel_relaxed(reg, cd->pmu_base + NI_PMOVSCLR);
>> -	return ret;
>>   }
>>   
>>   static int arm_ni_init_cd(struct arm_ni *ni, struct arm_ni_node *node, u64 res_start)
>>   {
>>   	struct arm_ni_cd *cd = ni->cds + node->id;
>>   	const char *name;
>> -	int err;
>>   
>>   	cd->id = node->id;
>>   	cd->num_units = node->num_components;
>> @@ -534,20 +539,11 @@ static int arm_ni_init_cd(struct arm_ni *ni, struct arm_ni_node *node, u64 res_s
>>   		       cd->pmu_base + NI_PMCR);
>>   	writel_relaxed(U32_MAX, cd->pmu_base + NI_PMCNTENCLR);
>>   	writel_relaxed(U32_MAX, cd->pmu_base + NI_PMOVSCLR);
>> -	writel_relaxed(U32_MAX, cd->pmu_base + NI_PMINTENSET);
>>   
>>   	cd->irq = platform_get_irq(to_platform_device(ni->dev), cd->id);
>>   	if (cd->irq < 0)
>>   		return cd->irq;
>>   
>> -	err = devm_request_irq(ni->dev, cd->irq, arm_ni_handle_irq,
>> -			       IRQF_NOBALANCING | IRQF_NO_THREAD,
>> -			       dev_name(ni->dev), cd);
>> -	if (err)
>> -		return err;
>> -
>> -	irq_set_affinity(cd->irq, cpumask_of(ni->cpu));
>> -
>>   	cd->pmu = (struct pmu) {
>>   		.module = THIS_MODULE,
>>   		.parent = ni->dev,
>> @@ -593,6 +589,30 @@ static void arm_ni_probe_domain(void __iomem *base, struct arm_ni_node *node)
>>   	node->num_components = readl_relaxed(base + NI_CHILD_NODE_INFO);
>>   }
>>   
>> +static int arm_ni_init_irqs(struct arm_ni *ni)
>> +{
>> +	int err;
>> +
>> +	ni_for_each_cd(ni, cd) {
>> +		for (struct arm_ni_cd *prev = cd; prev-- > ni->cds; ) {
>> +			if (prev->irq == cd->irq) {
>> +				prev->irq_friend = cd - prev;
> 
> Can't this race with the read of `irq_friend` in the interrupt handler?

Not in any way that matters. Any IRQ at this point would be a spurious 
one left latched at the interrupt controller after we've already reset 
all the PMUs. The handler can hardly observe a torn partial store of 1 
byte, so either it'll see a valid irq_friend or none. Either way we'll 
eventually return IRQ_NONE, EOI the phantom IRQ and be done.

I'm not entertaining the idea of somehow being preempted here for long 
enough for userspace to notice the already-registered PMUs (even though 
the module load hasn't yet finished...), open an event and have it count 
enough to genuinely overflow, because that would be in the order of at 
least tens of seconds if not minutes.

> Similarly, how do you handle the race between the interrupt firing on
> one CD and the friend CD being torn down on e.g. the remove path?

Again, we've already disabled *all* the PMUs (and explicitly cleared 
their PMINTENs to leave no doubt) in arm_ni_remove(). After that, devres 
ordering will then free the IRQs themselves before the CDs ever get freed.

>> +				goto enable_irq;
>> +			}
>> +		}
>> +		err = devm_request_irq(ni->dev, cd->irq, arm_ni_handle_irq,
>> +				       IRQF_NOBALANCING | IRQF_NO_THREAD,
>> +				       dev_name(ni->dev), cd);
> 
> What's the reason not to use IRQF_SHARED and register multiple handlers?
> I'm sure there is one, but capturing that in a comment would help to
> justify the manual multiplexing.

The usual perf reason - IRQF_SHARED allows any *other* random driver to 
also request the same IRQ and then mess with its affinity behind our back.

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 0/3] perf/arm-ni: IRQ improvements
  2025-05-13 15:38 [PATCH 0/3] perf/arm-ni: IRQ improvements Robin Murphy
                   ` (3 preceding siblings ...)
  2025-05-16 11:46 ` [PATCH 0/3] perf/arm-ni: IRQ improvements Shouping Wang
@ 2025-07-04 17:44 ` Will Deacon
  4 siblings, 0 replies; 9+ messages in thread
From: Will Deacon @ 2025-07-04 17:44 UTC (permalink / raw)
  To: Robin Murphy
  Cc: catalin.marinas, kernel-team, Will Deacon, mark.rutland,
	linux-arm-kernel, linux-perf-users, allen.wang, peter.du, andy.xu

On Tue, 13 May 2025 16:38:57 +0100, Robin Murphy wrote:
> While the previous attempt[1] was missing some important aspects, it's
> a valid problem to solve, so I figured I'd take a look myself. This
> ended up turning into a slightly wider cleanup, but I think the end
> result is nice enough.
> 
> As usual I don't have anything to test this on, so if there are any
> silly mistakes left that the compiler couldn't catch please do let me
> know.
> 
> [...]

Applied first patch to will (for-next/perf), thanks!

[1/3] perf/arm-ni: Set initial IRQ affinity
      https://git.kernel.org/will/c/c872d7c83738

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 3/3] perf/arm-ni: Support sharing IRQs within an NI instance
  2025-05-19 14:57     ` Robin Murphy
@ 2025-07-04 17:47       ` Will Deacon
  0 siblings, 0 replies; 9+ messages in thread
From: Will Deacon @ 2025-07-04 17:47 UTC (permalink / raw)
  To: Robin Murphy
  Cc: mark.rutland, linux-arm-kernel, linux-perf-users, allen.wang,
	peter.du, andy.xu

On Mon, May 19, 2025 at 03:57:31PM +0100, Robin Murphy wrote:
> On 19/05/2025 1:29 pm, Will Deacon wrote:
> > On Tue, May 13, 2025 at 04:39:00PM +0100, Robin Murphy wrote:
> > > From: Shouping Wang <allen.wang@hj-micro.com>
> > > 
> > > NI-700 has a distinct PMU interrupt output for each Clock Domain,
> > > however some integrations may still combine these together externally.
> > > The initial driver didn't attempt to support this, in anticipation of a
> > > more general solution for IRQ sharing between system PMU instances, but
> > > that's still a way off, so let's make this intermediate step for now to
> > > at least allow sharing IRQs within an individual NI instance.
> > > 
> > > Now that CPU affinity and migration are cleaned up, it's fairly
> > > straightforward to adopt similar logic to arm-cmn, to identify CDs with
> > > a common interrupt and loop over them directly in the handler.
> > > 
> > > Signed-off-by: Shouping Wang <allen.wang@hj-micro.com>
> > > [ rm: Rework for affinity handling, cosmetics, new commit message ]
> > > Signed-off-by: Robin Murphy <robin.murphy@arm.com>
> > > ---
> > >   drivers/perf/arm-ni.c | 78 ++++++++++++++++++++++++++++---------------
> > >   1 file changed, 51 insertions(+), 27 deletions(-)

[...]

> > > @@ -593,6 +589,30 @@ static void arm_ni_probe_domain(void __iomem *base, struct arm_ni_node *node)
> > >   	node->num_components = readl_relaxed(base + NI_CHILD_NODE_INFO);
> > >   }
> > > +static int arm_ni_init_irqs(struct arm_ni *ni)
> > > +{
> > > +	int err;
> > > +
> > > +	ni_for_each_cd(ni, cd) {
> > > +		for (struct arm_ni_cd *prev = cd; prev-- > ni->cds; ) {
> > > +			if (prev->irq == cd->irq) {
> > > +				prev->irq_friend = cd - prev;
> > 
> > Can't this race with the read of `irq_friend` in the interrupt handler?
> 
> Not in any way that matters. Any IRQ at this point would be a spurious one
> left latched at the interrupt controller after we've already reset all the
> PMUs. The handler can hardly observe a torn partial store of 1 byte, so
> either it'll see a valid irq_friend or none. Either way we'll eventually
> return IRQ_NONE, EOI the phantom IRQ and be done.
> 
> I'm not entertaining the idea of somehow being preempted here for long
> enough for userspace to notice the already-registered PMUs (even though the
> module load hasn't yet finished...), open an event and have it count enough
> to genuinely overflow, because that would be in the order of at least tens
> of seconds if not minutes.

Can't we save ourselves from having to think about that by just having
two loops? That is, set up the friend relationships and enable IRQ
generation in the first pass, then go through and actually request the
interrupts in the second one?

Will

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-07-04 17:47 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-05-13 15:38 [PATCH 0/3] perf/arm-ni: IRQ improvements Robin Murphy
2025-05-13 15:38 ` [PATCH 1/3] perf/arm-ni: Set initial IRQ affinity Robin Murphy
2025-05-13 15:38 ` [PATCH 2/3] perf/arm-ni: Consolidate CPU affinity handling Robin Murphy
2025-05-13 15:39 ` [PATCH 3/3] perf/arm-ni: Support sharing IRQs within an NI instance Robin Murphy
2025-05-19 12:29   ` Will Deacon
2025-05-19 14:57     ` Robin Murphy
2025-07-04 17:47       ` Will Deacon
2025-05-16 11:46 ` [PATCH 0/3] perf/arm-ni: IRQ improvements Shouping Wang
2025-07-04 17:44 ` Will Deacon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).