linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] drivers/perf: apple_m1: Force 63bit counters for M2 CPUs
@ 2023-05-28  8:02 Marc Zyngier
  2023-05-30  8:29 ` Sidharth Kshatriya
  2023-06-05 16:35 ` Will Deacon
  0 siblings, 2 replies; 4+ messages in thread
From: Marc Zyngier @ 2023-05-28  8:02 UTC (permalink / raw)
  To: linux-arm-kernel
  Cc: Janne Grunau, Hector Martin, Mark Rutland, Will Deacon,
	Sidharth Kshatriya

Sidharth reports that on M2, the PMU never generates any interrupt
when using 'perf record', which is a annoying as you get no sample.
I'm temped to say "no sample, no problem", but others may have
a different opinion.

Upon investigation, it appears that the counters on M2 are
significantly different from the ones on M1, as they count on
64 bits instead of 48. Which of course, in the fine M1 tradition,
means that we can only use 63 bits, as the top bit is used to signal
the interrupt...

This results in having to introduce yet another flag to indicate yet
another odd counter width. Who knows what the next crazy implementation
will do...

With this, perf can work out the correct offset, and 'perf record'
works as intended.

Tested on M2 and M2-Pro CPUs.

Cc: Janne Grunau <j@jannau.net>
Cc: Hector Martin <marcan@marcan.st>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will@kernel.org>
Fixes: 7d0bfb7c9977 ("drivers/perf: apple_m1: Add Apple M2 support")
Reported-by: Sidharth Kshatriya <sid.kshatriya@gmail.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
---
 drivers/perf/apple_m1_cpu_pmu.c | 30 ++++++++++++++++++++++++------
 drivers/perf/arm_pmu.c          |  2 ++
 include/linux/perf/arm_pmu.h    |  2 ++
 3 files changed, 28 insertions(+), 6 deletions(-)

diff --git a/drivers/perf/apple_m1_cpu_pmu.c b/drivers/perf/apple_m1_cpu_pmu.c
index 8574c6e58c83..cd2de44b61b9 100644
--- a/drivers/perf/apple_m1_cpu_pmu.c
+++ b/drivers/perf/apple_m1_cpu_pmu.c
@@ -493,6 +493,17 @@ static int m1_pmu_map_event(struct perf_event *event)
 	return armpmu_map_event(event, &m1_pmu_perf_map, NULL, M1_PMU_CFG_EVENT);
 }
 
+static int m2_pmu_map_event(struct perf_event *event)
+{
+	/*
+	 * Same deal as the above, except that M2 has 64bit counters.
+	 * Which, as far as we're concerned, actually means 63 bits.
+	 * Yes, this is getting awkward.
+	 */
+	event->hw.flags |= ARMPMU_EVT_63BIT;
+	return armpmu_map_event(event, &m1_pmu_perf_map, NULL, M1_PMU_CFG_EVENT);
+}
+
 static void m1_pmu_reset(void *info)
 {
 	int i;
@@ -525,7 +536,7 @@ static int m1_pmu_set_event_filter(struct hw_perf_event *event,
 	return 0;
 }
 
-static int m1_pmu_init(struct arm_pmu *cpu_pmu)
+static int m1_pmu_init(struct arm_pmu *cpu_pmu, u32 flags)
 {
 	cpu_pmu->handle_irq	  = m1_pmu_handle_irq;
 	cpu_pmu->enable		  = m1_pmu_enable_event;
@@ -536,7 +547,14 @@ static int m1_pmu_init(struct arm_pmu *cpu_pmu)
 	cpu_pmu->clear_event_idx  = m1_pmu_clear_event_idx;
 	cpu_pmu->start		  = m1_pmu_start;
 	cpu_pmu->stop		  = m1_pmu_stop;
-	cpu_pmu->map_event	  = m1_pmu_map_event;
+
+	if (flags & ARMPMU_EVT_47BIT)
+		cpu_pmu->map_event = m1_pmu_map_event;
+	else if (flags & ARMPMU_EVT_63BIT)
+		cpu_pmu->map_event = m2_pmu_map_event;
+	else
+		return WARN_ON(-EINVAL);
+
 	cpu_pmu->reset		  = m1_pmu_reset;
 	cpu_pmu->set_event_filter = m1_pmu_set_event_filter;
 
@@ -550,25 +568,25 @@ static int m1_pmu_init(struct arm_pmu *cpu_pmu)
 static int m1_pmu_ice_init(struct arm_pmu *cpu_pmu)
 {
 	cpu_pmu->name = "apple_icestorm_pmu";
-	return m1_pmu_init(cpu_pmu);
+	return m1_pmu_init(cpu_pmu, ARMPMU_EVT_47BIT);
 }
 
 static int m1_pmu_fire_init(struct arm_pmu *cpu_pmu)
 {
 	cpu_pmu->name = "apple_firestorm_pmu";
-	return m1_pmu_init(cpu_pmu);
+	return m1_pmu_init(cpu_pmu, ARMPMU_EVT_47BIT);
 }
 
 static int m2_pmu_avalanche_init(struct arm_pmu *cpu_pmu)
 {
 	cpu_pmu->name = "apple_avalanche_pmu";
-	return m1_pmu_init(cpu_pmu);
+	return m1_pmu_init(cpu_pmu, ARMPMU_EVT_63BIT);
 }
 
 static int m2_pmu_blizzard_init(struct arm_pmu *cpu_pmu)
 {
 	cpu_pmu->name = "apple_blizzard_pmu";
-	return m1_pmu_init(cpu_pmu);
+	return m1_pmu_init(cpu_pmu, ARMPMU_EVT_63BIT);
 }
 
 static const struct of_device_id m1_pmu_of_device_ids[] = {
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 15bd1e34a88e..277e29fbd504 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -109,6 +109,8 @@ static inline u64 arm_pmu_event_max_period(struct perf_event *event)
 {
 	if (event->hw.flags & ARMPMU_EVT_64BIT)
 		return GENMASK_ULL(63, 0);
+	else if (event->hw.flags & ARMPMU_EVT_63BIT)
+		return GENMASK_ULL(62, 0);
 	else if (event->hw.flags & ARMPMU_EVT_47BIT)
 		return GENMASK_ULL(46, 0);
 	else
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 525b5d64e394..c0e4baf940dc 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -26,9 +26,11 @@
  */
 #define ARMPMU_EVT_64BIT		0x00001 /* Event uses a 64bit counter */
 #define ARMPMU_EVT_47BIT		0x00002 /* Event uses a 47bit counter */
+#define ARMPMU_EVT_63BIT		0x00004 /* Event uses a 63bit counter */
 
 static_assert((PERF_EVENT_FLAG_ARCH & ARMPMU_EVT_64BIT) == ARMPMU_EVT_64BIT);
 static_assert((PERF_EVENT_FLAG_ARCH & ARMPMU_EVT_47BIT) == ARMPMU_EVT_47BIT);
+static_assert((PERF_EVENT_FLAG_ARCH & ARMPMU_EVT_63BIT) == ARMPMU_EVT_63BIT);
 
 #define HW_OP_UNSUPPORTED		0xFFFF
 #define C(_x)				PERF_COUNT_HW_CACHE_##_x
-- 
2.34.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] drivers/perf: apple_m1: Force 63bit counters for M2 CPUs
  2023-05-28  8:02 [PATCH] drivers/perf: apple_m1: Force 63bit counters for M2 CPUs Marc Zyngier
@ 2023-05-30  8:29 ` Sidharth Kshatriya
  2023-06-05 14:45   ` Will Deacon
  2023-06-05 16:35 ` Will Deacon
  1 sibling, 1 reply; 4+ messages in thread
From: Sidharth Kshatriya @ 2023-05-30  8:29 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: linux-arm-kernel, Janne Grunau, Hector Martin, Mark Rutland,
	Will Deacon

Thanks again for the patch earlier Marc !

Happy to further report that:
- After applying Marc's patch (upon 6.2.0-asahi11-1) Asahi Linux edge
  kernel, I am able to see interrupts during `perf record` on the M2
- The rr debugger also works quite well now on the M2 with the patched
  kernel. You will need to build rr from the latest master branch from
the rr git repo
  as it contains a couple of other commits to support M2. (BTW my
investigation into
  issues related to the M2 PMU was initially triggered by problems I
was facing with
  getting rr to work properly on the M2. rr uses PMU events in its
functioning).

Thanks,

Sidharth

On Sun, May 28, 2023 at 1:33 PM Marc Zyngier <maz@kernel.org> wrote:
>
> Sidharth reports that on M2, the PMU never generates any interrupt
> when using 'perf record', which is a annoying as you get no sample.
> I'm temped to say "no sample, no problem", but others may have
> a different opinion.
>
> Upon investigation, it appears that the counters on M2 are
> significantly different from the ones on M1, as they count on
> 64 bits instead of 48. Which of course, in the fine M1 tradition,
> means that we can only use 63 bits, as the top bit is used to signal
> the interrupt...
>
> This results in having to introduce yet another flag to indicate yet
> another odd counter width. Who knows what the next crazy implementation
> will do...
>
> With this, perf can work out the correct offset, and 'perf record'
> works as intended.
>
> Tested on M2 and M2-Pro CPUs.
>
> Cc: Janne Grunau <j@jannau.net>
> Cc: Hector Martin <marcan@marcan.st>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Fixes: 7d0bfb7c9977 ("drivers/perf: apple_m1: Add Apple M2 support")
> Reported-by: Sidharth Kshatriya <sid.kshatriya@gmail.com>
> Signed-off-by: Marc Zyngier <maz@kernel.org>
> ---
>  drivers/perf/apple_m1_cpu_pmu.c | 30 ++++++++++++++++++++++++------
>  drivers/perf/arm_pmu.c          |  2 ++
>  include/linux/perf/arm_pmu.h    |  2 ++
>  3 files changed, 28 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/perf/apple_m1_cpu_pmu.c b/drivers/perf/apple_m1_cpu_pmu.c
> index 8574c6e58c83..cd2de44b61b9 100644
> --- a/drivers/perf/apple_m1_cpu_pmu.c
> +++ b/drivers/perf/apple_m1_cpu_pmu.c
> @@ -493,6 +493,17 @@ static int m1_pmu_map_event(struct perf_event *event)
>         return armpmu_map_event(event, &m1_pmu_perf_map, NULL, M1_PMU_CFG_EVENT);
>  }
>
> +static int m2_pmu_map_event(struct perf_event *event)
> +{
> +       /*
> +        * Same deal as the above, except that M2 has 64bit counters.
> +        * Which, as far as we're concerned, actually means 63 bits.
> +        * Yes, this is getting awkward.
> +        */
> +       event->hw.flags |= ARMPMU_EVT_63BIT;
> +       return armpmu_map_event(event, &m1_pmu_perf_map, NULL, M1_PMU_CFG_EVENT);
> +}
> +
>  static void m1_pmu_reset(void *info)
>  {
>         int i;
> @@ -525,7 +536,7 @@ static int m1_pmu_set_event_filter(struct hw_perf_event *event,
>         return 0;
>  }
>
> -static int m1_pmu_init(struct arm_pmu *cpu_pmu)
> +static int m1_pmu_init(struct arm_pmu *cpu_pmu, u32 flags)
>  {
>         cpu_pmu->handle_irq       = m1_pmu_handle_irq;
>         cpu_pmu->enable           = m1_pmu_enable_event;
> @@ -536,7 +547,14 @@ static int m1_pmu_init(struct arm_pmu *cpu_pmu)
>         cpu_pmu->clear_event_idx  = m1_pmu_clear_event_idx;
>         cpu_pmu->start            = m1_pmu_start;
>         cpu_pmu->stop             = m1_pmu_stop;
> -       cpu_pmu->map_event        = m1_pmu_map_event;
> +
> +       if (flags & ARMPMU_EVT_47BIT)
> +               cpu_pmu->map_event = m1_pmu_map_event;
> +       else if (flags & ARMPMU_EVT_63BIT)
> +               cpu_pmu->map_event = m2_pmu_map_event;
> +       else
> +               return WARN_ON(-EINVAL);
> +
>         cpu_pmu->reset            = m1_pmu_reset;
>         cpu_pmu->set_event_filter = m1_pmu_set_event_filter;
>
> @@ -550,25 +568,25 @@ static int m1_pmu_init(struct arm_pmu *cpu_pmu)
>  static int m1_pmu_ice_init(struct arm_pmu *cpu_pmu)
>  {
>         cpu_pmu->name = "apple_icestorm_pmu";
> -       return m1_pmu_init(cpu_pmu);
> +       return m1_pmu_init(cpu_pmu, ARMPMU_EVT_47BIT);
>  }
>
>  static int m1_pmu_fire_init(struct arm_pmu *cpu_pmu)
>  {
>         cpu_pmu->name = "apple_firestorm_pmu";
> -       return m1_pmu_init(cpu_pmu);
> +       return m1_pmu_init(cpu_pmu, ARMPMU_EVT_47BIT);
>  }
>
>  static int m2_pmu_avalanche_init(struct arm_pmu *cpu_pmu)
>  {
>         cpu_pmu->name = "apple_avalanche_pmu";
> -       return m1_pmu_init(cpu_pmu);
> +       return m1_pmu_init(cpu_pmu, ARMPMU_EVT_63BIT);
>  }
>
>  static int m2_pmu_blizzard_init(struct arm_pmu *cpu_pmu)
>  {
>         cpu_pmu->name = "apple_blizzard_pmu";
> -       return m1_pmu_init(cpu_pmu);
> +       return m1_pmu_init(cpu_pmu, ARMPMU_EVT_63BIT);
>  }
>
>  static const struct of_device_id m1_pmu_of_device_ids[] = {
> diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
> index 15bd1e34a88e..277e29fbd504 100644
> --- a/drivers/perf/arm_pmu.c
> +++ b/drivers/perf/arm_pmu.c
> @@ -109,6 +109,8 @@ static inline u64 arm_pmu_event_max_period(struct perf_event *event)
>  {
>         if (event->hw.flags & ARMPMU_EVT_64BIT)
>                 return GENMASK_ULL(63, 0);
> +       else if (event->hw.flags & ARMPMU_EVT_63BIT)
> +               return GENMASK_ULL(62, 0);
>         else if (event->hw.flags & ARMPMU_EVT_47BIT)
>                 return GENMASK_ULL(46, 0);
>         else
> diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
> index 525b5d64e394..c0e4baf940dc 100644
> --- a/include/linux/perf/arm_pmu.h
> +++ b/include/linux/perf/arm_pmu.h
> @@ -26,9 +26,11 @@
>   */
>  #define ARMPMU_EVT_64BIT               0x00001 /* Event uses a 64bit counter */
>  #define ARMPMU_EVT_47BIT               0x00002 /* Event uses a 47bit counter */
> +#define ARMPMU_EVT_63BIT               0x00004 /* Event uses a 63bit counter */
>
>  static_assert((PERF_EVENT_FLAG_ARCH & ARMPMU_EVT_64BIT) == ARMPMU_EVT_64BIT);
>  static_assert((PERF_EVENT_FLAG_ARCH & ARMPMU_EVT_47BIT) == ARMPMU_EVT_47BIT);
> +static_assert((PERF_EVENT_FLAG_ARCH & ARMPMU_EVT_63BIT) == ARMPMU_EVT_63BIT);
>
>  #define HW_OP_UNSUPPORTED              0xFFFF
>  #define C(_x)                          PERF_COUNT_HW_CACHE_##_x
> --
> 2.34.1
>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] drivers/perf: apple_m1: Force 63bit counters for M2 CPUs
  2023-05-30  8:29 ` Sidharth Kshatriya
@ 2023-06-05 14:45   ` Will Deacon
  0 siblings, 0 replies; 4+ messages in thread
From: Will Deacon @ 2023-06-05 14:45 UTC (permalink / raw)
  To: Sidharth Kshatriya
  Cc: Marc Zyngier, linux-arm-kernel, Janne Grunau, Hector Martin,
	Mark Rutland

On Tue, May 30, 2023 at 01:59:54PM +0530, Sidharth Kshatriya wrote:
> Thanks again for the patch earlier Marc !
> 
> Happy to further report that:
> - After applying Marc's patch (upon 6.2.0-asahi11-1) Asahi Linux edge
>   kernel, I am able to see interrupts during `perf record` on the M2
> - The rr debugger also works quite well now on the M2 with the patched
>   kernel. You will need to build rr from the latest master branch from
> the rr git repo
>   as it contains a couple of other commits to support M2. (BTW my
> investigation into
>   issues related to the M2 PMU was initially triggered by problems I
> was facing with
>   getting rr to work properly on the M2. rr uses PMU events in its
> functioning).

Thanks Sidharth, I'll add your Tested-by to the patch.

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] drivers/perf: apple_m1: Force 63bit counters for M2 CPUs
  2023-05-28  8:02 [PATCH] drivers/perf: apple_m1: Force 63bit counters for M2 CPUs Marc Zyngier
  2023-05-30  8:29 ` Sidharth Kshatriya
@ 2023-06-05 16:35 ` Will Deacon
  1 sibling, 0 replies; 4+ messages in thread
From: Will Deacon @ 2023-06-05 16:35 UTC (permalink / raw)
  To: linux-arm-kernel, Marc Zyngier
  Cc: catalin.marinas, kernel-team, Will Deacon, Janne Grunau,
	Mark Rutland, Sidharth Kshatriya, Hector Martin

On Sun, 28 May 2023 09:02:05 +0100, Marc Zyngier wrote:
> Sidharth reports that on M2, the PMU never generates any interrupt
> when using 'perf record', which is a annoying as you get no sample.
> I'm temped to say "no sample, no problem", but others may have
> a different opinion.
> 
> Upon investigation, it appears that the counters on M2 are
> significantly different from the ones on M1, as they count on
> 64 bits instead of 48. Which of course, in the fine M1 tradition,
> means that we can only use 63 bits, as the top bit is used to signal
> the interrupt...
> 
> [...]

Applied to will (for-next/perf), thanks!

[1/1] drivers/perf: apple_m1: Force 63bit counters for M2 CPUs
      https://git.kernel.org/will/c/8be3593b9efa

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2023-06-05 16:35 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-05-28  8:02 [PATCH] drivers/perf: apple_m1: Force 63bit counters for M2 CPUs Marc Zyngier
2023-05-30  8:29 ` Sidharth Kshatriya
2023-06-05 14:45   ` Will Deacon
2023-06-05 16:35 ` Will Deacon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).