linux-doc.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v20 00/11] arm64/perf: Enable branch stack sampling
@ 2025-02-18 20:39 Rob Herring (Arm)
  2025-02-18 20:39 ` [PATCH v20 01/11] perf: arm_pmuv3: Call kvm_vcpu_pmu_resync_el0() before enabling counters Rob Herring (Arm)
                   ` (12 more replies)
  0 siblings, 13 replies; 33+ messages in thread
From: Rob Herring (Arm) @ 2025-02-18 20:39 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, Catalin Marinas, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, James Clark, Anshuman Khandual, Leo Yan
  Cc: linux-arm-kernel, linux-perf-users, linux-kernel, linux-doc,
	kvmarm, Mark Brown

This series enables perf branch stack sampling support on arm64 via a 
v9.2 arch feature called Branch Record Buffer Extension (BRBE). Details 
on BRBE can be found in the Arm ARM[1] chapter D18.

I've picked up this series from Anshuman. v19 and v20 versions have been 
reworked quite a bit by Mark and myself. The bulk of those changes are 
in patch 11.

Patches 1-7 are new clean-ups/prep which stand on their own. They 
were previously posted here[2]. Please pick them up if there's no issues 
with them.

Patches 8-11 add BRBE support with the actual support in patch 11.

A git branch is here[3].

[1] https://developer.arm.com/documentation/ddi0487/latest/
[2] https://lore.kernel.org/all/20250107-arm-pmu-cleanups-v1-v1-0-313951346a25@kernel.org/
[3] git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git arm/brbe-v20

v20:
 - Added back some of the arm64 specific exception types. The x86 IRQ 
   branches also include other exceptions like page faults. On arm64, we 
   can distinguish the exception types, so we do. Also, to better 
   align with x86, we convert 'call' branches which are user to kernel 
   to 'syscall'.
 - Only enable exceptions and exception returns if recording kernel
   branches (matching x86)
 - Drop requiring event and branch privileges to match
 - Add "branches" caps sysfs attribute like x86
 - Reword comment about FZP and MDCR_EL2.HPMN interaction
 - Rework BRBE invalidation to avoid invalidating in interrupt handler
   when no handled events capture the branch stack (i.e. when there are 
   multiple users).
 - Also clear BRBCR_ELx bits in brbe_disable(). This is for KVM nVHE 
   checks if BRBE is enabled.
 - Document that MDCR_EL3.SBRBE can be 0b01 also

v19:
 - https://lore.kernel.org/all/20250202-arm-brbe-v19-v19-0-1c1300802385@kernel.org/
 - Drop saving of branch records when task scheduled out (Mark). Make 
   sched_task() callback actually get called. Enabling requires a call 
   to perf_sched_cb_inc(). So the saving of branch records never 
   happened.
 - Got rid of added armpmu ops. All BRBE support is contained within 
   pmuv3 code.
 - Fix freeze on overflow for VHE
 - The cycle counter doesn't freeze BRBE on overflow, so avoid assigning
   it when BRBE is enabled.
 - Drop all the Arm specific exception branches. Not a clear need for
   them.
 - Fix handling of branch 'cycles' reading. CC field is
   mantissa/exponent, not an integer.
 - Rework s/w filtering to better match h/w filtering
 - Reject events with disjoint event filter and branch filter or with 
   exclude_host set
 - Dropped perf test patch which has been applied for 6.14
 - Dropped patch "KVM: arm64: Explicitly handle BRBE traps as UNDEFINED"
   which has been applied for 6.14

v18:
 - https://lore.kernel.org/all/20240613061731.3109448-1-anshuman.khandual@arm.com/

For v1-v17, see the above link. Not going to duplicate it all here...

Signed-off-by: "Rob Herring (Arm)" <robh@kernel.org>
---
Changes in v20:
- EDITME: describe what is new in this series revision.
- EDITME: use bulletpoints and terse descriptions.
- Link to v19: https://lore.kernel.org/r/20250202-arm-brbe-v19-v19-0-1c1300802385@kernel.org

---
Anshuman Khandual (4):
      arm64/sysreg: Add BRBE registers and fields
      arm64: Handle BRBE booting requirements
      KVM: arm64: nvhe: Disable branch generation in nVHE guests
      perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)

Mark Rutland (3):
      perf: arm_pmu: Don't disable counter in armpmu_add()
      perf: arm_pmuv3: Don't disable counter in armv8pmu_enable_event()
      perf: arm_pmu: Move PMUv3-specific data

Rob Herring (Arm) (4):
      perf: arm_pmuv3: Call kvm_vcpu_pmu_resync_el0() before enabling counters
      perf: arm_v7_pmu: Drop obvious comments for enabling/disabling counters and interrupts
      perf: arm_v7_pmu: Don't disable counter in (armv7|krait_|scorpion_)pmu_enable_event()
      perf: apple_m1: Don't disable counter in m1_pmu_enable_event()

 Documentation/arch/arm64/booting.rst |  21 +
 arch/arm64/include/asm/el2_setup.h   |  86 +++-
 arch/arm64/include/asm/kvm_host.h    |   2 +
 arch/arm64/include/asm/sysreg.h      |  17 +-
 arch/arm64/kvm/debug.c               |   4 +
 arch/arm64/kvm/hyp/nvhe/debug-sr.c   |  32 ++
 arch/arm64/kvm/hyp/nvhe/switch.c     |   2 +-
 arch/arm64/tools/sysreg              | 132 ++++++
 drivers/perf/Kconfig                 |  11 +
 drivers/perf/Makefile                |   1 +
 drivers/perf/apple_m1_cpu_pmu.c      |   4 -
 drivers/perf/arm_brbe.c              | 802 +++++++++++++++++++++++++++++++++++
 drivers/perf/arm_brbe.h              |  47 ++
 drivers/perf/arm_pmu.c               |  23 +-
 drivers/perf/arm_pmuv3.c             | 135 +++++-
 drivers/perf/arm_v7_pmu.c            |  50 ---
 include/linux/perf/arm_pmu.h         |  21 +-
 17 files changed, 1297 insertions(+), 93 deletions(-)
---
base-commit: 2014c95afecee3e76ca4a56956a936e23283f05b
change-id: 20250129-arm-brbe-v19-24d5d9e5e623

Best regards,
-- 
Rob Herring (Arm) <robh@kernel.org>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH v20 01/11] perf: arm_pmuv3: Call kvm_vcpu_pmu_resync_el0() before enabling counters
  2025-02-18 20:39 [PATCH v20 00/11] arm64/perf: Enable branch stack sampling Rob Herring (Arm)
@ 2025-02-18 20:39 ` Rob Herring (Arm)
  2025-02-18 20:39 ` [PATCH v20 02/11] perf: arm_pmu: Don't disable counter in armpmu_add() Rob Herring (Arm)
                   ` (11 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Rob Herring (Arm) @ 2025-02-18 20:39 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, Catalin Marinas, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, James Clark, Anshuman Khandual, Leo Yan
  Cc: linux-arm-kernel, linux-perf-users, linux-kernel, linux-doc,
	kvmarm

Counting events related to setup of the PMU is not desired, but
kvm_vcpu_pmu_resync_el0() is called just after the PMU counters have
been enabled. Move the call to before enabling the counters.

Signed-off-by: "Rob Herring (Arm)" <robh@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 drivers/perf/arm_pmuv3.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 0e360feb3432..9ebc950559c0 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -825,10 +825,10 @@ static void armv8pmu_start(struct arm_pmu *cpu_pmu)
 	else
 		armv8pmu_disable_user_access();
 
+	kvm_vcpu_pmu_resync_el0();
+
 	/* Enable all counters */
 	armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMU_PMCR_E);
-
-	kvm_vcpu_pmu_resync_el0();
 }
 
 static void armv8pmu_stop(struct arm_pmu *cpu_pmu)

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v20 02/11] perf: arm_pmu: Don't disable counter in armpmu_add()
  2025-02-18 20:39 [PATCH v20 00/11] arm64/perf: Enable branch stack sampling Rob Herring (Arm)
  2025-02-18 20:39 ` [PATCH v20 01/11] perf: arm_pmuv3: Call kvm_vcpu_pmu_resync_el0() before enabling counters Rob Herring (Arm)
@ 2025-02-18 20:39 ` Rob Herring (Arm)
  2025-02-18 20:39 ` [PATCH v20 03/11] perf: arm_pmuv3: Don't disable counter in armv8pmu_enable_event() Rob Herring (Arm)
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Rob Herring (Arm) @ 2025-02-18 20:39 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, Catalin Marinas, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, James Clark, Anshuman Khandual, Leo Yan
  Cc: linux-arm-kernel, linux-perf-users, linux-kernel, linux-doc,
	kvmarm

From: Mark Rutland <mark.rutland@arm.com>

Currently armpmu_add() tries to handle a newly-allocated counter having
a stale associated event, but this should not be possible, and if this
were to happen the current mitigation is insufficient and potentially
expensive. It would be better to warn if we encounter the impossible
case.

Calls to pmu::add() and pmu::del() are serialized by the core perf code,
and armpmu_del() clears the relevant slot in pmu_hw_events::events[]
before clearing the bit in pmu_hw_events::used_mask such that the
counter can be reallocated. Thus when armpmu_add() allocates a counter
index from pmu_hw_events::used_mask, it should not be possible to observe
a stale even in pmu_hw_events::events[] unless either
pmu_hw_events::used_mask or pmu_hw_events::events[] have been corrupted.

If this were to happen, we'd end up with two events with the same
event->hw.idx, which would clash with each other during reprogramming,
deletion, etc, and produce bogus results. Add a WARN_ON_ONCE() for this
case so that we can detect if this ever occurs in practice.

That possiblity aside, there's no need to call arm_pmu::disable(event)
for the new event. The PMU reset code initialises the counter in a
disabled state, and armpmu_del() will disable the counter before it can
be reused. Remove the redundant disable.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: "Rob Herring (Arm)" <robh@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 drivers/perf/arm_pmu.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 398cce3d76fc..2f33e69a8caf 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -342,12 +342,10 @@ armpmu_add(struct perf_event *event, int flags)
 	if (idx < 0)
 		return idx;
 
-	/*
-	 * If there is an event in the counter we are going to use then make
-	 * sure it is disabled.
-	 */
+	/* The newly-allocated counter should be empty */
+	WARN_ON_ONCE(hw_events->events[idx]);
+
 	event->hw.idx = idx;
-	armpmu->disable(event);
 	hw_events->events[idx] = event;
 
 	hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v20 03/11] perf: arm_pmuv3: Don't disable counter in armv8pmu_enable_event()
  2025-02-18 20:39 [PATCH v20 00/11] arm64/perf: Enable branch stack sampling Rob Herring (Arm)
  2025-02-18 20:39 ` [PATCH v20 01/11] perf: arm_pmuv3: Call kvm_vcpu_pmu_resync_el0() before enabling counters Rob Herring (Arm)
  2025-02-18 20:39 ` [PATCH v20 02/11] perf: arm_pmu: Don't disable counter in armpmu_add() Rob Herring (Arm)
@ 2025-02-18 20:39 ` Rob Herring (Arm)
  2025-02-18 20:39 ` [PATCH v20 04/11] perf: arm_v7_pmu: Drop obvious comments for enabling/disabling counters and interrupts Rob Herring (Arm)
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Rob Herring (Arm) @ 2025-02-18 20:39 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, Catalin Marinas, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, James Clark, Anshuman Khandual, Leo Yan
  Cc: linux-arm-kernel, linux-perf-users, linux-kernel, linux-doc,
	kvmarm

From: Mark Rutland <mark.rutland@arm.com>

Currently armv8pmu_enable_event() starts by disabling the event counter
it has been asked to enable. This should not be necessary as the counter
(and the PMU as a whole) should not be active when
armv8pmu_enable_event() is called.

Remove the redundant call to armv8pmu_disable_event_counter(). At the
same time, remove the comment immeditately above as everything it says
is obvious from the function names below.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: "Rob Herring (Arm)" <robh@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 drivers/perf/arm_pmuv3.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 9ebc950559c0..5406b9ca591a 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -795,11 +795,6 @@ static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu)
 
 static void armv8pmu_enable_event(struct perf_event *event)
 {
-	/*
-	 * Enable counter and interrupt, and set the counter to count
-	 * the event that we're interested in.
-	 */
-	armv8pmu_disable_event_counter(event);
 	armv8pmu_write_event_type(event);
 	armv8pmu_enable_event_irq(event);
 	armv8pmu_enable_event_counter(event);

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v20 04/11] perf: arm_v7_pmu: Drop obvious comments for enabling/disabling counters and interrupts
  2025-02-18 20:39 [PATCH v20 00/11] arm64/perf: Enable branch stack sampling Rob Herring (Arm)
                   ` (2 preceding siblings ...)
  2025-02-18 20:39 ` [PATCH v20 03/11] perf: arm_pmuv3: Don't disable counter in armv8pmu_enable_event() Rob Herring (Arm)
@ 2025-02-18 20:39 ` Rob Herring (Arm)
  2025-02-18 20:40 ` [PATCH v20 05/11] perf: arm_v7_pmu: Don't disable counter in (armv7|krait_|scorpion_)pmu_enable_event() Rob Herring (Arm)
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Rob Herring (Arm) @ 2025-02-18 20:39 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, Catalin Marinas, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, James Clark, Anshuman Khandual, Leo Yan
  Cc: linux-arm-kernel, linux-perf-users, linux-kernel, linux-doc,
	kvmarm

The function calls for enabling/disabling counters and interrupts are
pretty obvious as to what they are doing, and the comments don't add
any additional value.

Signed-off-by: "Rob Herring (Arm)" <robh@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 drivers/perf/arm_v7_pmu.c | 44 --------------------------------------------
 1 file changed, 44 deletions(-)

diff --git a/drivers/perf/arm_v7_pmu.c b/drivers/perf/arm_v7_pmu.c
index 420cadd108e7..7fa88e3b64e0 100644
--- a/drivers/perf/arm_v7_pmu.c
+++ b/drivers/perf/arm_v7_pmu.c
@@ -857,14 +857,6 @@ static void armv7pmu_enable_event(struct perf_event *event)
 		return;
 	}
 
-	/*
-	 * Enable counter and interrupt, and set the counter to count
-	 * the event that we're interested in.
-	 */
-
-	/*
-	 * Disable counter
-	 */
 	armv7_pmnc_disable_counter(idx);
 
 	/*
@@ -875,14 +867,7 @@ static void armv7pmu_enable_event(struct perf_event *event)
 	if (cpu_pmu->set_event_filter || idx != ARMV7_IDX_CYCLE_COUNTER)
 		armv7_pmnc_write_evtsel(idx, hwc->config_base);
 
-	/*
-	 * Enable interrupt for this counter
-	 */
 	armv7_pmnc_enable_intens(idx);
-
-	/*
-	 * Enable counter
-	 */
 	armv7_pmnc_enable_counter(idx);
 }
 
@@ -898,18 +883,7 @@ static void armv7pmu_disable_event(struct perf_event *event)
 		return;
 	}
 
-	/*
-	 * Disable counter and interrupt
-	 */
-
-	/*
-	 * Disable counter
-	 */
 	armv7_pmnc_disable_counter(idx);
-
-	/*
-	 * Disable interrupt for this counter
-	 */
 	armv7_pmnc_disable_intens(idx);
 }
 
@@ -1476,12 +1450,6 @@ static void krait_pmu_enable_event(struct perf_event *event)
 	struct hw_perf_event *hwc = &event->hw;
 	int idx = hwc->idx;
 
-	/*
-	 * Enable counter and interrupt, and set the counter to count
-	 * the event that we're interested in.
-	 */
-
-	/* Disable counter */
 	armv7_pmnc_disable_counter(idx);
 
 	/*
@@ -1494,10 +1462,7 @@ static void krait_pmu_enable_event(struct perf_event *event)
 	else
 		armv7_pmnc_write_evtsel(idx, hwc->config_base);
 
-	/* Enable interrupt for this counter */
 	armv7_pmnc_enable_intens(idx);
-
-	/* Enable counter */
 	armv7_pmnc_enable_counter(idx);
 }
 
@@ -1797,12 +1762,6 @@ static void scorpion_pmu_enable_event(struct perf_event *event)
 	struct hw_perf_event *hwc = &event->hw;
 	int idx = hwc->idx;
 
-	/*
-	 * Enable counter and interrupt, and set the counter to count
-	 * the event that we're interested in.
-	 */
-
-	/* Disable counter */
 	armv7_pmnc_disable_counter(idx);
 
 	/*
@@ -1815,10 +1774,7 @@ static void scorpion_pmu_enable_event(struct perf_event *event)
 	else if (idx != ARMV7_IDX_CYCLE_COUNTER)
 		armv7_pmnc_write_evtsel(idx, hwc->config_base);
 
-	/* Enable interrupt for this counter */
 	armv7_pmnc_enable_intens(idx);
-
-	/* Enable counter */
 	armv7_pmnc_enable_counter(idx);
 }
 

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v20 05/11] perf: arm_v7_pmu: Don't disable counter in (armv7|krait_|scorpion_)pmu_enable_event()
  2025-02-18 20:39 [PATCH v20 00/11] arm64/perf: Enable branch stack sampling Rob Herring (Arm)
                   ` (3 preceding siblings ...)
  2025-02-18 20:39 ` [PATCH v20 04/11] perf: arm_v7_pmu: Drop obvious comments for enabling/disabling counters and interrupts Rob Herring (Arm)
@ 2025-02-18 20:40 ` Rob Herring (Arm)
  2025-02-18 20:40 ` [PATCH v20 06/11] perf: apple_m1: Don't disable counter in m1_pmu_enable_event() Rob Herring (Arm)
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Rob Herring (Arm) @ 2025-02-18 20:40 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, Catalin Marinas, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, James Clark, Anshuman Khandual, Leo Yan
  Cc: linux-arm-kernel, linux-perf-users, linux-kernel, linux-doc,
	kvmarm

Currently (armv7|krait_|scorpion_)pmu_enable_event() start by disabling
the event counter it has been asked to enable. This should not be
necessary as the counter (and the PMU as a whole) should not be active
when *_enable_event() is called.

Signed-off-by: "Rob Herring (Arm)" <robh@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 drivers/perf/arm_v7_pmu.c | 6 ------
 1 file changed, 6 deletions(-)

diff --git a/drivers/perf/arm_v7_pmu.c b/drivers/perf/arm_v7_pmu.c
index 7fa88e3b64e0..17831e1920bd 100644
--- a/drivers/perf/arm_v7_pmu.c
+++ b/drivers/perf/arm_v7_pmu.c
@@ -857,8 +857,6 @@ static void armv7pmu_enable_event(struct perf_event *event)
 		return;
 	}
 
-	armv7_pmnc_disable_counter(idx);
-
 	/*
 	 * Set event (if destined for PMNx counters)
 	 * We only need to set the event for the cycle counter if we
@@ -1450,8 +1448,6 @@ static void krait_pmu_enable_event(struct perf_event *event)
 	struct hw_perf_event *hwc = &event->hw;
 	int idx = hwc->idx;
 
-	armv7_pmnc_disable_counter(idx);
-
 	/*
 	 * Set event (if destined for PMNx counters)
 	 * We set the event for the cycle counter because we
@@ -1762,8 +1758,6 @@ static void scorpion_pmu_enable_event(struct perf_event *event)
 	struct hw_perf_event *hwc = &event->hw;
 	int idx = hwc->idx;
 
-	armv7_pmnc_disable_counter(idx);
-
 	/*
 	 * Set event (if destined for PMNx counters)
 	 * We don't set the event for the cycle counter because we

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v20 06/11] perf: apple_m1: Don't disable counter in m1_pmu_enable_event()
  2025-02-18 20:39 [PATCH v20 00/11] arm64/perf: Enable branch stack sampling Rob Herring (Arm)
                   ` (4 preceding siblings ...)
  2025-02-18 20:40 ` [PATCH v20 05/11] perf: arm_v7_pmu: Don't disable counter in (armv7|krait_|scorpion_)pmu_enable_event() Rob Herring (Arm)
@ 2025-02-18 20:40 ` Rob Herring (Arm)
  2025-02-18 20:40 ` [PATCH v20 07/11] perf: arm_pmu: Move PMUv3-specific data Rob Herring (Arm)
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Rob Herring (Arm) @ 2025-02-18 20:40 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, Catalin Marinas, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, James Clark, Anshuman Khandual, Leo Yan
  Cc: linux-arm-kernel, linux-perf-users, linux-kernel, linux-doc,
	kvmarm

Currently m1_pmu_enable_event() starts by disabling the event counter
it has been asked to enable. This should not be necessary as the
counter (and the PMU as a whole) should not be active when
m1_pmu_enable_event() is called.

Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: "Rob Herring (Arm)" <robh@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 drivers/perf/apple_m1_cpu_pmu.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/perf/apple_m1_cpu_pmu.c b/drivers/perf/apple_m1_cpu_pmu.c
index 06fd317529fc..39349ecec3c1 100644
--- a/drivers/perf/apple_m1_cpu_pmu.c
+++ b/drivers/perf/apple_m1_cpu_pmu.c
@@ -396,10 +396,6 @@ static void m1_pmu_enable_event(struct perf_event *event)
 	user = event->hw.config_base & M1_PMU_CFG_COUNT_USER;
 	kernel = event->hw.config_base & M1_PMU_CFG_COUNT_KERNEL;
 
-	m1_pmu_disable_counter_interrupt(event->hw.idx);
-	m1_pmu_disable_counter(event->hw.idx);
-	isb();
-
 	m1_pmu_configure_counter(event->hw.idx, evt, user, kernel);
 	m1_pmu_enable_counter(event->hw.idx);
 	m1_pmu_enable_counter_interrupt(event->hw.idx);

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v20 07/11] perf: arm_pmu: Move PMUv3-specific data
  2025-02-18 20:39 [PATCH v20 00/11] arm64/perf: Enable branch stack sampling Rob Herring (Arm)
                   ` (5 preceding siblings ...)
  2025-02-18 20:40 ` [PATCH v20 06/11] perf: apple_m1: Don't disable counter in m1_pmu_enable_event() Rob Herring (Arm)
@ 2025-02-18 20:40 ` Rob Herring (Arm)
  2025-02-18 20:40 ` [PATCH v20 08/11] arm64/sysreg: Add BRBE registers and fields Rob Herring (Arm)
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Rob Herring (Arm) @ 2025-02-18 20:40 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, Catalin Marinas, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, James Clark, Anshuman Khandual, Leo Yan
  Cc: linux-arm-kernel, linux-perf-users, linux-kernel, linux-doc,
	kvmarm

From: Mark Rutland <mark.rutland@arm.com>

A few fields in struct arm_pmu are only used with PMUv3, and soon we
will need to add more for BRBE. Group the fields together so that we
have a logical place to add more data in future.

At the same time, remove the comment for reg_pmmir as it doesn't convey
anything useful.

There should be no functional change as a result of this patch.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: "Rob Herring (Arm)" <robh@kernel.org>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
---
 include/linux/perf/arm_pmu.h | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 4b5b83677e3f..c70d528594f2 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -84,7 +84,6 @@ struct arm_pmu {
 	struct pmu	pmu;
 	cpumask_t	supported_cpus;
 	char		*name;
-	int		pmuver;
 	irqreturn_t	(*handle_irq)(struct arm_pmu *pmu);
 	void		(*enable)(struct perf_event *event);
 	void		(*disable)(struct perf_event *event);
@@ -102,18 +101,20 @@ struct arm_pmu {
 	int		(*map_event)(struct perf_event *event);
 	DECLARE_BITMAP(cntr_mask, ARMPMU_MAX_HWEVENTS);
 	bool		secure_access; /* 32-bit ARM only */
-#define ARMV8_PMUV3_MAX_COMMON_EVENTS		0x40
-	DECLARE_BITMAP(pmceid_bitmap, ARMV8_PMUV3_MAX_COMMON_EVENTS);
-#define ARMV8_PMUV3_EXT_COMMON_EVENT_BASE	0x4000
-	DECLARE_BITMAP(pmceid_ext_bitmap, ARMV8_PMUV3_MAX_COMMON_EVENTS);
 	struct platform_device	*plat_device;
 	struct pmu_hw_events	__percpu *hw_events;
 	struct hlist_node	node;
 	struct notifier_block	cpu_pm_nb;
 	/* the attr_groups array must be NULL-terminated */
 	const struct attribute_group *attr_groups[ARMPMU_NR_ATTR_GROUPS + 1];
-	/* store the PMMIR_EL1 to expose slots */
+
+	/* PMUv3 only */
+	int		pmuver;
 	u64		reg_pmmir;
+#define ARMV8_PMUV3_MAX_COMMON_EVENTS		0x40
+	DECLARE_BITMAP(pmceid_bitmap, ARMV8_PMUV3_MAX_COMMON_EVENTS);
+#define ARMV8_PMUV3_EXT_COMMON_EVENT_BASE	0x4000
+	DECLARE_BITMAP(pmceid_ext_bitmap, ARMV8_PMUV3_MAX_COMMON_EVENTS);
 
 	/* Only to be used by ACPI probing code */
 	unsigned long acpi_cpuid;

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v20 08/11] arm64/sysreg: Add BRBE registers and fields
  2025-02-18 20:39 [PATCH v20 00/11] arm64/perf: Enable branch stack sampling Rob Herring (Arm)
                   ` (6 preceding siblings ...)
  2025-02-18 20:40 ` [PATCH v20 07/11] perf: arm_pmu: Move PMUv3-specific data Rob Herring (Arm)
@ 2025-02-18 20:40 ` Rob Herring (Arm)
  2025-02-18 20:40 ` [PATCH v20 09/11] arm64: Handle BRBE booting requirements Rob Herring (Arm)
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Rob Herring (Arm) @ 2025-02-18 20:40 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, Catalin Marinas, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, James Clark, Anshuman Khandual, Leo Yan
  Cc: linux-arm-kernel, linux-perf-users, linux-kernel, linux-doc,
	kvmarm, Mark Brown

From: Anshuman Khandual <anshuman.khandual@arm.com>

This patch adds definitions related to the Branch Record Buffer Extension
(BRBE) as per ARM DDI 0487K.a. These will be used by KVM and a BRBE driver
in subsequent patches.

Some existing BRBE definitions in asm/sysreg.h are replaced with equivalent
generated definitions.

Cc: Marc Zyngier <maz@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
---
v19:
- split BRBINF.CC field into mantissa and exponent
---
 arch/arm64/include/asm/sysreg.h |  17 ++----
 arch/arm64/tools/sysreg         | 132 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 138 insertions(+), 11 deletions(-)

diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
index 05ea5223d2d5..a8257e13f8f1 100644
--- a/arch/arm64/include/asm/sysreg.h
+++ b/arch/arm64/include/asm/sysreg.h
@@ -198,16 +198,8 @@
 #define SYS_DBGVCR32_EL2		sys_reg(2, 4, 0, 7, 0)
 
 #define SYS_BRBINF_EL1(n)		sys_reg(2, 1, 8, (n & 15), (((n & 16) >> 2) | 0))
-#define SYS_BRBINFINJ_EL1		sys_reg(2, 1, 9, 1, 0)
 #define SYS_BRBSRC_EL1(n)		sys_reg(2, 1, 8, (n & 15), (((n & 16) >> 2) | 1))
-#define SYS_BRBSRCINJ_EL1		sys_reg(2, 1, 9, 1, 1)
 #define SYS_BRBTGT_EL1(n)		sys_reg(2, 1, 8, (n & 15), (((n & 16) >> 2) | 2))
-#define SYS_BRBTGTINJ_EL1		sys_reg(2, 1, 9, 1, 2)
-#define SYS_BRBTS_EL1			sys_reg(2, 1, 9, 0, 2)
-
-#define SYS_BRBCR_EL1			sys_reg(2, 1, 9, 0, 0)
-#define SYS_BRBFCR_EL1			sys_reg(2, 1, 9, 0, 1)
-#define SYS_BRBIDR0_EL1			sys_reg(2, 1, 9, 2, 0)
 
 #define SYS_TRCITECR_EL1		sys_reg(3, 0, 1, 2, 3)
 #define SYS_TRCACATR(m)			sys_reg(2, 1, 2, ((m & 7) << 1), (2 | (m >> 3)))
@@ -273,8 +265,6 @@
 /* ETM */
 #define SYS_TRCOSLAR			sys_reg(2, 1, 1, 0, 4)
 
-#define SYS_BRBCR_EL2			sys_reg(2, 4, 9, 0, 0)
-
 #define SYS_MIDR_EL1			sys_reg(3, 0, 0, 0, 0)
 #define SYS_MPIDR_EL1			sys_reg(3, 0, 0, 0, 5)
 #define SYS_REVIDR_EL1			sys_reg(3, 0, 0, 0, 6)
@@ -610,7 +600,6 @@
 #define SYS_CNTHV_CVAL_EL2		sys_reg(3, 4, 14, 3, 2)
 
 /* VHE encodings for architectural EL0/1 system registers */
-#define SYS_BRBCR_EL12			sys_reg(2, 5, 9, 0, 0)
 #define SYS_SCTLR_EL12			sys_reg(3, 5, 1, 0, 0)
 #define SYS_CPACR_EL12			sys_reg(3, 5, 1, 0, 2)
 #define SYS_SCTLR2_EL12			sys_reg(3, 5, 1, 0, 3)
@@ -821,6 +810,12 @@
 #define OP_COSP_RCTX			sys_insn(1, 3, 7, 3, 6)
 #define OP_CPP_RCTX			sys_insn(1, 3, 7, 3, 7)
 
+/*
+ * BRBE Instructions
+ */
+#define BRB_IALL_INSN	__emit_inst(0xd5000000 | OP_BRB_IALL | (0x1f))
+#define BRB_INJ_INSN	__emit_inst(0xd5000000 | OP_BRB_INJ  | (0x1f))
+
 /* Common SCTLR_ELx flags. */
 #define SCTLR_ELx_ENTP2	(BIT(60))
 #define SCTLR_ELx_DSSBS	(BIT(44))
diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg
index 762ee084b37c..c0943579977a 100644
--- a/arch/arm64/tools/sysreg
+++ b/arch/arm64/tools/sysreg
@@ -1038,6 +1038,138 @@ UnsignedEnum	3:0	MTEPERM
 EndEnum
 EndSysreg
 
+
+SysregFields BRBINFx_EL1
+Res0	63:47
+Field	46	CCU
+Field	45:40	CC_EXP
+Field	39:32	CC_MANT
+Res0	31:18
+Field	17	LASTFAILED
+Field	16	T
+Res0	15:14
+Enum	13:8		TYPE
+	0b000000	DIRECT_UNCOND
+	0b000001	INDIRECT
+	0b000010	DIRECT_LINK
+	0b000011	INDIRECT_LINK
+	0b000101	RET
+	0b000111	ERET
+	0b001000	DIRECT_COND
+	0b100001	DEBUG_HALT
+	0b100010	CALL
+	0b100011	TRAP
+	0b100100	SERROR
+	0b100110	INSN_DEBUG
+	0b100111	DATA_DEBUG
+	0b101010	ALIGN_FAULT
+	0b101011	INSN_FAULT
+	0b101100	DATA_FAULT
+	0b101110	IRQ
+	0b101111	FIQ
+	0b110000	IMPDEF_TRAP_EL3
+	0b111001	DEBUG_EXIT
+EndEnum
+Enum	7:6	EL
+	0b00	EL0
+	0b01	EL1
+	0b10	EL2
+	0b11	EL3
+EndEnum
+Field	5	MPRED
+Res0	4:2
+Enum	1:0	VALID
+	0b00	NONE
+	0b01	TARGET
+	0b10	SOURCE
+	0b11	FULL
+EndEnum
+EndSysregFields
+
+SysregFields	BRBCR_ELx
+Res0	63:24
+Field	23 	EXCEPTION
+Field	22 	ERTN
+Res0	21:10
+Field	9	FZPSS
+Field	8 	FZP
+Res0	7
+Enum	6:5	TS
+	0b01	VIRTUAL
+	0b10	GUEST_PHYSICAL
+	0b11	PHYSICAL
+EndEnum
+Field	4	MPRED
+Field	3	CC
+Res0	2
+Field	1	ExBRE
+Field	0	E0BRE
+EndSysregFields
+
+Sysreg	BRBCR_EL1	2	1	9	0	0
+Fields	BRBCR_ELx
+EndSysreg
+
+Sysreg	BRBFCR_EL1	2	1	9	0	1
+Res0	63:30
+Enum	29:28	BANK
+	0b00	BANK_0
+	0b01	BANK_1
+EndEnum
+Res0	27:23
+Field	22	CONDDIR
+Field	21	DIRCALL
+Field	20	INDCALL
+Field	19	RTN
+Field	18	INDIRECT
+Field	17	DIRECT
+Field	16	EnI
+Res0	15:8
+Field	7	PAUSED
+Field	6	LASTFAILED
+Res0	5:0
+EndSysreg
+
+Sysreg	BRBTS_EL1	2	1	9	0	2
+Field	63:0	TS
+EndSysreg
+
+Sysreg	BRBINFINJ_EL1	2	1	9	1	0
+Fields BRBINFx_EL1
+EndSysreg
+
+Sysreg	BRBSRCINJ_EL1	2	1	9	1	1
+Field	63:0 ADDRESS
+EndSysreg
+
+Sysreg	BRBTGTINJ_EL1	2	1	9	1	2
+Field	63:0 ADDRESS
+EndSysreg
+
+Sysreg	BRBIDR0_EL1	2	1	9	2	0
+Res0	63:16
+Enum	15:12	CC
+	0b0101	20_BIT
+EndEnum
+Enum	11:8	FORMAT
+	0b0000	FORMAT_0
+EndEnum
+Enum	7:0		NUMREC
+	0b00001000	8
+	0b00010000	16
+	0b00100000	32
+	0b01000000	64
+EndEnum
+EndSysreg
+
+Sysreg	BRBCR_EL2	2	4	9	0	0
+Fields	BRBCR_ELx
+EndSysreg
+
+Sysreg	BRBCR_EL12	2	5	9	0	0
+Fields	BRBCR_ELx
+EndSysreg
+
 Sysreg	ID_AA64ZFR0_EL1	3	0	0	4	4
 Res0	63:60
 UnsignedEnum	59:56	F64MM

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v20 09/11] arm64: Handle BRBE booting requirements
  2025-02-18 20:39 [PATCH v20 00/11] arm64/perf: Enable branch stack sampling Rob Herring (Arm)
                   ` (7 preceding siblings ...)
  2025-02-18 20:40 ` [PATCH v20 08/11] arm64/sysreg: Add BRBE registers and fields Rob Herring (Arm)
@ 2025-02-18 20:40 ` Rob Herring (Arm)
  2025-02-18 20:40 ` [PATCH v20 10/11] KVM: arm64: nvhe: Disable branch generation in nVHE guests Rob Herring (Arm)
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 33+ messages in thread
From: Rob Herring (Arm) @ 2025-02-18 20:40 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, Catalin Marinas, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, James Clark, Anshuman Khandual, Leo Yan
  Cc: linux-arm-kernel, linux-perf-users, linux-kernel, linux-doc,
	kvmarm

From: Anshuman Khandual <anshuman.khandual@arm.com>

To use the Branch Record Buffer Extension (BRBE), some configuration is
necessary at EL3 and EL2. This patch documents the requirements and adds
the initial EL2 setup code, which largely consists of configuring the
fine-grained traps and initializing a couple of BRBE control registers.

Before this patch, __init_el2_fgt() would initialize HDFGRTR_EL2 and
HDFGWTR_EL2 with the same value, relying on the read/write trap controls
for a register occupying the same bit position in either register. The
'nBRBIDR' trap control only exists in bit 59 of HDFGRTR_EL2, while bit
59 of HDFGWTR_EL2 is RES0, and so this assumption no longer holds.

To handle HDFGRTR_EL2 and HDFGWTR_EL2 having (slightly) different bit
layouts, __init_el2_fgt() is changed to accumulate the HDFGRTR_EL2 and
HDFGWTR_EL2 control bits separately. While making this change the
open-coded value (1 << 62) is replaced with
HDFG{R,W}TR_EL2_nPMSNEVFR_EL1_MASK.

The BRBCR_EL1 and BRBCR_EL2 registers are unusual and require special
initialisation: even though they are subject to E2H renaming, both have
an effect regardless of HCR_EL2.TGE, even when running at EL2, and
consequently both need to be initialised. This is handled in
__init_el2_brbe() with a comment to explain the situation.

Cc: Marc Zyngier <maz@kernel.org>
Cc: Oliver Upton <oliver.upton@linux.dev>
Reviewed-by: Leo Yan <leo.yan@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
[Mark: rewrite commit message, fix typo in comment]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
---
v20:
 - Document that MDCR_EL3.SBRBE can be 0b01 also
 - Fix "HDFGWTR_EL2 is RES0" in commit msg
---
 Documentation/arch/arm64/booting.rst | 21 +++++++++
 arch/arm64/include/asm/el2_setup.h   | 86 ++++++++++++++++++++++++++++++++++--
 2 files changed, 104 insertions(+), 3 deletions(-)

diff --git a/Documentation/arch/arm64/booting.rst b/Documentation/arch/arm64/booting.rst
index cad6fdc96b98..a7140e534d35 100644
--- a/Documentation/arch/arm64/booting.rst
+++ b/Documentation/arch/arm64/booting.rst
@@ -352,6 +352,27 @@ Before jumping into the kernel, the following conditions must be met:
 
     - HWFGWTR_EL2.nSMPRI_EL1 (bit 54) must be initialised to 0b01.
 
+  For CPUs with feature Branch Record Buffer Extension (FEAT_BRBE):
+
+  - If EL3 is present:
+
+    - MDCR_EL3.SBRBE (bits 33:32) must be initialised to 0b01 or 0b11.
+
+  - If the kernel is entered at EL1 and EL2 is present:
+
+    - BRBCR_EL2.CC (bit 3) must be initialised to 0b1.
+    - BRBCR_EL2.MPRED (bit 4) must be initialised to 0b1.
+
+    - HDFGRTR_EL2.nBRBDATA (bit 61) must be initialised to 0b1.
+    - HDFGRTR_EL2.nBRBCTL  (bit 60) must be initialised to 0b1.
+    - HDFGRTR_EL2.nBRBIDR  (bit 59) must be initialised to 0b1.
+
+    - HDFGWTR_EL2.nBRBDATA (bit 61) must be initialised to 0b1.
+    - HDFGWTR_EL2.nBRBCTL  (bit 60) must be initialised to 0b1.
+
+    - HFGITR_EL2.nBRBIALL (bit 56) must be initialised to 0b1.
+    - HFGITR_EL2.nBRBINJ  (bit 55) must be initialised to 0b1.
+
   For CPUs with the Scalable Matrix Extension FA64 feature (FEAT_SME_FA64):
 
   - If EL3 is present:
diff --git a/arch/arm64/include/asm/el2_setup.h b/arch/arm64/include/asm/el2_setup.h
index 25e162651750..bf21ce513aff 100644
--- a/arch/arm64/include/asm/el2_setup.h
+++ b/arch/arm64/include/asm/el2_setup.h
@@ -163,6 +163,39 @@
 .Lskip_set_cptr_\@:
 .endm
 
+/*
+ * Configure BRBE to permit recording cycle counts and branch mispredicts.
+ *
+ * At any EL, to record cycle counts BRBE requires that both BRBCR_EL2.CC=1 and
+ * BRBCR_EL1.CC=1.
+ *
+ * At any EL, to record branch mispredicts BRBE requires that both
+ * BRBCR_EL2.MPRED=1 and BRBCR_EL1.MPRED=1.
+ *
+ * When HCR_EL2.E2H=1, the BRBCR_EL1 encoding is redirected to BRBCR_EL2, but
+ * the {CC,MPRED} bits in the real BRBCR_EL1 register still apply.
+ *
+ * Set {CC,MPRED} in both BRBCR_EL2 and BRBCR_EL1 so that at runtime we only
+ * need to enable/disable these in BRBCR_EL1 regardless of whether the kernel
+ * ends up executing in EL1 or EL2.
+ */
+.macro __init_el2_brbe
+	mrs	x1, id_aa64dfr0_el1
+	ubfx	x1, x1, #ID_AA64DFR0_EL1_BRBE_SHIFT, #4
+	cbz	x1, .Lskip_brbe_\@
+
+	mov_q	x0, BRBCR_ELx_CC | BRBCR_ELx_MPRED
+	msr_s	SYS_BRBCR_EL2, x0
+
+	__check_hvhe .Lset_brbe_nvhe_\@, x1
+	msr_s	SYS_BRBCR_EL12, x0	// VHE
+	b	.Lskip_brbe_\@
+
+.Lset_brbe_nvhe_\@:
+	msr_s	SYS_BRBCR_EL1, x0	// NVHE
+.Lskip_brbe_\@:
+.endm
+
 /* Disable any fine grained traps */
 .macro __init_el2_fgt
 	mrs	x1, id_aa64mmfr0_el1
@@ -170,16 +203,48 @@
 	cbz	x1, .Lskip_fgt_\@
 
 	mov	x0, xzr
+	mov	x2, xzr
 	mrs	x1, id_aa64dfr0_el1
 	ubfx	x1, x1, #ID_AA64DFR0_EL1_PMSVer_SHIFT, #4
 	cmp	x1, #3
 	b.lt	.Lskip_spe_fgt_\@
+
 	/* Disable PMSNEVFR_EL1 read and write traps */
-	orr	x0, x0, #(1 << 62)
+	orr	x0, x0, #HDFGRTR_EL2_nPMSNEVFR_EL1_MASK
+	orr	x2, x2, #HDFGWTR_EL2_nPMSNEVFR_EL1_MASK
 
 .Lskip_spe_fgt_\@:
+#ifdef CONFIG_ARM64_BRBE
+	mrs	x1, id_aa64dfr0_el1
+	ubfx	x1, x1, #ID_AA64DFR0_EL1_BRBE_SHIFT, #4
+	cbz	x1, .Lskip_brbe_reg_fgt_\@
+
+	/*
+	 * Disable read traps for the following registers
+	 *
+	 * [BRBSRC|BRBTGT|RBINF]_EL1
+	 * [BRBSRCINJ|BRBTGTINJ|BRBINFINJ|BRBTS]_EL1
+	 */
+	orr	x0, x0, #HDFGRTR_EL2_nBRBDATA_MASK
+
+	/*
+	 * Disable write traps for the following registers
+	 *
+	 * [BRBSRCINJ|BRBTGTINJ|BRBINFINJ|BRBTS]_EL1
+	 */
+	orr	x2, x2, #HDFGWTR_EL2_nBRBDATA_MASK
+
+	/* Disable read and write traps for [BRBCR|BRBFCR]_EL1 */
+	orr	x0, x0, #HDFGRTR_EL2_nBRBCTL_MASK
+	orr	x2, x2, #HDFGWTR_EL2_nBRBCTL_MASK
+
+	/* Disable read traps for BRBIDR_EL1 */
+	orr	x0, x0, #HDFGRTR_EL2_nBRBIDR_MASK
+
+.Lskip_brbe_reg_fgt_\@:
+#endif /* CONFIG_ARM64_BRBE */
 	msr_s	SYS_HDFGRTR_EL2, x0
-	msr_s	SYS_HDFGWTR_EL2, x0
+	msr_s	SYS_HDFGWTR_EL2, x2
 
 	mov	x0, xzr
 	mrs	x1, id_aa64pfr1_el1
@@ -220,7 +285,21 @@
 .Lset_fgt_\@:
 	msr_s	SYS_HFGRTR_EL2, x0
 	msr_s	SYS_HFGWTR_EL2, x0
-	msr_s	SYS_HFGITR_EL2, xzr
+	mov	x0, xzr
+#ifdef CONFIG_ARM64_BRBE
+	mrs	x1, id_aa64dfr0_el1
+	ubfx	x1, x1, #ID_AA64DFR0_EL1_BRBE_SHIFT, #4
+	cbz	x1, .Lskip_brbe_insn_fgt_\@
+
+	/* Disable traps for BRBIALL instruction */
+	orr	x0, x0, #HFGITR_EL2_nBRBIALL_MASK
+
+	/* Disable traps for BRBINJ instruction */
+	orr	x0, x0, #HFGITR_EL2_nBRBINJ_MASK
+
+.Lskip_brbe_insn_fgt_\@:
+#endif /* CONFIG_ARM64_BRBE */
+	msr_s	SYS_HFGITR_EL2, x0
 
 	mrs	x1, id_aa64pfr0_el1		// AMU traps UNDEF without AMU
 	ubfx	x1, x1, #ID_AA64PFR0_EL1_AMU_SHIFT, #4
@@ -275,6 +354,7 @@
 	__init_el2_hcrx
 	__init_el2_timers
 	__init_el2_debug
+	__init_el2_brbe
 	__init_el2_lor
 	__init_el2_stage2
 	__init_el2_gicv3

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v20 10/11] KVM: arm64: nvhe: Disable branch generation in nVHE guests
  2025-02-18 20:39 [PATCH v20 00/11] arm64/perf: Enable branch stack sampling Rob Herring (Arm)
                   ` (8 preceding siblings ...)
  2025-02-18 20:40 ` [PATCH v20 09/11] arm64: Handle BRBE booting requirements Rob Herring (Arm)
@ 2025-02-18 20:40 ` Rob Herring (Arm)
  2025-02-24 10:41   ` Leo Yan
  2025-02-18 20:40 ` [PATCH v20 11/11] perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE) Rob Herring (Arm)
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 33+ messages in thread
From: Rob Herring (Arm) @ 2025-02-18 20:40 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, Catalin Marinas, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, James Clark, Anshuman Khandual, Leo Yan
  Cc: linux-arm-kernel, linux-perf-users, linux-kernel, linux-doc,
	kvmarm

From: Anshuman Khandual <anshuman.khandual@arm.com>

While BRBE can record branches within guests, the host recording
branches in guests is not supported by perf (though events are).
Support for BRBE in guests will supported by providing direct access
to BRBE within the guests. That is how x86 LBR works for guests.
Therefore, BRBE needs to be disabled on guest entry and restored on
exit.

For nVHE, this requires explicit handling for guests. Before
entering a guest, save the BRBE state and disable the it. When
returning to the host, restore the state.

For VHE, it is not necessary. We initialize
BRBCR_EL1.{E1BRE,E0BRE}=={0,0} at boot time, and HCR_EL2.TGE==1 while
running in the host. We configure BRBCR_EL2.{E2BRE,E0HBRE} to enable
branch recording in the host. When entering the guest, we set
HCR_EL2.TGE==0 which means BRBCR_EL1 is used instead of BRBCR_EL2.
Consequently for VHE, BRBE recording is disabled at EL1 and EL0 when
running a guest.

Should recording in guests (by the host) ever be desired, the perf ABI
will need to be extended to distinguish guest addresses (struct
perf_branch_entry.priv) for starters. BRBE records would also need to be
invalidated on guest entry/exit as guest/host EL1 and EL0 records can't
be distinguished.

Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Co-developed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
---
v20:
 - Reword commit message about no guest recording.
 - Add BRBE to __kvm_vcpu_run() synchronization comment

v19:
 - Rework due to v6.14 debug flag changes
 - Redo commit message
---
 arch/arm64/include/asm/kvm_host.h  |  2 ++
 arch/arm64/kvm/debug.c             |  4 ++++
 arch/arm64/kvm/hyp/nvhe/debug-sr.c | 32 ++++++++++++++++++++++++++++++++
 arch/arm64/kvm/hyp/nvhe/switch.c   |  2 +-
 4 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 7cfa024de4e3..4fc246a1ee6b 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -619,6 +619,7 @@ struct kvm_host_data {
 #define KVM_HOST_DATA_FLAG_HOST_SME_ENABLED		3
 #define KVM_HOST_DATA_FLAG_TRBE_ENABLED			4
 #define KVM_HOST_DATA_FLAG_EL1_TRACING_CONFIGURED	5
+#define KVM_HOST_DATA_FLAG_HAS_BRBE			6
 	unsigned long flags;
 
 	struct kvm_cpu_context host_ctxt;
@@ -662,6 +663,7 @@ struct kvm_host_data {
 		u64 trfcr_el1;
 		/* Values of trap registers for the host before guest entry. */
 		u64 mdcr_el2;
+		u64 brbcr_el1;
 	} host_debug_state;
 
 	/* Guest trace filter value */
diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
index 0e4c805e7e89..bc6015108a68 100644
--- a/arch/arm64/kvm/debug.c
+++ b/arch/arm64/kvm/debug.c
@@ -81,6 +81,10 @@ void kvm_init_host_debug_data(void)
 	    !(read_sysreg_s(SYS_PMBIDR_EL1) & PMBIDR_EL1_P))
 		host_data_set_flag(HAS_SPE);
 
+	/* Check if we have BRBE implemented and available at the host */
+	if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_EL1_BRBE_SHIFT))
+		host_data_set_flag(HAS_BRBE);
+
 	if (cpuid_feature_extract_unsigned_field(dfr0, ID_AA64DFR0_EL1_TraceFilt_SHIFT)) {
 		/* Force disable trace in protected mode in case of no TRBE */
 		if (is_protected_kvm_enabled())
diff --git a/arch/arm64/kvm/hyp/nvhe/debug-sr.c b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
index 2f4a4f5036bb..2a1c0f49792b 100644
--- a/arch/arm64/kvm/hyp/nvhe/debug-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/debug-sr.c
@@ -92,12 +92,42 @@ static void __trace_switch_to_host(void)
 			  *host_data_ptr(host_debug_state.trfcr_el1));
 }
 
+static void __debug_save_brbe(u64 *brbcr_el1)
+{
+	*brbcr_el1 = 0;
+
+	/* Check if the BRBE is enabled */
+	if (!(read_sysreg_el1(SYS_BRBCR) & (BRBCR_ELx_E0BRE | BRBCR_ELx_ExBRE)))
+		return;
+
+	/*
+	 * Prohibit branch record generation while we are in guest.
+	 * Since access to BRBCR_EL1 is trapped, the guest can't
+	 * modify the filtering set by the host.
+	 */
+	*brbcr_el1 = read_sysreg_el1(SYS_BRBCR);
+	write_sysreg_el1(0, SYS_BRBCR);
+}
+
+static void __debug_restore_brbe(u64 brbcr_el1)
+{
+	if (!brbcr_el1)
+		return;
+
+	/* Restore BRBE controls */
+	write_sysreg_el1(brbcr_el1, SYS_BRBCR);
+}
+
 void __debug_save_host_buffers_nvhe(struct kvm_vcpu *vcpu)
 {
 	/* Disable and flush SPE data generation */
 	if (host_data_test_flag(HAS_SPE))
 		__debug_save_spe(host_data_ptr(host_debug_state.pmscr_el1));
 
+	/* Disable BRBE branch records */
+	if (host_data_test_flag(HAS_BRBE))
+		__debug_save_brbe(host_data_ptr(host_debug_state.brbcr_el1));
+
 	if (__trace_needs_switch())
 		__trace_switch_to_guest();
 }
@@ -111,6 +141,8 @@ void __debug_restore_host_buffers_nvhe(struct kvm_vcpu *vcpu)
 {
 	if (host_data_test_flag(HAS_SPE))
 		__debug_restore_spe(*host_data_ptr(host_debug_state.pmscr_el1));
+	if (host_data_test_flag(HAS_BRBE))
+		__debug_restore_brbe(*host_data_ptr(host_debug_state.brbcr_el1));
 	if (__trace_needs_switch())
 		__trace_switch_to_host();
 }
diff --git a/arch/arm64/kvm/hyp/nvhe/switch.c b/arch/arm64/kvm/hyp/nvhe/switch.c
index 6c846d033d24..5186f9504842 100644
--- a/arch/arm64/kvm/hyp/nvhe/switch.c
+++ b/arch/arm64/kvm/hyp/nvhe/switch.c
@@ -318,7 +318,7 @@ int __kvm_vcpu_run(struct kvm_vcpu *vcpu)
 	 * We're about to restore some new MMU state. Make sure
 	 * ongoing page-table walks that have started before we
 	 * trapped to EL2 have completed. This also synchronises the
-	 * above disabling of SPE and TRBE.
+	 * above disabling of BRBE, SPE and TRBE.
 	 *
 	 * See DDI0487I.a D8.1.5 "Out-of-context translation regimes",
 	 * rule R_LFHQG and subsequent information statements.

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH v20 11/11] perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)
  2025-02-18 20:39 [PATCH v20 00/11] arm64/perf: Enable branch stack sampling Rob Herring (Arm)
                   ` (9 preceding siblings ...)
  2025-02-18 20:40 ` [PATCH v20 10/11] KVM: arm64: nvhe: Disable branch generation in nVHE guests Rob Herring (Arm)
@ 2025-02-18 20:40 ` Rob Herring (Arm)
  2025-02-24 12:25   ` Leo Yan
  2025-02-19 16:09 ` [PATCH v20 00/11] arm64/perf: Enable branch stack sampling James Clark
  2025-03-01  7:05 ` Will Deacon
  12 siblings, 1 reply; 33+ messages in thread
From: Rob Herring (Arm) @ 2025-02-18 20:40 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, Catalin Marinas, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, James Clark, Anshuman Khandual, Leo Yan
  Cc: linux-arm-kernel, linux-perf-users, linux-kernel, linux-doc,
	kvmarm

From: Anshuman Khandual <anshuman.khandual@arm.com>

The ARMv9.2 architecture introduces the optional Branch Record Buffer
Extension (BRBE), which records information about branches as they are
executed into set of branch record registers. BRBE is similar to x86's
Last Branch Record (LBR) and PowerPC's Branch History Rolling Buffer
(BHRB).

BRBE supports filtering by exception level and can filter just the
source or target address if excluded to avoid leaking privileged
addresses. The h/w filter would be sufficient except when there are
multiple events with disjoint filtering requirements. In this case, BRBE
is configured with a union of all the events' desired branches, and then
the recorded branches are filtered based on each event's filter. For
example, with one event capturing kernel events and another event
capturing user events, BRBE will be configured to capture both kernel
and user branches. When handling event overflow, the branch records have
to be filtered by software to only include kernel or user branch
addresses for that event. In contrast, x86 simply configures LBR using
the last installed event which seems broken.

It is possible on x86 to configure branch filter such that no branches
are ever recorded (e.g. -j save_type). For BRBE, events with a
configuration that will result in no samples are rejected.

Recording branches in KVM guests is not supported like x86. However,
perf on x86 allows requesting branch recording in guests. The guest
events are recorded, but the resulting branches are all from the host.
For BRBE, events with branch recording and "exclude_host" set are
rejected. Requiring "exclude_guest" to be set did not work. The default
for the perf tool does set "exclude_guest" if no exception level
options are specified. However, specifying kernel or user events
defaults to including both host and guest. In this case, only host
branches are recorded.

BRBE can support some additional exception branch types compared to
x86. On x86, all exceptions other than syscalls are recorded as IRQ.
With BRBE, it is possible to better categorize these exceptions. One
limitation relative to x86 is we cannot distinguish a syscall return
from other exception returns. So all exception returns are recorded as
ERET type. The FIQ branch type is omitted as the only FIQ user is Apple
platforms which don't support BRBE. The debug branch types are omitted
as there is no clear need for them.

BRBE records are invalidated whenever events are reconfigured, a new
task is scheduled in, or after recording is paused (and the records
have been recorded for the event). The architecture allows branch
records to be invalidated by the PE under implementation defined
conditions. It is expected that these conditions are rare.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Co-developed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Co-developed-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
---
v20:
 - Fix sparse percpu warning in branch_records_alloc()
 - Use pr_debug instead of pr_debug_once. Add debug print on all error
   cases when event.attr is rejected.
 - Add back some arm64 specific exception types
 - Convert 'call' from user to kernel to syscall as appropriate
 - Drop requiring event and branch privileges to match
 - Only enable exceptions and exception returns if recording kernel
   branches
 - Add "branches" caps sysfs attribute like x86
 - Drop some unused defines
 - Use u64 instead of unsigned long for branch record fields
 - use min() in brbinf_get_cycles()
 - Reword comment about FZP and MDCR_EL2.HPMN interaction
 - Add comments on assumptions about calling brbe_enable()
 - Merge capture_brbe_flags() into perf_entry_from_brbe_regset()
 - Rework BRBE invalidation to avoid invalidating in interrupt handler
   when no handled events capture the branch stack.
 - Also clear BRBCR_ELx in brbe_disable(). This is for KVM nVHE checks
   if BRBE is enabled.

v19:
- Drop saving of branch records when task scheduled out. (Mark)
- Got rid of added armpmu ops. All BRBE support contained within pmuv3
  code.
- Dropped armpmu.num_branch_records as reg_brbidr has same info.
- Make sched_task() callback actually get called. Enabling requires a
  call to perf_sched_cb_inc().
- Fix freeze on overflow for VHE
- The cycle counter doesn't freeze BRBE on overflow, so avoid assigning
  it when BRBE is enabled.
- Drop all the Arm specific exception branches. Not a clear need for
  them.
- Simplify enable/disable to avoid RMW and document ISBs needed
- Fix handling of branch 'cycles' reading. CC field is
  mantissa/exponent, not an integer.
- Save BRBFCR and BRBCR settings in event->hw.branch_reg.config and
  event->hw.extra_reg.config to avoid recalculating the register value
  each time the event is installed.
- Rework s/w filtering to better match h/w filtering
- Reject events with disjoint event filter and branch filter
- Reject events if exclude_host is set

v18: https://lore.kernel.org/all/20240613061731.3109448-6-anshuman.khandual@arm.com/
---
 drivers/perf/Kconfig         |  11 +
 drivers/perf/Makefile        |   1 +
 drivers/perf/arm_brbe.c      | 802 +++++++++++++++++++++++++++++++++++++++++++
 drivers/perf/arm_brbe.h      |  47 +++
 drivers/perf/arm_pmu.c       |  15 +-
 drivers/perf/arm_pmuv3.c     | 128 ++++++-
 include/linux/perf/arm_pmu.h |   8 +
 7 files changed, 1005 insertions(+), 7 deletions(-)

diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
index 4e268de351c4..3be60ff4236d 100644
--- a/drivers/perf/Kconfig
+++ b/drivers/perf/Kconfig
@@ -223,6 +223,17 @@ config ARM_SPE_PMU
 	  Extension, which provides periodic sampling of operations in
 	  the CPU pipeline and reports this via the perf AUX interface.
 
+config ARM64_BRBE
+	bool "Enable support for branch stack sampling using FEAT_BRBE"
+	depends on ARM_PMUV3 && ARM64
+	default y
+	help
+	  Enable perf support for Branch Record Buffer Extension (BRBE) which
+	  records all branches taken in an execution path. This supports some
+	  branch types and privilege based filtering. It captures additional
+	  relevant information such as cycle count, misprediction and branch
+	  type, branch privilege level etc.
+
 config ARM_DMC620_PMU
 	tristate "Enable PMU support for the ARM DMC-620 memory controller"
 	depends on (ARM64 && ACPI) || COMPILE_TEST
diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
index de71d2574857..192fc8b16204 100644
--- a/drivers/perf/Makefile
+++ b/drivers/perf/Makefile
@@ -23,6 +23,7 @@ obj-$(CONFIG_STARFIVE_STARLINK_PMU) += starfive_starlink_pmu.o
 obj-$(CONFIG_THUNDERX2_PMU) += thunderx2_pmu.o
 obj-$(CONFIG_XGENE_PMU) += xgene_pmu.o
 obj-$(CONFIG_ARM_SPE_PMU) += arm_spe_pmu.o
+obj-$(CONFIG_ARM64_BRBE) += arm_brbe.o
 obj-$(CONFIG_ARM_DMC620_PMU) += arm_dmc620_pmu.o
 obj-$(CONFIG_MARVELL_CN10K_TAD_PMU) += marvell_cn10k_tad_pmu.o
 obj-$(CONFIG_MARVELL_CN10K_DDR_PMU) += marvell_cn10k_ddr_pmu.o
diff --git a/drivers/perf/arm_brbe.c b/drivers/perf/arm_brbe.c
new file mode 100644
index 000000000000..2f254bd40af3
--- /dev/null
+++ b/drivers/perf/arm_brbe.c
@@ -0,0 +1,802 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Branch Record Buffer Extension Driver.
+ *
+ * Copyright (C) 2022-2025 ARM Limited
+ *
+ * Author: Anshuman Khandual <anshuman.khandual@arm.com>
+ */
+#include <linux/types.h>
+#include <linux/bitmap.h>
+#include <linux/perf/arm_pmu.h>
+#include "arm_brbe.h"
+
+#define BRBFCR_EL1_BRANCH_FILTERS (BRBFCR_EL1_DIRECT   | \
+				   BRBFCR_EL1_INDIRECT | \
+				   BRBFCR_EL1_RTN      | \
+				   BRBFCR_EL1_INDCALL  | \
+				   BRBFCR_EL1_DIRCALL  | \
+				   BRBFCR_EL1_CONDDIR)
+
+/*
+ * BRBTS_EL1 is currently not used for branch stack implementation
+ * purpose but BRBCR_ELx.TS needs to have a valid value from all
+ * available options. BRBCR_ELx_TS_VIRTUAL is selected for this.
+ */
+#define BRBCR_ELx_DEFAULT_TS      FIELD_PREP(BRBCR_ELx_TS_MASK, BRBCR_ELx_TS_VIRTUAL)
+
+/*
+ * BRBE Buffer Organization
+ *
+ * BRBE buffer is arranged as multiple banks of 32 branch record
+ * entries each. An individual branch record in a given bank could
+ * be accessed, after selecting the bank in BRBFCR_EL1.BANK and
+ * accessing the registers i.e [BRBSRC, BRBTGT, BRBINF] set with
+ * indices [0..31].
+ *
+ * Bank 0
+ *
+ *	---------------------------------	------
+ *	| 00 | BRBSRC | BRBTGT | BRBINF |	| 00 |
+ *	---------------------------------	------
+ *	| 01 | BRBSRC | BRBTGT | BRBINF |	| 01 |
+ *	---------------------------------	------
+ *	| .. | BRBSRC | BRBTGT | BRBINF |	| .. |
+ *	---------------------------------	------
+ *	| 31 | BRBSRC | BRBTGT | BRBINF |	| 31 |
+ *	---------------------------------	------
+ *
+ * Bank 1
+ *
+ *	---------------------------------	------
+ *	| 32 | BRBSRC | BRBTGT | BRBINF |	| 00 |
+ *	---------------------------------	------
+ *	| 33 | BRBSRC | BRBTGT | BRBINF |	| 01 |
+ *	---------------------------------	------
+ *	| .. | BRBSRC | BRBTGT | BRBINF |	| .. |
+ *	---------------------------------	------
+ *	| 63 | BRBSRC | BRBTGT | BRBINF |	| 31 |
+ *	---------------------------------	------
+ */
+#define BRBE_BANK_MAX_ENTRIES	32
+
+struct brbe_regset {
+	u64 brbsrc;
+	u64 brbtgt;
+	u64 brbinf;
+};
+
+#define PERF_BR_ARM64_MAX (PERF_BR_MAX + PERF_BR_NEW_MAX)
+
+struct brbe_hw_attr {
+	int	brbe_version;
+	int	brbe_cc;
+	int	brbe_nr;
+	int	brbe_format;
+};
+
+#define BRBE_REGN_CASE(n, case_macro) \
+	case n: case_macro(n); break
+
+#define BRBE_REGN_SWITCH(x, case_macro)				\
+	do {							\
+		switch (x) {					\
+		BRBE_REGN_CASE(0, case_macro);			\
+		BRBE_REGN_CASE(1, case_macro);			\
+		BRBE_REGN_CASE(2, case_macro);			\
+		BRBE_REGN_CASE(3, case_macro);			\
+		BRBE_REGN_CASE(4, case_macro);			\
+		BRBE_REGN_CASE(5, case_macro);			\
+		BRBE_REGN_CASE(6, case_macro);			\
+		BRBE_REGN_CASE(7, case_macro);			\
+		BRBE_REGN_CASE(8, case_macro);			\
+		BRBE_REGN_CASE(9, case_macro);			\
+		BRBE_REGN_CASE(10, case_macro);			\
+		BRBE_REGN_CASE(11, case_macro);			\
+		BRBE_REGN_CASE(12, case_macro);			\
+		BRBE_REGN_CASE(13, case_macro);			\
+		BRBE_REGN_CASE(14, case_macro);			\
+		BRBE_REGN_CASE(15, case_macro);			\
+		BRBE_REGN_CASE(16, case_macro);			\
+		BRBE_REGN_CASE(17, case_macro);			\
+		BRBE_REGN_CASE(18, case_macro);			\
+		BRBE_REGN_CASE(19, case_macro);			\
+		BRBE_REGN_CASE(20, case_macro);			\
+		BRBE_REGN_CASE(21, case_macro);			\
+		BRBE_REGN_CASE(22, case_macro);			\
+		BRBE_REGN_CASE(23, case_macro);			\
+		BRBE_REGN_CASE(24, case_macro);			\
+		BRBE_REGN_CASE(25, case_macro);			\
+		BRBE_REGN_CASE(26, case_macro);			\
+		BRBE_REGN_CASE(27, case_macro);			\
+		BRBE_REGN_CASE(28, case_macro);			\
+		BRBE_REGN_CASE(29, case_macro);			\
+		BRBE_REGN_CASE(30, case_macro);			\
+		BRBE_REGN_CASE(31, case_macro);			\
+		default: WARN(1, "Invalid BRB* index %d\n", x);	\
+		}						\
+	} while (0)
+
+#define RETURN_READ_BRBSRCN(n) \
+	return read_sysreg_s(SYS_BRBSRC_EL1(n))
+static inline u64 get_brbsrc_reg(int idx)
+{
+	BRBE_REGN_SWITCH(idx, RETURN_READ_BRBSRCN);
+	return 0;
+}
+
+#define RETURN_READ_BRBTGTN(n) \
+	return read_sysreg_s(SYS_BRBTGT_EL1(n))
+static u64 get_brbtgt_reg(int idx)
+{
+	BRBE_REGN_SWITCH(idx, RETURN_READ_BRBTGTN);
+	return 0;
+}
+
+#define RETURN_READ_BRBINFN(n) \
+	return read_sysreg_s(SYS_BRBINF_EL1(n))
+static u64 get_brbinf_reg(int idx)
+{
+	BRBE_REGN_SWITCH(idx, RETURN_READ_BRBINFN);
+	return 0;
+}
+
+static u64 brbe_record_valid(u64 brbinf)
+{
+	return FIELD_GET(BRBINFx_EL1_VALID_MASK, brbinf);
+}
+
+static bool brbe_invalid(u64 brbinf)
+{
+	return brbe_record_valid(brbinf) == BRBINFx_EL1_VALID_NONE;
+}
+
+static bool brbe_record_is_complete(u64 brbinf)
+{
+	return brbe_record_valid(brbinf) == BRBINFx_EL1_VALID_FULL;
+}
+
+static bool brbe_record_is_source_only(u64 brbinf)
+{
+	return brbe_record_valid(brbinf) == BRBINFx_EL1_VALID_SOURCE;
+}
+
+static bool brbe_record_is_target_only(u64 brbinf)
+{
+	return brbe_record_valid(brbinf) == BRBINFx_EL1_VALID_TARGET;
+}
+
+static int brbinf_get_in_tx(u64 brbinf)
+{
+	return FIELD_GET(BRBINFx_EL1_T_MASK, brbinf);
+}
+
+static int brbinf_get_mispredict(u64 brbinf)
+{
+	return FIELD_GET(BRBINFx_EL1_MPRED_MASK, brbinf);
+}
+
+static int brbinf_get_lastfailed(u64 brbinf)
+{
+	return FIELD_GET(BRBINFx_EL1_LASTFAILED_MASK, brbinf);
+}
+
+static u16 brbinf_get_cycles(u64 brbinf)
+{
+	u32 exp, mant, cycles;
+	/*
+	 * Captured cycle count is unknown and hence
+	 * should not be passed on to userspace.
+	 */
+	if (brbinf & BRBINFx_EL1_CCU)
+		return 0;
+
+	exp = FIELD_GET(BRBINFx_EL1_CC_EXP_MASK, brbinf);
+	mant = FIELD_GET(BRBINFx_EL1_CC_MANT_MASK, brbinf);
+
+	if (!exp)
+		return mant;
+
+	cycles = (mant | 0x100) << (exp - 1);
+
+	return min(cycles, U16_MAX);
+}
+
+static int brbinf_get_type(u64 brbinf)
+{
+	return FIELD_GET(BRBINFx_EL1_TYPE_MASK, brbinf);
+}
+
+static int brbinf_get_el(u64 brbinf)
+{
+	return FIELD_GET(BRBINFx_EL1_EL_MASK, brbinf);
+}
+
+void brbe_invalidate(void)
+{
+	// Ensure all branches before this point are recorded
+	isb();
+	asm volatile(BRB_IALL_INSN);
+	// Ensure all branch records are invalidated after this point
+	isb();
+}
+
+static bool valid_brbe_nr(int brbe_nr)
+{
+	return brbe_nr == BRBIDR0_EL1_NUMREC_8 ||
+	       brbe_nr == BRBIDR0_EL1_NUMREC_16 ||
+	       brbe_nr == BRBIDR0_EL1_NUMREC_32 ||
+	       brbe_nr == BRBIDR0_EL1_NUMREC_64;
+}
+
+static bool valid_brbe_cc(int brbe_cc)
+{
+	return brbe_cc == BRBIDR0_EL1_CC_20_BIT;
+}
+
+static bool valid_brbe_format(int brbe_format)
+{
+	return brbe_format == BRBIDR0_EL1_FORMAT_FORMAT_0;
+}
+
+static bool valid_brbidr(u64 brbidr)
+{
+	int brbe_format, brbe_cc, brbe_nr;
+
+	brbe_format = FIELD_GET(BRBIDR0_EL1_FORMAT_MASK, brbidr);
+	brbe_cc = FIELD_GET(BRBIDR0_EL1_CC_MASK, brbidr);
+	brbe_nr = FIELD_GET(BRBIDR0_EL1_NUMREC_MASK, brbidr);
+
+	return valid_brbe_format(brbe_format) && valid_brbe_cc(brbe_cc) && valid_brbe_nr(brbe_nr);
+}
+
+static bool valid_brbe_version(int brbe_version)
+{
+	return brbe_version == ID_AA64DFR0_EL1_BRBE_IMP ||
+	       brbe_version == ID_AA64DFR0_EL1_BRBE_BRBE_V1P1;
+}
+
+static void select_brbe_bank(int bank)
+{
+	u64 brbfcr;
+
+	brbfcr = read_sysreg_s(SYS_BRBFCR_EL1);
+	brbfcr &= ~BRBFCR_EL1_BANK_MASK;
+	brbfcr |= SYS_FIELD_PREP(BRBFCR_EL1, BANK, bank);
+	write_sysreg_s(brbfcr, SYS_BRBFCR_EL1);
+	/*
+	 * Arm ARM (DDI 0487K.a) D.18.4 rule PPBZP requires explicit sync
+	 * between setting BANK and accessing branch records.
+	 */
+	isb();
+}
+
+static bool __read_brbe_regset(struct brbe_regset *entry, int idx)
+{
+	entry->brbinf = get_brbinf_reg(idx);
+
+	if (brbe_invalid(entry->brbinf))
+		return false;
+
+	entry->brbsrc = get_brbsrc_reg(idx);
+	entry->brbtgt = get_brbtgt_reg(idx);
+	return true;
+}
+
+/*
+ * Generic perf branch filters supported on BRBE
+ *
+ * New branch filters need to be evaluated whether they could be supported on
+ * BRBE. This ensures that such branch filters would not just be accepted, to
+ * fail silently. PERF_SAMPLE_BRANCH_HV is a special case that is selectively
+ * supported only on platforms where kernel is in hyp mode.
+ */
+#define BRBE_EXCLUDE_BRANCH_FILTERS (PERF_SAMPLE_BRANCH_ABORT_TX	| \
+				     PERF_SAMPLE_BRANCH_IN_TX		| \
+				     PERF_SAMPLE_BRANCH_NO_TX		| \
+				     PERF_SAMPLE_BRANCH_CALL_STACK	| \
+				     PERF_SAMPLE_BRANCH_COUNTERS)
+
+#define BRBE_ALLOWED_BRANCH_TYPES   (PERF_SAMPLE_BRANCH_ANY		| \
+				     PERF_SAMPLE_BRANCH_ANY_CALL	| \
+				     PERF_SAMPLE_BRANCH_ANY_RETURN	| \
+				     PERF_SAMPLE_BRANCH_IND_CALL	| \
+				     PERF_SAMPLE_BRANCH_COND		| \
+				     PERF_SAMPLE_BRANCH_IND_JUMP	| \
+				     PERF_SAMPLE_BRANCH_CALL)
+
+
+#define BRBE_ALLOWED_BRANCH_FILTERS (PERF_SAMPLE_BRANCH_USER		| \
+				     PERF_SAMPLE_BRANCH_KERNEL		| \
+				     PERF_SAMPLE_BRANCH_HV		| \
+				     BRBE_ALLOWED_BRANCH_TYPES		| \
+				     PERF_SAMPLE_BRANCH_NO_FLAGS	| \
+				     PERF_SAMPLE_BRANCH_NO_CYCLES	| \
+				     PERF_SAMPLE_BRANCH_TYPE_SAVE	| \
+				     PERF_SAMPLE_BRANCH_HW_INDEX	| \
+				     PERF_SAMPLE_BRANCH_PRIV_SAVE)
+
+#define BRBE_PERF_BRANCH_FILTERS    (BRBE_ALLOWED_BRANCH_FILTERS	| \
+				     BRBE_EXCLUDE_BRANCH_FILTERS)
+
+/*
+ * BRBE supports the following functional branch type filters while
+ * generating branch records. These branch filters can be enabled,
+ * either individually or as a group i.e ORing multiple filters
+ * with each other.
+ *
+ * BRBFCR_EL1_CONDDIR  - Conditional direct branch
+ * BRBFCR_EL1_DIRCALL  - Direct call
+ * BRBFCR_EL1_INDCALL  - Indirect call
+ * BRBFCR_EL1_INDIRECT - Indirect branch
+ * BRBFCR_EL1_DIRECT   - Direct branch
+ * BRBFCR_EL1_RTN      - Subroutine return
+ */
+static u64 branch_type_to_brbfcr(int branch_type)
+{
+	u64 brbfcr = 0;
+
+	if (branch_type & PERF_SAMPLE_BRANCH_ANY) {
+		brbfcr |= BRBFCR_EL1_BRANCH_FILTERS;
+		return brbfcr;
+	}
+
+	if (branch_type & PERF_SAMPLE_BRANCH_ANY_CALL) {
+		brbfcr |= BRBFCR_EL1_INDCALL;
+		brbfcr |= BRBFCR_EL1_DIRCALL;
+	}
+
+	if (branch_type & PERF_SAMPLE_BRANCH_ANY_RETURN)
+		brbfcr |= BRBFCR_EL1_RTN;
+
+	if (branch_type & PERF_SAMPLE_BRANCH_IND_CALL)
+		brbfcr |= BRBFCR_EL1_INDCALL;
+
+	if (branch_type & PERF_SAMPLE_BRANCH_COND)
+		brbfcr |= BRBFCR_EL1_CONDDIR;
+
+	if (branch_type & PERF_SAMPLE_BRANCH_IND_JUMP)
+		brbfcr |= BRBFCR_EL1_INDIRECT;
+
+	if (branch_type & PERF_SAMPLE_BRANCH_CALL)
+		brbfcr |= BRBFCR_EL1_DIRCALL;
+
+	return brbfcr;
+}
+
+/*
+ * BRBE supports the following privilege mode filters while generating
+ * branch records.
+ *
+ * BRBCR_ELx_E0BRE - EL0 branch records
+ * BRBCR_ELx_ExBRE - EL1/EL2 branch records
+ *
+ * BRBE also supports the following additional functional branch type
+ * filters while generating branch records.
+ *
+ * BRBCR_ELx_EXCEPTION - Exception
+ * BRBCR_ELx_ERTN     -  Exception return
+ */
+static u64 branch_type_to_brbcr(int branch_type)
+{
+	u64 brbcr = BRBCR_ELx_FZP | BRBCR_ELx_DEFAULT_TS;
+
+	if (branch_type & PERF_SAMPLE_BRANCH_USER)
+		brbcr |= BRBCR_ELx_E0BRE;
+
+	/*
+	 * When running in the hyp mode, writing into BRBCR_EL1
+	 * actually writes into BRBCR_EL2 instead. Field E2BRE
+	 * is also at the same position as E1BRE.
+	 */
+	if (branch_type & PERF_SAMPLE_BRANCH_KERNEL)
+		brbcr |= BRBCR_ELx_ExBRE;
+
+	if (branch_type & PERF_SAMPLE_BRANCH_HV) {
+		if (is_kernel_in_hyp_mode())
+			brbcr |= BRBCR_ELx_ExBRE;
+	}
+
+	if (!(branch_type & PERF_SAMPLE_BRANCH_NO_CYCLES))
+		brbcr |= BRBCR_ELx_CC;
+
+	if (!(branch_type & PERF_SAMPLE_BRANCH_NO_FLAGS))
+		brbcr |= BRBCR_ELx_MPRED;
+
+	/*
+	 * The exception and exception return branches could be
+	 * captured, irrespective of the perf event's privilege.
+	 * If the perf event does not have enough privilege for
+	 * a given exception level, then addresses which falls
+	 * under that exception level will be reported as zero
+	 * for the captured branch record, creating source only
+	 * or target only records.
+	 */
+	if (branch_type & PERF_SAMPLE_BRANCH_KERNEL) {
+		if (branch_type & PERF_SAMPLE_BRANCH_ANY) {
+			brbcr |= BRBCR_ELx_EXCEPTION;
+			brbcr |= BRBCR_ELx_ERTN;
+		}
+
+		if (branch_type & PERF_SAMPLE_BRANCH_ANY_CALL)
+			brbcr |= BRBCR_ELx_EXCEPTION;
+
+		if (branch_type & PERF_SAMPLE_BRANCH_ANY_RETURN)
+			brbcr |= BRBCR_ELx_ERTN;
+	}
+	return brbcr;
+}
+
+bool brbe_branch_attr_valid(struct perf_event *event)
+{
+	u64 branch_type = event->attr.branch_sample_type;
+
+	/*
+	 * Ensure both perf branch filter allowed and exclude
+	 * masks are always in sync with the generic perf ABI.
+	 */
+	BUILD_BUG_ON(BRBE_PERF_BRANCH_FILTERS != (PERF_SAMPLE_BRANCH_MAX - 1));
+
+	if (branch_type & BRBE_EXCLUDE_BRANCH_FILTERS) {
+		pr_debug("requested branch filter not supported 0x%llx\n", branch_type);
+		return false;
+	}
+
+	/* Ensure at least 1 branch type is enabled */
+	if (!(branch_type & BRBE_ALLOWED_BRANCH_TYPES)) {
+		pr_debug("no branch type enabled 0x%llx\n", branch_type);
+		return false;
+	}
+
+	/*
+	 * No branches are recorded in guests nor nVHE hypervisors, so
+	 * excluding the host or both kernel and user is invalid.
+	 *
+	 * Ideally we'd just require exclude_guest and exclude_hv, but setting
+	 * event filters with perf for kernel or user don't set exclude_guest.
+	 * So effectively, exclude_guest and exclude_hv are ignored.
+	 */
+	if (event->attr.exclude_host || (event->attr.exclude_user && event->attr.exclude_kernel)) {
+		pr_debug("branch filter in hypervisor or guest only not supported 0x%llx\n", branch_type);
+		return false;
+	}
+
+	event->hw.branch_reg.config = branch_type_to_brbfcr(event->attr.branch_sample_type);
+	event->hw.extra_reg.config = branch_type_to_brbcr(event->attr.branch_sample_type);
+
+	return true;
+}
+
+unsigned int brbe_num_branch_records(const struct arm_pmu *armpmu)
+{
+	return FIELD_GET(BRBIDR0_EL1_NUMREC_MASK, armpmu->reg_brbidr);
+}
+
+void brbe_probe(struct arm_pmu *armpmu)
+{
+	u64 aa64dfr0 = read_sysreg_s(SYS_ID_AA64DFR0_EL1);
+	u32 brbe;
+
+	brbe = cpuid_feature_extract_unsigned_field(aa64dfr0, ID_AA64DFR0_EL1_BRBE_SHIFT);
+	if (!valid_brbe_version(brbe))
+		return;
+
+	u64 brbidr = read_sysreg_s(SYS_BRBIDR0_EL1);
+	if (!valid_brbidr(brbidr))
+		return;
+
+	armpmu->reg_brbidr = brbidr;
+}
+
+/*
+ * BRBE is assumed to be disabled/paused on entry
+ */
+void brbe_enable(const struct arm_pmu *arm_pmu)
+{
+	struct pmu_hw_events *cpuc = this_cpu_ptr(arm_pmu->hw_events);
+	u64 brbfcr = 0, brbcr = 0;
+
+	/*
+	 * Merge the permitted branch filters of all events.
+	 */
+	for (int i = 0; i < ARMPMU_MAX_HWEVENTS; i++) {
+		struct perf_event *event = cpuc->events[i];
+
+		if (event && has_branch_stack(event)) {
+			brbfcr |= event->hw.branch_reg.config;
+			brbcr |= event->hw.extra_reg.config;
+		}
+	}
+
+	/*
+	 * In VHE mode with MDCR_EL2.HPMN equal to PMCR_EL0.N, BRBCR_EL1.FZP
+	 * controls freezing the branch records on counter overflow rather than
+	 * BRBCR_EL2.FZP (which writes to BRBCR_EL1 are redirected to).
+	 * The exception levels are enabled/disabled in BRBCR_EL2, so keep EL1
+	 * and EL0 recording disabled for guests.
+	 *
+	 * As BRBCR_EL1 CC and MPRED bits also need to match, use the same
+	 * value for both registers just masking the exception levels.
+	 */
+	if (is_kernel_in_hyp_mode())
+		write_sysreg_s(brbcr & ~(BRBCR_ELx_ExBRE | BRBCR_ELx_E0BRE), SYS_BRBCR_EL12);
+	write_sysreg_s(brbcr, SYS_BRBCR_EL1);
+	isb(); // Ensure BRBCR_ELx settings take effect before unpausing
+
+	// Finally write SYS_BRBFCR_EL to unpause BRBE
+	write_sysreg_s(brbfcr, SYS_BRBFCR_EL1);
+	// Synchronization in PMCR write ensures ordering WRT PMU enabling
+}
+
+void brbe_disable(void)
+{
+	/*
+	 * No need for synchronization here as synchronization in PMCR write
+	 * ensures ordering and in the interrupt handler this is a NOP as
+	 * we're already paused.
+	 */
+	write_sysreg_s(BRBFCR_EL1_PAUSED, SYS_BRBFCR_EL1);
+	write_sysreg_s(0, SYS_BRBCR_EL1);
+}
+
+static const int brbe_type_to_perf_type_map[BRBINFx_EL1_TYPE_DEBUG_EXIT + 1][2] = {
+	[BRBINFx_EL1_TYPE_DIRECT_UNCOND] = { PERF_BR_UNCOND, 0 },
+	[BRBINFx_EL1_TYPE_INDIRECT] = { PERF_BR_IND, 0 },
+	[BRBINFx_EL1_TYPE_DIRECT_LINK] = { PERF_BR_CALL, 0 },
+	[BRBINFx_EL1_TYPE_INDIRECT_LINK] = { PERF_BR_IND_CALL, 0 },
+	[BRBINFx_EL1_TYPE_RET] = { PERF_BR_RET, 0 },
+	[BRBINFx_EL1_TYPE_DIRECT_COND] = { PERF_BR_COND, 0 },
+	[BRBINFx_EL1_TYPE_CALL] = { PERF_BR_CALL, 0 },
+	[BRBINFx_EL1_TYPE_ERET] = { PERF_BR_ERET, 0 },
+	[BRBINFx_EL1_TYPE_IRQ] = { PERF_BR_IRQ, 0 },
+	[BRBINFx_EL1_TYPE_TRAP] = { PERF_BR_IRQ, 0 },
+	[BRBINFx_EL1_TYPE_SERROR] = { PERF_BR_SERROR, 0 },
+	[BRBINFx_EL1_TYPE_ALIGN_FAULT] = { PERF_BR_EXTEND_ABI, PERF_BR_NEW_FAULT_ALGN },
+	[BRBINFx_EL1_TYPE_INSN_FAULT] = { PERF_BR_EXTEND_ABI, PERF_BR_NEW_FAULT_INST },
+	[BRBINFx_EL1_TYPE_DATA_FAULT] = { PERF_BR_EXTEND_ABI, PERF_BR_NEW_FAULT_DATA },
+};
+
+static void brbe_set_perf_entry_type(struct perf_branch_entry *entry, u64 brbinf)
+{
+	int brbe_type = brbinf_get_type(brbinf);
+
+	if (brbe_type <= BRBINFx_EL1_TYPE_DEBUG_EXIT) {
+		const int *br_type = brbe_type_to_perf_type_map[brbe_type];
+
+		entry->type = br_type[0];
+		entry->new_type = br_type[1];
+	}
+}
+
+static int brbinf_get_perf_priv(u64 brbinf)
+{
+	int brbe_el = brbinf_get_el(brbinf);
+
+	switch (brbe_el) {
+	case BRBINFx_EL1_EL_EL0:
+		return PERF_BR_PRIV_USER;
+	case BRBINFx_EL1_EL_EL1:
+		return PERF_BR_PRIV_KERNEL;
+	case BRBINFx_EL1_EL_EL2:
+		if (is_kernel_in_hyp_mode())
+			return PERF_BR_PRIV_KERNEL;
+		return PERF_BR_PRIV_HV;
+	default:
+		pr_warn_once("%d - unknown branch privilege captured\n", brbe_el);
+		return PERF_BR_PRIV_UNKNOWN;
+	}
+}
+
+static bool perf_entry_from_brbe_regset(int index, struct perf_branch_entry *entry,
+					const struct perf_event *event)
+{
+	struct brbe_regset bregs;
+	u64 brbinf;
+
+	if (!__read_brbe_regset(&bregs, index))
+		return false;
+
+	brbinf = bregs.brbinf;
+	perf_clear_branch_entry_bitfields(entry);
+	if (brbe_record_is_complete(brbinf)) {
+		entry->from = bregs.brbsrc;
+		entry->to = bregs.brbtgt;
+	} else if (brbe_record_is_source_only(brbinf)) {
+		entry->from = bregs.brbsrc;
+		entry->to = 0;
+	} else if (brbe_record_is_target_only(brbinf)) {
+		entry->from = 0;
+		entry->to = bregs.brbtgt;
+	}
+
+	brbe_set_perf_entry_type(entry, brbinf);
+
+	if (!branch_sample_no_cycles(event))
+		entry->cycles = brbinf_get_cycles(brbinf);
+
+	if (!branch_sample_no_flags(event)) {
+		/* Mispredict info is available for source only and complete branch records. */
+		if (!brbe_record_is_target_only(brbinf)) {
+			entry->mispred = brbinf_get_mispredict(brbinf);
+			entry->predicted = !entry->mispred;
+		}
+
+		/*
+		 * Currently TME feature is neither implemented in any hardware
+		 * nor it is being supported in the kernel. Just warn here once
+		 * if TME related information shows up rather unexpectedly.
+		 */
+		if (brbinf_get_lastfailed(brbinf) || brbinf_get_in_tx(brbinf))
+			pr_warn_once("Unknown transaction states\n");
+	}
+
+	/*
+	 * Branch privilege level is available for target only and complete
+	 * branch records.
+	 */
+	if (!brbe_record_is_source_only(brbinf))
+		entry->priv = brbinf_get_perf_priv(brbinf);
+
+	return true;
+}
+
+#define PERF_BR_ARM64_ALL (				\
+	BIT(PERF_BR_COND) |				\
+	BIT(PERF_BR_UNCOND) |				\
+	BIT(PERF_BR_IND) |				\
+	BIT(PERF_BR_CALL) |				\
+	BIT(PERF_BR_IND_CALL) |				\
+	BIT(PERF_BR_RET))
+
+#define PERF_BR_ARM64_ALL_KERNEL (			\
+	BIT(PERF_BR_SYSCALL) |				\
+	BIT(PERF_BR_IRQ) |				\
+	BIT(PERF_BR_SERROR) |				\
+	BIT(PERF_BR_MAX + PERF_BR_NEW_FAULT_ALGN) |	\
+	BIT(PERF_BR_MAX + PERF_BR_NEW_FAULT_DATA) |	\
+	BIT(PERF_BR_MAX + PERF_BR_NEW_FAULT_INST))
+
+static void prepare_event_branch_type_mask(u64 branch_sample,
+					   unsigned long *event_type_mask)
+{
+	if (branch_sample & PERF_SAMPLE_BRANCH_ANY) {
+		if (branch_sample & PERF_SAMPLE_BRANCH_KERNEL)
+			bitmap_from_u64(event_type_mask,
+				BIT(PERF_BR_ERET) | PERF_BR_ARM64_ALL |
+				PERF_BR_ARM64_ALL_KERNEL);
+		else
+			bitmap_from_u64(event_type_mask, PERF_BR_ARM64_ALL);
+		return;
+	}
+
+	bitmap_zero(event_type_mask, PERF_BR_ARM64_MAX);
+
+	if (branch_sample & PERF_SAMPLE_BRANCH_ANY_CALL) {
+		if (branch_sample & PERF_SAMPLE_BRANCH_KERNEL)
+			bitmap_from_u64(event_type_mask, PERF_BR_ARM64_ALL_KERNEL);
+
+		set_bit(PERF_BR_CALL, event_type_mask);
+		set_bit(PERF_BR_IND_CALL, event_type_mask);
+	}
+
+	if (branch_sample & PERF_SAMPLE_BRANCH_IND_JUMP)
+		set_bit(PERF_BR_IND, event_type_mask);
+
+	if (branch_sample & PERF_SAMPLE_BRANCH_COND)
+		set_bit(PERF_BR_COND, event_type_mask);
+
+	if (branch_sample & PERF_SAMPLE_BRANCH_CALL)
+		set_bit(PERF_BR_CALL, event_type_mask);
+
+	if (branch_sample & PERF_SAMPLE_BRANCH_IND_CALL)
+		set_bit(PERF_BR_IND_CALL, event_type_mask);
+
+	if (branch_sample & PERF_SAMPLE_BRANCH_ANY_RETURN) {
+		set_bit(PERF_BR_RET, event_type_mask);
+
+		if (branch_sample & PERF_SAMPLE_BRANCH_KERNEL)
+			set_bit(PERF_BR_ERET, event_type_mask);
+	}
+}
+
+/*
+ * BRBE is configured with an OR of permissions from all events, so there may
+ * be events which have to be dropped or events where just the source or target
+ * address has to be zeroed.
+ */
+static bool filter_branch_privilege(struct perf_branch_entry *entry, u64 branch_sample_type)
+{
+	/* We can only have a half record if permissions have not been expanded */
+	if (!entry->from || !entry->to)
+		return true;
+
+	bool from_user = access_ok((void __user *)(unsigned long)entry->from, 4);
+	bool to_user = access_ok((void __user *)(unsigned long)entry->to, 4);
+	bool exclude_kernel = !((branch_sample_type & PERF_SAMPLE_BRANCH_KERNEL) ||
+		(is_kernel_in_hyp_mode() && (branch_sample_type & PERF_SAMPLE_BRANCH_HV)));
+
+	/*
+	 * If record is within a single exception level, just need to either
+	 * drop or keep the entire record.
+	 */
+	if (from_user == to_user)
+		return ((entry->priv == PERF_BR_PRIV_KERNEL) && !exclude_kernel) ||
+			((entry->priv == PERF_BR_PRIV_USER) &&
+			 (branch_sample_type & PERF_SAMPLE_BRANCH_USER));
+
+	// Fixup calls which are syscalls
+	if (entry->type == PERF_BR_CALL && from_user && !to_user)
+		entry->type = PERF_BR_SYSCALL;
+
+	/*
+	 * Record is across exception levels, mask addresses for the exception
+	 * level we're not capturing.
+	 */
+	if (!(branch_sample_type & PERF_SAMPLE_BRANCH_USER)) {
+		if (from_user)
+			entry->from = 0;
+		if (to_user)
+			entry->to = 0;
+	}
+
+	if (exclude_kernel) {
+		if (!from_user)
+			entry->from = 0;
+		if (!to_user)
+			entry->to = 0;
+	}
+	return true;
+}
+
+static bool filter_branch_type(struct perf_branch_entry *entry,
+			       const unsigned long *event_type_mask)
+{
+	if (entry->type == PERF_BR_EXTEND_ABI)
+		return test_bit(PERF_BR_MAX + entry->new_type, event_type_mask);
+	else
+		return test_bit(entry->type, event_type_mask);
+}
+
+static bool filter_branch_record(struct perf_branch_entry *entry,
+				 u64 branch_sample,
+				 const unsigned long *event_type_mask)
+{
+	return filter_branch_type(entry, event_type_mask) &&
+		filter_branch_privilege(entry, branch_sample);
+}
+
+void brbe_read_filtered_entries(struct perf_branch_stack *branch_stack,
+				const struct perf_event *event)
+{
+	struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
+	int nr_hw = brbe_num_branch_records(cpu_pmu);
+	int nr_banks = DIV_ROUND_UP(nr_hw, BRBE_BANK_MAX_ENTRIES);
+	int nr_filtered = 0;
+	u64 branch_sample_type = event->attr.branch_sample_type;
+	DECLARE_BITMAP(event_type_mask, PERF_BR_ARM64_MAX);
+
+	prepare_event_branch_type_mask(branch_sample_type, event_type_mask);
+
+	for (int bank = 0; bank < nr_banks; bank++) {
+		int nr_remaining = nr_hw - (bank * BRBE_BANK_MAX_ENTRIES);
+		int nr_this_bank = min(nr_remaining, BRBE_BANK_MAX_ENTRIES);
+
+		select_brbe_bank(bank);
+
+		for (int i = 0; i < nr_this_bank; i++) {
+			struct perf_branch_entry *pbe = &branch_stack->entries[nr_filtered];
+
+			if (!perf_entry_from_brbe_regset(i, pbe, event))
+				goto done;
+
+			if (!filter_branch_record(pbe, branch_sample_type, event_type_mask))
+				continue;
+
+			nr_filtered++;
+		}
+	}
+
+done:
+	brbe_invalidate();
+	branch_stack->nr = nr_filtered;
+}
diff --git a/drivers/perf/arm_brbe.h b/drivers/perf/arm_brbe.h
new file mode 100644
index 000000000000..b7c7d8796c86
--- /dev/null
+++ b/drivers/perf/arm_brbe.h
@@ -0,0 +1,47 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Branch Record Buffer Extension Helpers.
+ *
+ * Copyright (C) 2022-2025 ARM Limited
+ *
+ * Author: Anshuman Khandual <anshuman.khandual@arm.com>
+ */
+
+struct arm_pmu;
+struct perf_branch_stack;
+struct perf_event;
+
+#ifdef CONFIG_ARM64_BRBE
+void brbe_probe(struct arm_pmu *arm_pmu);
+unsigned int brbe_num_branch_records(const struct arm_pmu *armpmu);
+void brbe_invalidate(void);
+
+void brbe_enable(const struct arm_pmu *arm_pmu);
+void brbe_disable(void);
+
+bool brbe_branch_attr_valid(struct perf_event *event);
+void brbe_read_filtered_entries(struct perf_branch_stack *branch_stack,
+				const struct perf_event *event);
+#else
+static inline void brbe_probe(struct arm_pmu *arm_pmu) { }
+static inline unsigned int brbe_num_branch_records(const struct arm_pmu *armpmu)
+{
+	return 0;
+}
+
+static inline void brbe_invalidate(void) { }
+
+static inline void brbe_enable(const struct arm_pmu *arm_pmu) { };
+static inline void brbe_disable(void) { };
+
+static inline bool brbe_branch_attr_valid(struct perf_event *event)
+{
+	WARN_ON_ONCE(!has_branch_stack(event));
+	return false;
+}
+
+static void brbe_read_filtered_entries(struct perf_branch_stack *branch_stack,
+				       const struct perf_event *event)
+{
+}
+#endif
diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 2f33e69a8caf..df9867c0dc57 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -99,7 +99,7 @@ static const struct pmu_irq_ops percpu_pmunmi_ops = {
 	.free_pmuirq = armpmu_free_percpu_pmunmi
 };
 
-static DEFINE_PER_CPU(struct arm_pmu *, cpu_armpmu);
+DEFINE_PER_CPU(struct arm_pmu *, cpu_armpmu);
 static DEFINE_PER_CPU(int, cpu_irq);
 static DEFINE_PER_CPU(const struct pmu_irq_ops *, cpu_irq_ops);
 
@@ -317,6 +317,11 @@ armpmu_del(struct perf_event *event, int flags)
 	struct hw_perf_event *hwc = &event->hw;
 	int idx = hwc->idx;
 
+	if (has_branch_stack(event)) {
+		hw_events->branch_users--;
+		perf_sched_cb_dec(event->pmu);
+	}
+
 	armpmu_stop(event, PERF_EF_UPDATE);
 	hw_events->events[idx] = NULL;
 	armpmu->clear_event_idx(hw_events, event);
@@ -345,6 +350,11 @@ armpmu_add(struct perf_event *event, int flags)
 	/* The newly-allocated counter should be empty */
 	WARN_ON_ONCE(hw_events->events[idx]);
 
+	if (has_branch_stack(event)) {
+		hw_events->branch_users++;
+		perf_sched_cb_inc(event->pmu);
+	}
+
 	event->hw.idx = idx;
 	hw_events->events[idx] = event;
 
@@ -509,8 +519,7 @@ static int armpmu_event_init(struct perf_event *event)
 		!cpumask_test_cpu(event->cpu, &armpmu->supported_cpus))
 		return -ENOENT;
 
-	/* does not support taken branch sampling */
-	if (has_branch_stack(event))
+	if (has_branch_stack(event) && !armpmu->reg_brbidr)
 		return -EOPNOTSUPP;
 
 	return __hw_perf_event_init(event);
diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c
index 5406b9ca591a..d57e2b5da42c 100644
--- a/drivers/perf/arm_pmuv3.c
+++ b/drivers/perf/arm_pmuv3.c
@@ -25,6 +25,8 @@
 #include <linux/smp.h>
 #include <linux/nmi.h>
 
+#include "arm_brbe.h"
+
 /* ARMv8 Cortex-A53 specific event types. */
 #define ARMV8_A53_PERFCTR_PREF_LINEFILL				0xC2
 
@@ -438,7 +440,19 @@ static ssize_t threshold_max_show(struct device *dev,
 
 static DEVICE_ATTR_RO(threshold_max);
 
+static ssize_t branches_show(struct device *dev,
+			     struct device_attribute *attr, char *page)
+{
+	struct pmu *pmu = dev_get_drvdata(dev);
+	struct arm_pmu *cpu_pmu = container_of(pmu, struct arm_pmu, pmu);
+
+	return sysfs_emit(page, "%d\n", brbe_num_branch_records(cpu_pmu));
+}
+
+static DEVICE_ATTR_RO(branches);
+
 static struct attribute *armv8_pmuv3_caps_attrs[] = {
+	&dev_attr_branches.attr,
 	&dev_attr_slots.attr,
 	&dev_attr_bus_slots.attr,
 	&dev_attr_bus_width.attr,
@@ -446,9 +460,22 @@ static struct attribute *armv8_pmuv3_caps_attrs[] = {
 	NULL,
 };
 
+static umode_t caps_is_visible(struct kobject *kobj, struct attribute *attr, int i)
+{
+	struct device *dev = kobj_to_dev(kobj);
+	struct pmu *pmu = dev_get_drvdata(dev);
+	struct arm_pmu *cpu_pmu = container_of(pmu, struct arm_pmu, pmu);
+
+	if (i == 0)
+		return brbe_num_branch_records(cpu_pmu) ? attr->mode : 0;
+
+	return attr->mode;
+}
+
 static const struct attribute_group armv8_pmuv3_caps_attr_group = {
 	.name = "caps",
 	.attrs = armv8_pmuv3_caps_attrs,
+	.is_visible = caps_is_visible,
 };
 
 /*
@@ -806,9 +833,10 @@ static void armv8pmu_disable_event(struct perf_event *event)
 	armv8pmu_disable_event_irq(event);
 }
 
-static void armv8pmu_start(struct arm_pmu *cpu_pmu)
+static void armv8pmu_restart(struct arm_pmu *cpu_pmu)
 {
 	struct perf_event_context *ctx;
+	struct pmu_hw_events *hw_events = this_cpu_ptr(cpu_pmu->hw_events);
 	int nr_user = 0;
 
 	ctx = perf_cpu_task_ctx();
@@ -822,16 +850,44 @@ static void armv8pmu_start(struct arm_pmu *cpu_pmu)
 
 	kvm_vcpu_pmu_resync_el0();
 
+	if (hw_events->branch_users)
+		brbe_enable(cpu_pmu);
+
 	/* Enable all counters */
 	armv8pmu_pmcr_write(armv8pmu_pmcr_read() | ARMV8_PMU_PMCR_E);
 }
 
+static void armv8pmu_start(struct arm_pmu *cpu_pmu)
+{
+	struct pmu_hw_events *hw_events = this_cpu_ptr(cpu_pmu->hw_events);
+
+	if (hw_events->branch_users)
+		brbe_invalidate();
+
+	armv8pmu_restart(cpu_pmu);
+}
+
 static void armv8pmu_stop(struct arm_pmu *cpu_pmu)
 {
+	struct pmu_hw_events *hw_events = this_cpu_ptr(cpu_pmu->hw_events);
+
+	if (hw_events->branch_users)
+		brbe_disable();
+
 	/* Disable all counters */
 	armv8pmu_pmcr_write(armv8pmu_pmcr_read() & ~ARMV8_PMU_PMCR_E);
 }
 
+static void read_branch_records(struct pmu_hw_events *cpuc,
+				struct perf_event *event,
+				struct perf_sample_data *data)
+{
+	struct perf_branch_stack *branch_stack = cpuc->branch_stack;
+
+	brbe_read_filtered_entries(branch_stack, event);
+	perf_sample_save_brstack(data, event, branch_stack, NULL);
+}
+
 static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
 {
 	u64 pmovsr;
@@ -882,6 +938,13 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
 		if (!armpmu_event_set_period(event))
 			continue;
 
+		/*
+		 * PMU IRQ should remain asserted until all branch records
+		 * are captured and processed into struct perf_sample_data.
+		 */
+		if (has_branch_stack(event))
+			read_branch_records(cpuc, event, &data);
+
 		/*
 		 * Perf event overflow will queue the processing of the event as
 		 * an irq_work which will be taken care of in the handling of
@@ -890,7 +953,7 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu)
 		if (perf_event_overflow(event, &data, regs))
 			cpu_pmu->disable(event);
 	}
-	armv8pmu_start(cpu_pmu);
+	armv8pmu_restart(cpu_pmu);
 
 	return IRQ_HANDLED;
 }
@@ -939,7 +1002,7 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc,
 
 	/* Always prefer to place a cycle counter into the cycle counter. */
 	if ((evtype == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) &&
-	    !armv8pmu_event_get_threshold(&event->attr)) {
+	    !armv8pmu_event_get_threshold(&event->attr) && !has_branch_stack(event)) {
 		if (!test_and_set_bit(ARMV8_PMU_CYCLE_IDX, cpuc->used_mask))
 			return ARMV8_PMU_CYCLE_IDX;
 		else if (armv8pmu_event_is_64bit(event) &&
@@ -988,6 +1051,18 @@ static int armv8pmu_user_event_idx(struct perf_event *event)
 	return event->hw.idx + 1;
 }
 
+static void armv8pmu_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
+{
+	struct arm_pmu *armpmu = *this_cpu_ptr(&cpu_armpmu);
+	struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events);
+
+	if (!hw_events->branch_users)
+		return;
+
+	if (sched_in)
+		brbe_invalidate();
+}
+
 /*
  * Add an event filter to a given event.
  */
@@ -1005,6 +1080,13 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event,
 		return -EOPNOTSUPP;
 	}
 
+	if (has_branch_stack(perf_event)) {
+		if (!brbe_num_branch_records(cpu_pmu) || !brbe_branch_attr_valid(perf_event))
+			return -EOPNOTSUPP;
+
+		perf_event->attach_state |= PERF_ATTACH_SCHED_CB;
+	}
+
 	/*
 	 * If we're running in hyp mode, then we *are* the hypervisor.
 	 * Therefore we ignore exclude_hv in this configuration, since
@@ -1071,6 +1153,11 @@ static void armv8pmu_reset(void *info)
 	/* Clear the counters we flip at guest entry/exit */
 	kvm_clr_pmu_events(mask);
 
+	if (brbe_num_branch_records(cpu_pmu)) {
+		brbe_disable();
+		brbe_invalidate();
+	}
+
 	/*
 	 * Initialize & Reset PMNC. Request overflow interrupt for
 	 * 64 bit cycle counter but cheat in armv8pmu_write_counter().
@@ -1239,6 +1326,30 @@ static void __armv8pmu_probe_pmu(void *info)
 		cpu_pmu->reg_pmmir = read_pmmir();
 	else
 		cpu_pmu->reg_pmmir = 0;
+
+	brbe_probe(cpu_pmu);
+}
+
+static int branch_records_alloc(struct arm_pmu *armpmu)
+{
+	struct perf_branch_stack *branch_stack_cpu;
+	struct perf_branch_stack __percpu *branch_stack;
+	size_t size = struct_size(branch_stack_cpu, entries, brbe_num_branch_records(armpmu));
+	int cpu;
+
+	branch_stack = __alloc_percpu_gfp(size, __alignof__(*branch_stack_cpu),
+					  GFP_KERNEL);
+	if (!branch_stack)
+		return -ENOMEM;
+
+	for_each_possible_cpu(cpu) {
+		struct pmu_hw_events *events_cpu;
+
+		events_cpu = per_cpu_ptr(armpmu->hw_events, cpu);
+		branch_stack_cpu = per_cpu_ptr(branch_stack, cpu);
+		events_cpu->branch_stack = branch_stack_cpu;
+	}
+	return 0;
 }
 
 static int armv8pmu_probe_pmu(struct arm_pmu *cpu_pmu)
@@ -1255,7 +1366,15 @@ static int armv8pmu_probe_pmu(struct arm_pmu *cpu_pmu)
 	if (ret)
 		return ret;
 
-	return probe.present ? 0 : -ENODEV;
+	if (!probe.present)
+		return -ENODEV;
+
+	if (brbe_num_branch_records(cpu_pmu)) {
+		ret = branch_records_alloc(cpu_pmu);
+		if (ret)
+			return ret;
+	}
+	return 0;
 }
 
 static void armv8pmu_disable_user_access_ipi(void *unused)
@@ -1314,6 +1433,7 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu, char *name,
 	cpu_pmu->set_event_filter	= armv8pmu_set_event_filter;
 
 	cpu_pmu->pmu.event_idx		= armv8pmu_user_event_idx;
+	cpu_pmu->pmu.sched_task		= armv8pmu_sched_task;
 
 	cpu_pmu->name			= name;
 	cpu_pmu->map_event		= map_event;
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index c70d528594f2..219d58259857 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -70,6 +70,11 @@ struct pmu_hw_events {
 	struct arm_pmu		*percpu_pmu;
 
 	int irq;
+
+	struct perf_branch_stack	*branch_stack;
+
+	/* Active events requesting branch records */
+	unsigned int		branch_users;
 };
 
 enum armpmu_attr_groups {
@@ -111,6 +116,7 @@ struct arm_pmu {
 	/* PMUv3 only */
 	int		pmuver;
 	u64		reg_pmmir;
+	u64		reg_brbidr;
 #define ARMV8_PMUV3_MAX_COMMON_EVENTS		0x40
 	DECLARE_BITMAP(pmceid_bitmap, ARMV8_PMUV3_MAX_COMMON_EVENTS);
 #define ARMV8_PMUV3_EXT_COMMON_EVENT_BASE	0x4000
@@ -122,6 +128,8 @@ struct arm_pmu {
 
 #define to_arm_pmu(p) (container_of(p, struct arm_pmu, pmu))
 
+DECLARE_PER_CPU(struct arm_pmu *, cpu_armpmu);
+
 u64 armpmu_event_update(struct perf_event *event);
 
 int armpmu_event_set_period(struct perf_event *event);

-- 
2.47.2


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH v20 00/11] arm64/perf: Enable branch stack sampling
  2025-02-18 20:39 [PATCH v20 00/11] arm64/perf: Enable branch stack sampling Rob Herring (Arm)
                   ` (10 preceding siblings ...)
  2025-02-18 20:40 ` [PATCH v20 11/11] perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE) Rob Herring (Arm)
@ 2025-02-19 16:09 ` James Clark
  2025-03-01  7:05 ` Will Deacon
  12 siblings, 0 replies; 33+ messages in thread
From: James Clark @ 2025-02-19 16:09 UTC (permalink / raw)
  To: Rob Herring (Arm)
  Cc: linux-arm-kernel, linux-perf-users, linux-kernel, linux-doc,
	kvmarm, Mark Brown, Will Deacon, Mark Rutland, Catalin Marinas,
	Jonathan Corbet, Marc Zyngier, Oliver Upton, Joey Gouly,
	Suzuki K Poulose, Zenghui Yu, Anshuman Khandual, Leo Yan



On 18/02/2025 8:39 pm, Rob Herring (Arm) wrote:
> This series enables perf branch stack sampling support on arm64 via a
> v9.2 arch feature called Branch Record Buffer Extension (BRBE). Details
> on BRBE can be found in the Arm ARM[1] chapter D18.
> 
> I've picked up this series from Anshuman. v19 and v20 versions have been
> reworked quite a bit by Mark and myself. The bulk of those changes are
> in patch 11.
> 
> Patches 1-7 are new clean-ups/prep which stand on their own. They
> were previously posted here[2]. Please pick them up if there's no issues
> with them.
> 
> Patches 8-11 add BRBE support with the actual support in patch 11.
> 
> A git branch is here[3].
> 
> [1] https://developer.arm.com/documentation/ddi0487/latest/
> [2] https://lore.kernel.org/all/20250107-arm-pmu-cleanups-v1-v1-0-313951346a25@kernel.org/
> [3] git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git arm/brbe-v20
> 
> v20:
>   - Added back some of the arm64 specific exception types. The x86 IRQ
>     branches also include other exceptions like page faults. On arm64, we
>     can distinguish the exception types, so we do. Also, to better
>     align with x86, we convert 'call' branches which are user to kernel
>     to 'syscall'.
>   - Only enable exceptions and exception returns if recording kernel
>     branches (matching x86)
>   - Drop requiring event and branch privileges to match
>   - Add "branches" caps sysfs attribute like x86
>   - Reword comment about FZP and MDCR_EL2.HPMN interaction
>   - Rework BRBE invalidation to avoid invalidating in interrupt handler
>     when no handled events capture the branch stack (i.e. when there are
>     multiple users).
>   - Also clear BRBCR_ELx bits in brbe_disable(). This is for KVM nVHE
>     checks if BRBE is enabled.
>   - Document that MDCR_EL3.SBRBE can be 0b01 also
> 

Tested-by: James Clark <james.clark@linaro.org>


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v20 10/11] KVM: arm64: nvhe: Disable branch generation in nVHE guests
  2025-02-18 20:40 ` [PATCH v20 10/11] KVM: arm64: nvhe: Disable branch generation in nVHE guests Rob Herring (Arm)
@ 2025-02-24 10:41   ` Leo Yan
  0 siblings, 0 replies; 33+ messages in thread
From: Leo Yan @ 2025-02-24 10:41 UTC (permalink / raw)
  To: Rob Herring (Arm)
  Cc: Will Deacon, Mark Rutland, Catalin Marinas, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, James Clark, Anshuman Khandual, linux-arm-kernel,
	linux-perf-users, linux-kernel, linux-doc, kvmarm

On Tue, Feb 18, 2025 at 02:40:05PM -0600, Rob Herring (Arm) wrote:
> 
> From: Anshuman Khandual <anshuman.khandual@arm.com>
> 
> While BRBE can record branches within guests, the host recording
> branches in guests is not supported by perf (though events are).
> Support for BRBE in guests will supported by providing direct access
> to BRBE within the guests. That is how x86 LBR works for guests.
> Therefore, BRBE needs to be disabled on guest entry and restored on
> exit.
> 
> For nVHE, this requires explicit handling for guests. Before
> entering a guest, save the BRBE state and disable the it. When
> returning to the host, restore the state.
> 
> For VHE, it is not necessary. We initialize
> BRBCR_EL1.{E1BRE,E0BRE}=={0,0} at boot time, and HCR_EL2.TGE==1 while
> running in the host. We configure BRBCR_EL2.{E2BRE,E0HBRE} to enable
> branch recording in the host. When entering the guest, we set
> HCR_EL2.TGE==0 which means BRBCR_EL1 is used instead of BRBCR_EL2.
> Consequently for VHE, BRBE recording is disabled at EL1 and EL0 when
> running a guest.
> 
> Should recording in guests (by the host) ever be desired, the perf ABI
> will need to be extended to distinguish guest addresses (struct
> perf_branch_entry.priv) for starters. BRBE records would also need to be
> invalidated on guest entry/exit as guest/host EL1 and EL0 records can't
> be distinguished.
> 
> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Co-developed-by: Rob Herring (Arm) <robh@kernel.org>
> Signed-off-by: Rob Herring (Arm) <robh@kernel.org>

Reviewed-by: Leo Yan <leo.yan@arm.com>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v20 11/11] perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)
  2025-02-18 20:40 ` [PATCH v20 11/11] perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE) Rob Herring (Arm)
@ 2025-02-24 12:25   ` Leo Yan
  2025-02-24 12:46     ` Rob Herring
  0 siblings, 1 reply; 33+ messages in thread
From: Leo Yan @ 2025-02-24 12:25 UTC (permalink / raw)
  To: Rob Herring (Arm)
  Cc: Will Deacon, Mark Rutland, Catalin Marinas, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, James Clark, Anshuman Khandual, linux-arm-kernel,
	linux-perf-users, linux-kernel, linux-doc, kvmarm

On Tue, Feb 18, 2025 at 02:40:06PM -0600, Rob Herring (Arm) wrote:
> 
> From: Anshuman Khandual <anshuman.khandual@arm.com>

[...]

> BRBE records are invalidated whenever events are reconfigured, a new
> task is scheduled in, or after recording is paused (and the records
> have been recorded for the event). The architecture allows branch
> records to be invalidated by the PE under implementation defined
> conditions. It is expected that these conditions are rare.

[...]

> +static void armv8pmu_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
> +{
> +       struct arm_pmu *armpmu = *this_cpu_ptr(&cpu_armpmu);
> +       struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events);
> +
> +       if (!hw_events->branch_users)
> +               return;
> +
> +       if (sched_in)
> +               brbe_invalidate();
> +}

Just a minor concern.  I don't see any handling for task migration.
E.g., for a task is migrated from one CPU to another CPU, I expect we
need to save and restore branch records based on BRBE injection.  So
far, the driver simply invalidates all records.

I think this topic is very likely discussed before.  If this is the
case, please ignore my comment.  Except this, the code looks good
to me.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v20 11/11] perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)
  2025-02-24 12:25   ` Leo Yan
@ 2025-02-24 12:46     ` Rob Herring
  2025-02-24 14:03       ` Leo Yan
  0 siblings, 1 reply; 33+ messages in thread
From: Rob Herring @ 2025-02-24 12:46 UTC (permalink / raw)
  To: Leo Yan
  Cc: Will Deacon, Mark Rutland, Catalin Marinas, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, James Clark, Anshuman Khandual, linux-arm-kernel,
	linux-perf-users, linux-kernel, linux-doc, kvmarm

On Mon, Feb 24, 2025 at 6:25 AM Leo Yan <leo.yan@arm.com> wrote:
>
> On Tue, Feb 18, 2025 at 02:40:06PM -0600, Rob Herring (Arm) wrote:
> >
> > From: Anshuman Khandual <anshuman.khandual@arm.com>
>
> [...]
>
> > BRBE records are invalidated whenever events are reconfigured, a new
> > task is scheduled in, or after recording is paused (and the records
> > have been recorded for the event). The architecture allows branch
> > records to be invalidated by the PE under implementation defined
> > conditions. It is expected that these conditions are rare.
>
> [...]
>
> > +static void armv8pmu_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
> > +{
> > +       struct arm_pmu *armpmu = *this_cpu_ptr(&cpu_armpmu);
> > +       struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events);
> > +
> > +       if (!hw_events->branch_users)
> > +               return;
> > +
> > +       if (sched_in)
> > +               brbe_invalidate();
> > +}
>
> Just a minor concern.  I don't see any handling for task migration.
> E.g., for a task is migrated from one CPU to another CPU, I expect we
> need to save and restore branch records based on BRBE injection.  So
> far, the driver simply invalidates all records.
>
> I think this topic is very likely discussed before.  If this is the
> case, please ignore my comment.  Except this, the code looks good
> to me.

Not really discussed on the list, but that was present in v18 (though
not functional because .sched_task() hook wasn't actually enabled) and
Mark removed it. His work is here[1].The only comment was:

Note: saving/restoring at context-switch doesn't interact well with
event rotation (e.g. if filters change)

Rob

[1] https://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git/commit/?h=arm64/brbe&id=642985af34d2d6f54e76995380cf24d512078c56

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v20 11/11] perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)
  2025-02-24 12:46     ` Rob Herring
@ 2025-02-24 14:03       ` Leo Yan
  2025-02-24 16:05         ` Mark Rutland
  0 siblings, 1 reply; 33+ messages in thread
From: Leo Yan @ 2025-02-24 14:03 UTC (permalink / raw)
  To: Rob Herring
  Cc: Will Deacon, Mark Rutland, Catalin Marinas, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, James Clark, Anshuman Khandual, linux-arm-kernel,
	linux-perf-users, linux-kernel, linux-doc, kvmarm

On Mon, Feb 24, 2025 at 06:46:35AM -0600, Rob Herring wrote:
> On Mon, Feb 24, 2025 at 6:25 AM Leo Yan <leo.yan@arm.com> wrote:
> > On Tue, Feb 18, 2025 at 02:40:06PM -0600, Rob Herring (Arm) wrote:
> > >
> > > From: Anshuman Khandual <anshuman.khandual@arm.com>
> >
> > [...]
> >
> > > BRBE records are invalidated whenever events are reconfigured, a new
> > > task is scheduled in, or after recording is paused (and the records
> > > have been recorded for the event). The architecture allows branch
> > > records to be invalidated by the PE under implementation defined
> > > conditions. It is expected that these conditions are rare.
> >
> > [...]
> >
> > > +static void armv8pmu_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
> > > +{
> > > +       struct arm_pmu *armpmu = *this_cpu_ptr(&cpu_armpmu);
> > > +       struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events);
> > > +
> > > +       if (!hw_events->branch_users)
> > > +               return;
> > > +
> > > +       if (sched_in)
> > > +               brbe_invalidate();
> > > +}
> >
> > Just a minor concern.  I don't see any handling for task migration.
> > E.g., for a task is migrated from one CPU to another CPU, I expect we
> > need to save and restore branch records based on BRBE injection.  So
> > far, the driver simply invalidates all records.
> >
> > I think this topic is very likely discussed before.  If this is the
> > case, please ignore my comment.  Except this, the code looks good
> > to me.
> 
> Not really discussed on the list, but that was present in v18 (though
> not functional because .sched_task() hook wasn't actually enabled) and
> Mark removed it. His work is here[1].The only comment was:
> 
> Note: saving/restoring at context-switch doesn't interact well with
> event rotation (e.g. if filters change)

In the brbe_enable() function, it "Merge the permitted branch filters
of all events".  Based on current implementation, all events share the
same branch filter.

When event rotation happens, if without context switch, in theory we
should can directly use the branch record (no invalidation, no injection)
for all events.

For a context-switch case, we need to save and re-inject branch record.
BRBE record sticks to a process context, no matter what events have been
enabled.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v20 11/11] perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)
  2025-02-24 14:03       ` Leo Yan
@ 2025-02-24 16:05         ` Mark Rutland
  2025-02-24 18:03           ` Leo Yan
  0 siblings, 1 reply; 33+ messages in thread
From: Mark Rutland @ 2025-02-24 16:05 UTC (permalink / raw)
  To: Leo Yan
  Cc: Rob Herring, Will Deacon, Catalin Marinas, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, James Clark, Anshuman Khandual, linux-arm-kernel,
	linux-perf-users, linux-kernel, linux-doc, kvmarm

On Mon, Feb 24, 2025 at 02:03:17PM +0000, Leo Yan wrote:
> On Mon, Feb 24, 2025 at 06:46:35AM -0600, Rob Herring wrote:
> > On Mon, Feb 24, 2025 at 6:25 AM Leo Yan <leo.yan@arm.com> wrote:
> > > On Tue, Feb 18, 2025 at 02:40:06PM -0600, Rob Herring (Arm) wrote:
> > > >
> > > > From: Anshuman Khandual <anshuman.khandual@arm.com>
> > >
> > > [...]
> > >
> > > > BRBE records are invalidated whenever events are reconfigured, a new
> > > > task is scheduled in, or after recording is paused (and the records
> > > > have been recorded for the event). The architecture allows branch
> > > > records to be invalidated by the PE under implementation defined
> > > > conditions. It is expected that these conditions are rare.
> > >
> > > [...]
> > >
> > > > +static void armv8pmu_sched_task(struct perf_event_pmu_context *pmu_ctx, bool sched_in)
> > > > +{
> > > > +       struct arm_pmu *armpmu = *this_cpu_ptr(&cpu_armpmu);
> > > > +       struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events);
> > > > +
> > > > +       if (!hw_events->branch_users)
> > > > +               return;
> > > > +
> > > > +       if (sched_in)
> > > > +               brbe_invalidate();
> > > > +}
> > >
> > > Just a minor concern.  I don't see any handling for task migration.
> > > E.g., for a task is migrated from one CPU to another CPU, I expect we
> > > need to save and restore branch records based on BRBE injection.  So
> > > far, the driver simply invalidates all records.
> > >
> > > I think this topic is very likely discussed before.  If this is the
> > > case, please ignore my comment.  Except this, the code looks good
> > > to me.
> > 
> > Not really discussed on the list, but that was present in v18 (though
> > not functional because .sched_task() hook wasn't actually enabled) and
> > Mark removed it. His work is here[1].The only comment was:
> > 
> > Note: saving/restoring at context-switch doesn't interact well with
> > event rotation (e.g. if filters change)
> 
> In the brbe_enable() function, it "Merge the permitted branch filters
> of all events".  Based on current implementation, all events share the
> same branch filter.

Critically, the brbe_enable() function merges the filters of all
*active* events which have been installed into hardware. It does not
track all events which can be rotated, and the resulting filter is not
the same -- it can change as a result of rotation.

> When event rotation happens, if without context switch, in theory we
> should can directly use the branch record (no invalidation, no injection)
> for all events.

No; that only works in *some* cases, and will produce incorrect results
in others.

For example, consider filtering. Imagine a PMU with a single counter,
and two events, where event-A filters for calls-and-returns and event-B
filters for calls-only. When switching from event-A to event-B, it's
theoretically possible to keep the existing records around, knowing that
the returns can be filtered out later. When switching from event-B to
event-A we cannot keep the existing records, since there are gaps
whenever a return should have been recorded.

There are a number of cases of that shape given the set of configurable
filters. In theory it's possible to retain those in some cases, but I
don't think that the complexity is justified.

Similarly, whenever kernel branches are recorded it's necessary to drop
the stale branches whenever branch recording is paused, as there's
necessarily a blackout period and hence a gap in the records.

Do you think that you have a case where losing branches across rotation
*really* matters?

> For a context-switch case, we need to save and re-inject branch record.
> BRBE record sticks to a process context, no matter what events have been
> enabled.

I had originally wanted to keep per-event records around, but it doesn't
work in all cases. One reason events get discarded at context-switch
time is that CPU-bound events can sample branches, and would
mis-attribute stale userspace branches to the wrong context when
switching tasks. There are explicit comments about this in
amd_pmu_brs_sched_task() and intel_pmu_lbr_sched_task().

Given we discard records when reprogramming events, we *could* try to
preserve events in some cases, but I suspect that as with the rotation
case this'll be a lot of complexity for little gain. Note that as we
discard events when enabling the PMU, we'd throw some task-bound records
away anyway, and practically the gain would be limited to cpu-bound
records.

Do you have a reason why you think we *must* keep events around?

Mark.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v20 11/11] perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)
  2025-02-24 16:05         ` Mark Rutland
@ 2025-02-24 18:03           ` Leo Yan
  2025-02-25  1:31             ` Rob Herring
  2025-02-25 12:01             ` Mark Rutland
  0 siblings, 2 replies; 33+ messages in thread
From: Leo Yan @ 2025-02-24 18:03 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Rob Herring, Will Deacon, Catalin Marinas, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, James Clark, Anshuman Khandual, linux-arm-kernel,
	linux-perf-users, linux-kernel, linux-doc, kvmarm

On Mon, Feb 24, 2025 at 04:05:43PM +0000, Mark Rutland wrote:

[...]

> > > > Just a minor concern.  I don't see any handling for task migration.
> > > > E.g., for a task is migrated from one CPU to another CPU, I expect we
> > > > need to save and restore branch records based on BRBE injection.  So
> > > > far, the driver simply invalidates all records.
> > > >
> > > > I think this topic is very likely discussed before.  If this is the
> > > > case, please ignore my comment.  Except this, the code looks good
> > > > to me.
> > > 
> > > Not really discussed on the list, but that was present in v18 (though
> > > not functional because .sched_task() hook wasn't actually enabled) and
> > > Mark removed it. His work is here[1].The only comment was:
> > > 
> > > Note: saving/restoring at context-switch doesn't interact well with
> > > event rotation (e.g. if filters change)
> > 
> > In the brbe_enable() function, it "Merge the permitted branch filters
> > of all events".  Based on current implementation, all events share the
> > same branch filter.
> 
> Critically, the brbe_enable() function merges the filters of all
> *active* events which have been installed into hardware. It does not
> track all events which can be rotated, and the resulting filter is not
> the same -- it can change as a result of rotation.

In a perf session has multiple events, and events have different branch
filters, seems to me, a simple way is to return error for this case.

I would argue BRBE is an IP for recording branches per CPU wise, it does
not support recording for event wise.  If we can unify branch filter
within a perf session, would this be much easier for handling?

> > When event rotation happens, if without context switch, in theory we
> > should can directly use the branch record (no invalidation, no injection)
> > for all events.
> 
> No; that only works in *some* cases, and will produce incorrect results
> in others.
> 
> For example, consider filtering. Imagine a PMU with a single counter,
> and two events, where event-A filters for calls-and-returns and event-B
> filters for calls-only. When switching from event-A to event-B, it's
> theoretically possible to keep the existing records around, knowing that
> the returns can be filtered out later. When switching from event-B to
> event-A we cannot keep the existing records, since there are gaps
> whenever a return should have been recorded.

Seems to me, the problem is not caused by event rotation.  We need to
calculate a correct filter in the first place - the BRBE driver should
calculate a superset for all filters of events for a session.  Then,
generate branch record based event's specific filter.

> There are a number of cases of that shape given the set of configurable
> filters. In theory it's possible to retain those in some cases, but I
> don't think that the complexity is justified.
> 
> Similarly, whenever kernel branches are recorded it's necessary to drop
> the stale branches whenever branch recording is paused, as there's
> necessarily a blackout period and hence a gap in the records.

If we save BRBE record when a process is switched out and then restore
the record when a process is switched in, should we can keep a decent
branch record for performance profiling?

I understand it might be many things happen in the middle of a task
switching or migration, but it is fine for not recording branches during
the blackout period.  The missed branch records are not very helpful for
forming a flow for a profiled program itself, especially, if we
consider we will optimize userspace program in many cases.

> Do you think that you have a case where losing branches across rotation
> *really* matters?

I agreed that event rotation case might be rare and complex.  Please see
a comment below.

> > For a context-switch case, we need to save and re-inject branch record.
> > BRBE record sticks to a process context, no matter what events have been
> > enabled.
> 
> I had originally wanted to keep per-event records around, but it doesn't
> work in all cases. One reason events get discarded at context-switch
> time is that CPU-bound events can sample branches, and would
> mis-attribute stale userspace branches to the wrong context when
> switching tasks. There are explicit comments about this in
> amd_pmu_brs_sched_task() and intel_pmu_lbr_sched_task().
> 
> Given we discard records when reprogramming events, we *could* try to
> preserve events in some cases, but I suspect that as with the rotation
> case this'll be a lot of complexity for little gain. Note that as we
> discard events when enabling the PMU, we'd throw some task-bound records
> away anyway, and practically the gain would be limited to cpu-bound
> records.
> 
> Do you have a reason why you think we *must* keep events around?

Here I am really concerned are cases when a process is preempted or
migrated.  The driver doesn't save and restore branch records for these
cases, it just invalidates all records when a task is scheduled in.

As a result, if an event overflow is close to context switch, it is
likely to capture incomplete branch records.  For a userspace-only
tracing, it is risk to capture empty branch record after preemption
and migrations.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v20 11/11] perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)
  2025-02-24 18:03           ` Leo Yan
@ 2025-02-25  1:31             ` Rob Herring
  2025-02-25 12:38               ` Leo Yan
  2025-02-25 12:01             ` Mark Rutland
  1 sibling, 1 reply; 33+ messages in thread
From: Rob Herring @ 2025-02-25  1:31 UTC (permalink / raw)
  To: Leo Yan
  Cc: Mark Rutland, Will Deacon, Catalin Marinas, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, James Clark, Anshuman Khandual, linux-arm-kernel,
	linux-perf-users, linux-kernel, linux-doc, kvmarm

On Mon, Feb 24, 2025 at 12:03 PM Leo Yan <leo.yan@arm.com> wrote:
>
> On Mon, Feb 24, 2025 at 04:05:43PM +0000, Mark Rutland wrote:
>
> [...]
>
> > > > > Just a minor concern.  I don't see any handling for task migration.
> > > > > E.g., for a task is migrated from one CPU to another CPU, I expect we
> > > > > need to save and restore branch records based on BRBE injection.  So
> > > > > far, the driver simply invalidates all records.
> > > > >
> > > > > I think this topic is very likely discussed before.  If this is the
> > > > > case, please ignore my comment.  Except this, the code looks good
> > > > > to me.
> > > >
> > > > Not really discussed on the list, but that was present in v18 (though
> > > > not functional because .sched_task() hook wasn't actually enabled) and
> > > > Mark removed it. His work is here[1].The only comment was:
> > > >
> > > > Note: saving/restoring at context-switch doesn't interact well with
> > > > event rotation (e.g. if filters change)
> > >
> > > In the brbe_enable() function, it "Merge the permitted branch filters
> > > of all events".  Based on current implementation, all events share the
> > > same branch filter.
> >
> > Critically, the brbe_enable() function merges the filters of all
> > *active* events which have been installed into hardware. It does not
> > track all events which can be rotated, and the resulting filter is not
> > the same -- it can change as a result of rotation.
>
> In a perf session has multiple events, and events have different branch
> filters, seems to me, a simple way is to return error for this case.
>
> I would argue BRBE is an IP for recording branches per CPU wise, it does
> not support recording for event wise.  If we can unify branch filter
> within a perf session, would this be much easier for handling?

And the PMU is an IP for recording events per CPU, but that is not
perf (only)...

>
> > > When event rotation happens, if without context switch, in theory we
> > > should can directly use the branch record (no invalidation, no injection)
> > > for all events.
> >
> > No; that only works in *some* cases, and will produce incorrect results
> > in others.
> >
> > For example, consider filtering. Imagine a PMU with a single counter,
> > and two events, where event-A filters for calls-and-returns and event-B
> > filters for calls-only. When switching from event-A to event-B, it's
> > theoretically possible to keep the existing records around, knowing that
> > the returns can be filtered out later. When switching from event-B to
> > event-A we cannot keep the existing records, since there are gaps
> > whenever a return should have been recorded.
>
> Seems to me, the problem is not caused by event rotation.  We need to
> calculate a correct filter in the first place - the BRBE driver should
> calculate a superset for all filters of events for a session.  Then,
> generate branch record based event's specific filter.

The driver doesn't have enough information. If it is told to schedule
event A, it doesn't know anything about event B. It could in theory
try to remember event B if event B had already been scheduled, but it
never knows when event B is gone.

> > There are a number of cases of that shape given the set of configurable
> > filters. In theory it's possible to retain those in some cases, but I
> > don't think that the complexity is justified.
> >
> > Similarly, whenever kernel branches are recorded it's necessary to drop
> > the stale branches whenever branch recording is paused, as there's
> > necessarily a blackout period and hence a gap in the records.
>
> If we save BRBE record when a process is switched out and then restore
> the record when a process is switched in, should we can keep a decent
> branch record for performance profiling?

Keep in mind that there's only 64 branches recorded at most. How many
branches in a context switch plus reconfiguring the PMU? Not a small
percentage of 64 I think. In traces where freeze on overflow was not
working (there's an example in v18), just the interrupt entry until
BRBE was stopped was a significant part of the trace. A context switch
is going to be similar.

> I understand it might be many things happen in the middle of a task
> switching or migration, but it is fine for not recording branches during
> the blackout period.  The missed branch records are not very helpful for
> forming a flow for a profiled program itself, especially, if we
> consider we will optimize userspace program in many cases.

We can't really not record during a blackout period. The only way we
can is with freeze on overflow.

> > Do you think that you have a case where losing branches across rotation
> > *really* matters?
>
> I agreed that event rotation case might be rare and complex.  Please see
> a comment below.
>
> > > For a context-switch case, we need to save and re-inject branch record.
> > > BRBE record sticks to a process context, no matter what events have been
> > > enabled.
> >
> > I had originally wanted to keep per-event records around, but it doesn't
> > work in all cases. One reason events get discarded at context-switch
> > time is that CPU-bound events can sample branches, and would
> > mis-attribute stale userspace branches to the wrong context when
> > switching tasks. There are explicit comments about this in
> > amd_pmu_brs_sched_task() and intel_pmu_lbr_sched_task().
> >
> > Given we discard records when reprogramming events, we *could* try to
> > preserve events in some cases, but I suspect that as with the rotation
> > case this'll be a lot of complexity for little gain. Note that as we
> > discard events when enabling the PMU, we'd throw some task-bound records
> > away anyway, and practically the gain would be limited to cpu-bound
> > records.
> >
> > Do you have a reason why you think we *must* keep events around?
>
> Here I am really concerned are cases when a process is preempted or
> migrated.  The driver doesn't save and restore branch records for these
> cases, it just invalidates all records when a task is scheduled in.
>
> As a result, if an event overflow is close to context switch, it is
> likely to capture incomplete branch records.  For a userspace-only
> tracing, it is risk to capture empty branch record after preemption
> and migrations.

There's the same risk if something else is recording kernel branches
when you are recording userspace only. I think the user has to be
aware if other things like context switches are perturbing their data.

Rob

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v20 11/11] perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)
  2025-02-24 18:03           ` Leo Yan
  2025-02-25  1:31             ` Rob Herring
@ 2025-02-25 12:01             ` Mark Rutland
  2025-02-25 17:48               ` Leo Yan
  1 sibling, 1 reply; 33+ messages in thread
From: Mark Rutland @ 2025-02-25 12:01 UTC (permalink / raw)
  To: Leo Yan
  Cc: Rob Herring, Will Deacon, Catalin Marinas, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, James Clark, Anshuman Khandual, linux-arm-kernel,
	linux-perf-users, linux-kernel, linux-doc, kvmarm

On Mon, Feb 24, 2025 at 06:03:01PM +0000, Leo Yan wrote:
> On Mon, Feb 24, 2025 at 04:05:43PM +0000, Mark Rutland wrote:
> 
> [...]
> 
> > > > > Just a minor concern.  I don't see any handling for task migration.
> > > > > E.g., for a task is migrated from one CPU to another CPU, I expect we
> > > > > need to save and restore branch records based on BRBE injection.  So
> > > > > far, the driver simply invalidates all records.
> > > > >
> > > > > I think this topic is very likely discussed before.  If this is the
> > > > > case, please ignore my comment.  Except this, the code looks good
> > > > > to me.
> > > > 
> > > > Not really discussed on the list, but that was present in v18 (though
> > > > not functional because .sched_task() hook wasn't actually enabled) and
> > > > Mark removed it. His work is here[1].The only comment was:
> > > > 
> > > > Note: saving/restoring at context-switch doesn't interact well with
> > > > event rotation (e.g. if filters change)
> > > 
> > > In the brbe_enable() function, it "Merge the permitted branch filters
> > > of all events".  Based on current implementation, all events share the
> > > same branch filter.
> > 
> > Critically, the brbe_enable() function merges the filters of all
> > *active* events which have been installed into hardware. It does not
> > track all events which can be rotated, and the resulting filter is not
> > the same -- it can change as a result of rotation.
> 
> In a perf session has multiple events, and events have different branch
> filters, seems to me, a simple way is to return error for this case.

FWIW, I'd generally prefer to do that since it avoids a number of
horrible edge-cases and gets rid of the need to do SW filtering, which
falls somewhere between "tricky" and "not entirely possible". However,
that's not what LBR and others do, which is why we went with filter
merging.

If folk on the tools side are happy with the kernel rejecting
conflicting events, then I'd be more than happy to do that. What I don't
want is that we start off with that approach and people immediately
start to complain that the BRBE driver rejects events that the LBR
driver accepts.

See the last time this came up:

  https://lore.kernel.org/linux-arm-kernel/Zli6Ot0TwK3Qy-ee@J2N7QTR9R3/
  https://lore.kernel.org/linux-arm-kernel/ZlnKFFwv9612V81h@J2N7QTR9R3/

> I would argue BRBE is an IP for recording branches per CPU wise, it does
> not support recording for event wise.

Yes, there is a mismatch between the hardware and the perf ABI.

> If we can unify branch filter within a perf session, would this be
> much easier for handling?

Do you mean if the perf tool ensured that all events in a given session
had the same filter? From the kernel's PoV there's no such thing as a
"perf session", and I'm not sure whether you're suggesting doing that in
userspace or withing the kernel.

Doing that in the perf tool would certianly make a stronger argument for
the kernel taking the "reject conflicting branch filters" option.

Doing that within the kernel isn't really possible.

> > > When event rotation happens, if without context switch, in theory we
> > > should can directly use the branch record (no invalidation, no injection)
> > > for all events.
> > 
> > No; that only works in *some* cases, and will produce incorrect results
> > in others.
> > 
> > For example, consider filtering. Imagine a PMU with a single counter,
> > and two events, where event-A filters for calls-and-returns and event-B
> > filters for calls-only. When switching from event-A to event-B, it's
> > theoretically possible to keep the existing records around, knowing that
> > the returns can be filtered out later. When switching from event-B to
> > event-A we cannot keep the existing records, since there are gaps
> > whenever a return should have been recorded.
> 
> Seems to me, the problem is not caused by event rotation.  We need to
> calculate a correct filter in the first place - the BRBE driver should
> calculate a superset for all filters of events for a session.  Then,
> generate branch record based event's specific filter.

From the kernel's PoV there is no such thing as a perf session, and the
branch filters are per-event per the perf ABI.

We only really have two options:

(1) Reject conflicting filters when scheduling events. At event open
    time we have ot check whether an entire group is actually
    schedulable.

(2) Merge filters when scheduling events, and then filter out
    branches which don't match a particular event's filters when reading
    the branches.

> > There are a number of cases of that shape given the set of configurable
> > filters. In theory it's possible to retain those in some cases, but I
> > don't think that the complexity is justified.
> > 
> > Similarly, whenever kernel branches are recorded it's necessary to drop
> > the stale branches whenever branch recording is paused, as there's
> > necessarily a blackout period and hence a gap in the records.
> 
> If we save BRBE record when a process is switched out and then restore
> the record when a process is switched in, should we can keep a decent
> branch record for performance profiling?

I cannot parse this question. What are you trying to suggest here?

> I understand it might be many things happen in the middle of a task
> switching or migration, but it is fine for not recording branches during
> the blackout period.  The missed branch records are not very helpful for
> forming a flow for a profiled program itself, especially, if we
> consider we will optimize userspace program in many cases.

Just to be clear, you're talking about userspace specifically, right?

There are users that want a contiguous set of branches, which is what
hardware tries to guarantee, and that's what LBR tries to gaurantee
today, so I don't think that we can say gaps are always fine.

If we want a "give me as many arbitrary samples branches as possible,
with arbitrary potential gaps" option, I'm not necessarily opposed to
that, but I do not think that should be the default behaviour.

> > Do you think that you have a case where losing branches across rotation
> > *really* matters?
> 
> I agreed that event rotation case might be rare and complex.  Please see
> a comment below.
> 
> > > For a context-switch case, we need to save and re-inject branch record.
> > > BRBE record sticks to a process context, no matter what events have been
> > > enabled.
> > 
> > I had originally wanted to keep per-event records around, but it doesn't
> > work in all cases. One reason events get discarded at context-switch
> > time is that CPU-bound events can sample branches, and would
> > mis-attribute stale userspace branches to the wrong context when
> > switching tasks. There are explicit comments about this in
> > amd_pmu_brs_sched_task() and intel_pmu_lbr_sched_task().
> > 
> > Given we discard records when reprogramming events, we *could* try to
> > preserve events in some cases, but I suspect that as with the rotation
> > case this'll be a lot of complexity for little gain. Note that as we
> > discard events when enabling the PMU, we'd throw some task-bound records
> > away anyway, and practically the gain would be limited to cpu-bound
> > records.
> > 
> > Do you have a reason why you think we *must* keep events around?
> 
> Here I am really concerned are cases when a process is preempted or
> migrated.  The driver doesn't save and restore branch records for these
> cases, it just invalidates all records when a task is scheduled in.
> 
> As a result, if an event overflow is close to context switch, it is
> likely to capture incomplete branch records.  For a userspace-only
> tracing, it is risk to capture empty branch record after preemption
> and migrations.

Yes, and I was initially concerned about that, but is that really a
problem? So long as the event period doesn't *always* coincide with
preemption you'll get records from other samples anyway.

While I agree this is a theoretical concern, I don't think its of
practical importance. This discarding happens on x86 *today* with LBR,
and (AFAICT) there hav been no complaints. Note that the LBR logic to
save/restore records for task contexts is only done for user callstack
recording, which BRBE does not support.

Regardless, there's still the problematic interaction with event
rotation; you cannot save/restore the records safely here if events (and
the relevant filters) can change between the save and restore.

Discarding here is significantly simpler, and correct.

Mark.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v20 11/11] perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)
  2025-02-25  1:31             ` Rob Herring
@ 2025-02-25 12:38               ` Leo Yan
  2025-02-25 15:35                 ` Rob Herring
  2025-02-25 19:46                 ` Mark Rutland
  0 siblings, 2 replies; 33+ messages in thread
From: Leo Yan @ 2025-02-25 12:38 UTC (permalink / raw)
  To: Rob Herring
  Cc: Mark Rutland, Will Deacon, Catalin Marinas, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, James Clark, Anshuman Khandual, linux-arm-kernel,
	linux-perf-users, linux-kernel, linux-doc, kvmarm

On Mon, Feb 24, 2025 at 07:31:52PM -0600, Rob Herring wrote:

[...]

> > > > When event rotation happens, if without context switch, in theory we
> > > > should can directly use the branch record (no invalidation, no injection)
> > > > for all events.
> > >
> > > No; that only works in *some* cases, and will produce incorrect results
> > > in others.
> > >
> > > For example, consider filtering. Imagine a PMU with a single counter,
> > > and two events, where event-A filters for calls-and-returns and event-B
> > > filters for calls-only. When switching from event-A to event-B, it's
> > > theoretically possible to keep the existing records around, knowing that
> > > the returns can be filtered out later. When switching from event-B to
> > > event-A we cannot keep the existing records, since there are gaps
> > > whenever a return should have been recorded.
> >
> > Seems to me, the problem is not caused by event rotation.  We need to
> > calculate a correct filter in the first place - the BRBE driver should
> > calculate a superset for all filters of events for a session.  Then,
> > generate branch record based event's specific filter.
> 
> The driver doesn't have enough information. If it is told to schedule
> event A, it doesn't know anything about event B. It could in theory
> try to remember event B if event B had already been scheduled, but it
> never knows when event B is gone.

E.g., I tried below command for enabling 10 events in a perf session:

  perf record -e armv9_nevis/r04/ -e armv9_nevis/r05/ \
              -e armv9_nevis/r06/ -e armv9_nevis/r07/ \
              -e armv9_nevis/r08/ -e armv9_nevis/r09/ \
              -e armv9_nevis/r10/ -e armv9_nevis/r11/ \
              -e armv9_nevis/r12/ -e armv9_nevis/r13/ \
              -- sleep 1

For Arm PMU, the flow below is invoked for every event on every
affinied CPU in initialization phase:

  armpmu_event_init() {
    armv8pmu_set_event_filter();
  }

Shouldn't we calculate a superset branch filter for all events, store
it into a per-CPU data structure and then apply the filter on BRBE?

> > > There are a number of cases of that shape given the set of configurable
> > > filters. In theory it's possible to retain those in some cases, but I
> > > don't think that the complexity is justified.
> > >
> > > Similarly, whenever kernel branches are recorded it's necessary to drop
> > > the stale branches whenever branch recording is paused, as there's
> > > necessarily a blackout period and hence a gap in the records.
> >
> > If we save BRBE record when a process is switched out and then restore
> > the record when a process is switched in, should we can keep a decent
> > branch record for performance profiling?
> 
> Keep in mind that there's only 64 branches recorded at most. How many
> branches in a context switch plus reconfiguring the PMU? Not a small
> percentage of 64 I think. In traces where freeze on overflow was not
> working (there's an example in v18), just the interrupt entry until
> BRBE was stopped was a significant part of the trace. A context switch
> is going to be similar.

That is true for kernel mode enabled tracing.  But we will have no
such kind noises for userspace only mode tracing.

[...]

> > > Do you have a reason why you think we *must* keep events around?
> >
> > Here I am really concerned are cases when a process is preempted or
> > migrated.  The driver doesn't save and restore branch records for these
> > cases, it just invalidates all records when a task is scheduled in.
> >
> > As a result, if an event overflow is close to context switch, it is
> > likely to capture incomplete branch records.  For a userspace-only
> > tracing, it is risk to capture empty branch record after preemption
> > and migrations.
> 
> There's the same risk if something else is recording kernel branches
> when you are recording userspace only. I think the user has to be
> aware if other things like context switches are perturbing their data.

I am confused for the decription above.  Does it refer to branch
recording cross different sessions?  It is fine for me that the branch
data is interleaved by different sessions (e.g. one is global tracing
and another is only per-thread tracing).

We might need to consider an intact branch record for the single perf
session case.  E.g. if userspace program calls:

    func_a -> func_b -> func_c

In a case for only userspace tracing, we will have no chance to preserve
the call sequence of these functions after the program is switched out.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v20 11/11] perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)
  2025-02-25 12:38               ` Leo Yan
@ 2025-02-25 15:35                 ` Rob Herring
  2025-02-25 19:46                 ` Mark Rutland
  1 sibling, 0 replies; 33+ messages in thread
From: Rob Herring @ 2025-02-25 15:35 UTC (permalink / raw)
  To: Leo Yan
  Cc: Mark Rutland, Will Deacon, Catalin Marinas, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, James Clark, Anshuman Khandual, linux-arm-kernel,
	linux-perf-users, linux-kernel, linux-doc, kvmarm

On Tue, Feb 25, 2025 at 6:38 AM Leo Yan <leo.yan@arm.com> wrote:
>
> On Mon, Feb 24, 2025 at 07:31:52PM -0600, Rob Herring wrote:
>
> [...]
>
> > > > > When event rotation happens, if without context switch, in theory we
> > > > > should can directly use the branch record (no invalidation, no injection)
> > > > > for all events.
> > > >
> > > > No; that only works in *some* cases, and will produce incorrect results
> > > > in others.
> > > >
> > > > For example, consider filtering. Imagine a PMU with a single counter,
> > > > and two events, where event-A filters for calls-and-returns and event-B
> > > > filters for calls-only. When switching from event-A to event-B, it's
> > > > theoretically possible to keep the existing records around, knowing that
> > > > the returns can be filtered out later. When switching from event-B to
> > > > event-A we cannot keep the existing records, since there are gaps
> > > > whenever a return should have been recorded.
> > >
> > > Seems to me, the problem is not caused by event rotation.  We need to
> > > calculate a correct filter in the first place - the BRBE driver should
> > > calculate a superset for all filters of events for a session.  Then,
> > > generate branch record based event's specific filter.
> >
> > The driver doesn't have enough information. If it is told to schedule
> > event A, it doesn't know anything about event B. It could in theory
> > try to remember event B if event B had already been scheduled, but it
> > never knows when event B is gone.
>
> E.g., I tried below command for enabling 10 events in a perf session:
>
>   perf record -e armv9_nevis/r04/ -e armv9_nevis/r05/ \
>               -e armv9_nevis/r06/ -e armv9_nevis/r07/ \
>               -e armv9_nevis/r08/ -e armv9_nevis/r09/ \
>               -e armv9_nevis/r10/ -e armv9_nevis/r11/ \
>               -e armv9_nevis/r12/ -e armv9_nevis/r13/ \
>               -- sleep 1
>
> For Arm PMU, the flow below is invoked for every event on every
> affinied CPU in initialization phase:
>
>   armpmu_event_init() {
>     armv8pmu_set_event_filter();
>   }

That function is passed *1* event. It is not intended to go looking at
all events or muck with any global state. Could we go poking around
all the data structures? Probably, it's C and data structures are
often not opaque when they should be, so that wouldn't be a good idea.
If you think it is, I'd recommend you stay away from Rust.

Furthermore, an event here may not actually be enabled. A user could
open 2 events and handle them as mutually exclusive continuously
disabling one and enabling the other. If the branch filters that they
want are not overlapping, we'd be reducing our effective branch record
size. Maximizing the size seems much more important to me than keeping
some branches in a few corner cases.

> Shouldn't we calculate a superset branch filter for all events, store
> it into a per-CPU data structure and then apply the filter on BRBE?

Suppose we do that, what happens when a 2nd session (as Mark pointed
out, sessions only exist in the perf tool, not the kernel) adds more
events. We configured the filters and now we have to change them
again. So we have to invalidate the branch record. It's the same thing
with event rotation (though less frequent (probably)).

> > > > There are a number of cases of that shape given the set of configurable
> > > > filters. In theory it's possible to retain those in some cases, but I
> > > > don't think that the complexity is justified.
> > > >
> > > > Similarly, whenever kernel branches are recorded it's necessary to drop
> > > > the stale branches whenever branch recording is paused, as there's
> > > > necessarily a blackout period and hence a gap in the records.
> > >
> > > If we save BRBE record when a process is switched out and then restore
> > > the record when a process is switched in, should we can keep a decent
> > > branch record for performance profiling?
> >
> > Keep in mind that there's only 64 branches recorded at most. How many
> > branches in a context switch plus reconfiguring the PMU? Not a small
> > percentage of 64 I think. In traces where freeze on overflow was not
> > working (there's an example in v18), just the interrupt entry until
> > BRBE was stopped was a significant part of the trace. A context switch
> > is going to be similar.
>
> That is true for kernel mode enabled tracing.  But we will have no
> such kind noises for userspace only mode tracing.
>
> [...]
>
> > > > Do you have a reason why you think we *must* keep events around?
> > >
> > > Here I am really concerned are cases when a process is preempted or
> > > migrated.  The driver doesn't save and restore branch records for these
> > > cases, it just invalidates all records when a task is scheduled in.
> > >
> > > As a result, if an event overflow is close to context switch, it is
> > > likely to capture incomplete branch records.  For a userspace-only
> > > tracing, it is risk to capture empty branch record after preemption
> > > and migrations.
> >
> > There's the same risk if something else is recording kernel branches
> > when you are recording userspace only. I think the user has to be
> > aware if other things like context switches are perturbing their data.
>
> I am confused for the decription above.  Does it refer to branch
> recording cross different sessions?  It is fine for me that the branch
> data is interleaved by different sessions (e.g. one is global tracing
> and another is only per-thread tracing).
>
> We might need to consider an intact branch record for the single perf
> session case.  E.g. if userspace program calls:
>
>     func_a -> func_b -> func_c
>
> In a case for only userspace tracing, we will have no chance to preserve
> the call sequence of these functions after the program is switched out.

So you miss the few times that happens in a context switch. But we are
sampling and all/most of the other samples are going to be fine. How
much is that really going to affect your profile?

Rob

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v20 11/11] perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)
  2025-02-25 12:01             ` Mark Rutland
@ 2025-02-25 17:48               ` Leo Yan
  2025-02-25 19:04                 ` Rob Herring
  2025-02-25 19:58                 ` Mark Rutland
  0 siblings, 2 replies; 33+ messages in thread
From: Leo Yan @ 2025-02-25 17:48 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Rob Herring, Will Deacon, Catalin Marinas, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, James Clark, Anshuman Khandual, linux-arm-kernel,
	linux-perf-users, linux-kernel, linux-doc, kvmarm

On Tue, Feb 25, 2025 at 12:01:52PM +0000, Mark Rutland wrote:

[...]

> > > Critically, the brbe_enable() function merges the filters of all
> > > *active* events which have been installed into hardware. It does not
> > > track all events which can be rotated, and the resulting filter is not
> > > the same -- it can change as a result of rotation.
> > 
> > In a perf session has multiple events, and events have different branch
> > filters, seems to me, a simple way is to return error for this case.
> 
> FWIW, I'd generally prefer to do that since it avoids a number of
> horrible edge-cases and gets rid of the need to do SW filtering, which
> falls somewhere between "tricky" and "not entirely possible". However,
> that's not what LBR and others do, which is why we went with filter
> merging.
> 
> If folk on the tools side are happy with the kernel rejecting
> conflicting events, then I'd be more than happy to do that. What I don't
> want is that we start off with that approach and people immediately
> start to complain that the BRBE driver rejects events that the LBR
> driver accepts.
> 
> See the last time this came up.

Thanks for the shared links.  Based on the info, let's say we can have two
cases:

  Case 1: set different branch filters in a single perf session:

    perf record -e armv8_pmuv3_0/r03,branch_type=any_call/u \
                -e armv8_pmuv3_0/r04,branch_type=any_ret/k ...

  Case 2: set different branch filters in multiple perf sessions:

    perf record -e armv8_pmuv3_0/r03,branch_type=any_call/u ...

    perf record -e armv8_pmuv3_0/r04,branch_type=any_ret/k ...

In my previous reply, I was suggesting that we should reject the case 1.
IMO, it is not quite useful to configure different filters for events in
the same session, especially if this leads complexity in the driver due
to the hardware limitation.

For case 2, when create a new session, if the perf tool can read out the
current branch filter setting (e.g. via sysfs node) and give suggestion
what branch filter is compabile with existed sessions, seems to me, this
is a feasible solution.  My understanding this is a rare case, and a
clear guidance for users would be sufficient if this happens.  (Maybe
we can give recommendation for how to use BRBE in the perf doc).

To be clear, an important factor is the trace modes with modifier 'u'
(user) and 'k' (kernel) should be supported for different events and for
different sessions.  In a mixed cases (some events are userspace only
and some are kernel only), the BRBE driver needs to filter out branch
records for specific mode when taking a sample.

> > If we can unify branch filter within a perf session, would this be
> > much easier for handling?
> 
> Do you mean if the perf tool ensured that all events in a given session
> had the same filter? From the kernel's PoV there's no such thing as a
> "perf session", and I'm not sure whether you're suggesting doing that in
> userspace or withing the kernel.

My understanding is this would be not difficult to do such kind checking
in the tool.  E.g., the perf tool can iterate every event and check the
branch filter and detect incompabile issue.

> Doing that in the perf tool would certianly make a stronger argument for
> the kernel taking the "reject conflicting branch filters" option.
> 
> Doing that within the kernel isn't really possible.

As said above, if the BRBE driver can provide a knob in sysfs to indicate
what is the current branch filter in the existed sessions, this would be
helpful for the tool to do the checking and remind users.

I haven't done any experiments for this. If you think this is the way
to move forward, I might do a prototype and get back to you to ensure we
don't run into any unexpected issues.

[...]

To make the discussion easier, I would like reply separately regarding
the branch record save and restore issue.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v20 11/11] perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)
  2025-02-25 17:48               ` Leo Yan
@ 2025-02-25 19:04                 ` Rob Herring
  2025-02-25 19:58                 ` Mark Rutland
  1 sibling, 0 replies; 33+ messages in thread
From: Rob Herring @ 2025-02-25 19:04 UTC (permalink / raw)
  To: Leo Yan
  Cc: Mark Rutland, Will Deacon, Catalin Marinas, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, James Clark, Anshuman Khandual, linux-arm-kernel,
	linux-perf-users, linux-kernel, linux-doc, kvmarm

On Tue, Feb 25, 2025 at 11:48 AM Leo Yan <leo.yan@arm.com> wrote:
>
> On Tue, Feb 25, 2025 at 12:01:52PM +0000, Mark Rutland wrote:
>
> [...]
>
> > > > Critically, the brbe_enable() function merges the filters of all
> > > > *active* events which have been installed into hardware. It does not
> > > > track all events which can be rotated, and the resulting filter is not
> > > > the same -- it can change as a result of rotation.
> > >
> > > In a perf session has multiple events, and events have different branch
> > > filters, seems to me, a simple way is to return error for this case.
> >
> > FWIW, I'd generally prefer to do that since it avoids a number of
> > horrible edge-cases and gets rid of the need to do SW filtering, which
> > falls somewhere between "tricky" and "not entirely possible". However,
> > that's not what LBR and others do, which is why we went with filter
> > merging.
> >
> > If folk on the tools side are happy with the kernel rejecting
> > conflicting events, then I'd be more than happy to do that. What I don't
> > want is that we start off with that approach and people immediately
> > start to complain that the BRBE driver rejects events that the LBR
> > driver accepts.
> >
> > See the last time this came up.
>
> Thanks for the shared links.  Based on the info, let's say we can have two
> cases:
>
>   Case 1: set different branch filters in a single perf session:
>
>     perf record -e armv8_pmuv3_0/r03,branch_type=any_call/u \
>                 -e armv8_pmuv3_0/r04,branch_type=any_ret/k ...
>
>   Case 2: set different branch filters in multiple perf sessions:
>
>     perf record -e armv8_pmuv3_0/r03,branch_type=any_call/u ...
>
>     perf record -e armv8_pmuv3_0/r04,branch_type=any_ret/k ...
>
> In my previous reply, I was suggesting that we should reject the case 1.

The driver cannot distinguish those 2 cases.

> IMO, it is not quite useful to configure different filters for events in
> the same session, especially if this leads complexity in the driver due
> to the hardware limitation.
>
> For case 2, when create a new session, if the perf tool can read out the
> current branch filter setting (e.g. via sysfs node) and give suggestion
> what branch filter is compabile with existed sessions, seems to me, this
> is a feasible solution.  My understanding this is a rare case, and a
> clear guidance for users would be sufficient if this happens.  (Maybe
> we can give recommendation for how to use BRBE in the perf doc).

First, I don't think anything currently in sysfs for PMU changes based
on current PMU usage. It is all static features. So you just added a
2nd control interface in addition to the syscall/ioctl interface. It
is also totally racy. As soon as you read sysfs, the information could
be out of date because an event was added or removed.

Second, that is completely different from how x86 works. Folks don't
want to know how to use BRBE. They want to do perf branch stack
recording like they already do on existing platforms. That's what has
been implemented here with the behavior as close as possible even for
corner cases that seem questionable. For the userspace counter access
support, folks were upset that it has to be explicitly enabled (in
sysctl) and requested (in a configX bit) when you don't on x86. People
notice and care if the behavior is different.

> To be clear, an important factor is the trace modes with modifier 'u'
> (user) and 'k' (kernel) should be supported for different events and for
> different sessions.  In a mixed cases (some events are userspace only
> and some are kernel only), the BRBE driver needs to filter out branch
> records for specific mode when taking a sample.
>
> > > If we can unify branch filter within a perf session, would this be
> > > much easier for handling?
> >
> > Do you mean if the perf tool ensured that all events in a given session
> > had the same filter? From the kernel's PoV there's no such thing as a
> > "perf session", and I'm not sure whether you're suggesting doing that in
> > userspace or withing the kernel.
>
> My understanding is this would be not difficult to do such kind checking
> in the tool.  E.g., the perf tool can iterate every event and check the
> branch filter and detect incompabile issue.

You could detect that in perf tool, but you can never do it in every
tool because anyone can write their own.

> > Doing that in the perf tool would certianly make a stronger argument for
> > the kernel taking the "reject conflicting branch filters" option.
> >
> > Doing that within the kernel isn't really possible.
>
> As said above, if the BRBE driver can provide a knob in sysfs to indicate
> what is the current branch filter in the existed sessions, this would be
> helpful for the tool to do the checking and remind users.
>
> I haven't done any experiments for this. If you think this is the way
> to move forward, I might do a prototype and get back to you to ensure we
> don't run into any unexpected issues.

I don't think anyone does. I think this whole discussion has gone into
the weeds.

Rob

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v20 11/11] perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)
  2025-02-25 12:38               ` Leo Yan
  2025-02-25 15:35                 ` Rob Herring
@ 2025-02-25 19:46                 ` Mark Rutland
  1 sibling, 0 replies; 33+ messages in thread
From: Mark Rutland @ 2025-02-25 19:46 UTC (permalink / raw)
  To: Leo Yan
  Cc: Rob Herring, Will Deacon, Catalin Marinas, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, James Clark, Anshuman Khandual, linux-arm-kernel,
	linux-perf-users, linux-kernel, linux-doc, kvmarm

On Tue, Feb 25, 2025 at 12:38:13PM +0000, Leo Yan wrote:
> On Mon, Feb 24, 2025 at 07:31:52PM -0600, Rob Herring wrote:
> 
> [...]
> 
> > > > > When event rotation happens, if without context switch, in theory we
> > > > > should can directly use the branch record (no invalidation, no injection)
> > > > > for all events.
> > > >
> > > > No; that only works in *some* cases, and will produce incorrect results
> > > > in others.
> > > >
> > > > For example, consider filtering. Imagine a PMU with a single counter,
> > > > and two events, where event-A filters for calls-and-returns and event-B
> > > > filters for calls-only. When switching from event-A to event-B, it's
> > > > theoretically possible to keep the existing records around, knowing that
> > > > the returns can be filtered out later. When switching from event-B to
> > > > event-A we cannot keep the existing records, since there are gaps
> > > > whenever a return should have been recorded.
> > >
> > > Seems to me, the problem is not caused by event rotation.  We need to
> > > calculate a correct filter in the first place - the BRBE driver should
> > > calculate a superset for all filters of events for a session.  Then,
> > > generate branch record based event's specific filter.
> > 
> > The driver doesn't have enough information. If it is told to schedule
> > event A, it doesn't know anything about event B. It could in theory
> > try to remember event B if event B had already been scheduled, but it
> > never knows when event B is gone.
> 
> E.g., I tried below command for enabling 10 events in a perf session:
> 
>   perf record -e armv9_nevis/r04/ -e armv9_nevis/r05/ \
>               -e armv9_nevis/r06/ -e armv9_nevis/r07/ \
>               -e armv9_nevis/r08/ -e armv9_nevis/r09/ \
>               -e armv9_nevis/r10/ -e armv9_nevis/r11/ \
>               -e armv9_nevis/r12/ -e armv9_nevis/r13/ \
>               -- sleep 1
> 
> For Arm PMU, the flow below is invoked for every event on every
> affinied CPU in initialization phase:
> 
>   armpmu_event_init() {
>     armv8pmu_set_event_filter();
>   }
> 
> Shouldn't we calculate a superset branch filter for all events, store
> it into a per-CPU data structure and then apply the filter on BRBE?

Should we? No.

*NONE* of the events in your example are CPU-bound, and the call to
armpmu_event_init() can happen on an arbitrary CPU which the relevant
event never actually runs on, while other unrelated events may run on
that CPU.

It makes no sense for armv8pmu_set_event_filter() to write to a per-cpu
structure. That's purely there to determine what the filters *should* be
when *that specific event* is programmed into hardware.

As Rob and I have pointed out already, the *only* thing that can be
relevant to deciding the configuration of HW filtering is the set of
events which are *active* on that CPU.

> > > > There are a number of cases of that shape given the set of configurable
> > > > filters. In theory it's possible to retain those in some cases, but I
> > > > don't think that the complexity is justified.
> > > >
> > > > Similarly, whenever kernel branches are recorded it's necessary to drop
> > > > the stale branches whenever branch recording is paused, as there's
> > > > necessarily a blackout period and hence a gap in the records.
> > >
> > > If we save BRBE record when a process is switched out and then restore
> > > the record when a process is switched in, should we can keep a decent
> > > branch record for performance profiling?
> > 
> > Keep in mind that there's only 64 branches recorded at most. How many
> > branches in a context switch plus reconfiguring the PMU? Not a small
> > percentage of 64 I think. In traces where freeze on overflow was not
> > working (there's an example in v18), just the interrupt entry until
> > BRBE was stopped was a significant part of the trace. A context switch
> > is going to be similar.
> 
> That is true for kernel mode enabled tracing.  But we will have no
> such kind noises for userspace only mode tracing.

As mentioned elsewhere, it's not a problem for x86, so why is it
magically a problem for arm64?

> > > > Do you have a reason why you think we *must* keep events around?
> > >
> > > Here I am really concerned are cases when a process is preempted or
> > > migrated.  The driver doesn't save and restore branch records for these
> > > cases, it just invalidates all records when a task is scheduled in.
> > >
> > > As a result, if an event overflow is close to context switch, it is
> > > likely to capture incomplete branch records.  For a userspace-only
> > > tracing, it is risk to capture empty branch record after preemption
> > > and migrations.
> > 
> > There's the same risk if something else is recording kernel branches
> > when you are recording userspace only. I think the user has to be
> > aware if other things like context switches are perturbing their data.
> 
> I am confused for the decription above.  Does it refer to branch
> recording cross different sessions?  It is fine for me that the branch
> data is interleaved by different sessions (e.g. one is global tracing
> and another is only per-thread tracing).

Imagine that there's an existing process with some pid ${PID}, and
concurrently, the following commands are run, either by the same user or
different users with appropriate permissions:

	# Trying to record user branches only
	perf record -j any,u -e cycles -p ${PID}

	# Trying to record kernel branches only
	perf record -j any,k -e cycles -p ${PID}

Whatever you do, the task trying to record user branches only will lose
some records:

* If we make the events mutually exclusive, the branches will only be
  recorded when the user event is installed.

* If we merge the HW filters and later apply a SW filter, it's very
  likely that kernel branches taken after exception entry have filled
  all the records, and there are no user branches left to sample.

> We might need to consider an intact branch record for the single perf
> session case.  E.g. if userspace program calls:
> 
>     func_a -> func_b -> func_c
> 
> In a case for only userspace tracing, we will have no chance to preserve
> the call sequence of these functions after the program is switched out.

If those functions are small, it's very likely that they'll all be in
the branch history. If they're so large that they're not executed in one
scheduling quantum, do you expect them to fall within the same event
period?

I think that you're making a big deal out of an edge case that doesn't
matter much in practice.

Mark.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v20 11/11] perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)
  2025-02-25 17:48               ` Leo Yan
  2025-02-25 19:04                 ` Rob Herring
@ 2025-02-25 19:58                 ` Mark Rutland
  2025-02-26 13:48                   ` Leo Yan
  1 sibling, 1 reply; 33+ messages in thread
From: Mark Rutland @ 2025-02-25 19:58 UTC (permalink / raw)
  To: Leo Yan
  Cc: Rob Herring, Will Deacon, Catalin Marinas, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, James Clark, Anshuman Khandual, linux-arm-kernel,
	linux-perf-users, linux-kernel, linux-doc, kvmarm

On Tue, Feb 25, 2025 at 05:48:03PM +0000, Leo Yan wrote:
> On Tue, Feb 25, 2025 at 12:01:52PM +0000, Mark Rutland wrote:
> 
> [...]
> 
> > > > Critically, the brbe_enable() function merges the filters of all
> > > > *active* events which have been installed into hardware. It does not
> > > > track all events which can be rotated, and the resulting filter is not
> > > > the same -- it can change as a result of rotation.
> > > 
> > > In a perf session has multiple events, and events have different branch
> > > filters, seems to me, a simple way is to return error for this case.
> > 
> > FWIW, I'd generally prefer to do that since it avoids a number of
> > horrible edge-cases and gets rid of the need to do SW filtering, which
> > falls somewhere between "tricky" and "not entirely possible". However,
> > that's not what LBR and others do, which is why we went with filter
> > merging.
> > 
> > If folk on the tools side are happy with the kernel rejecting
> > conflicting events, then I'd be more than happy to do that. What I don't
> > want is that we start off with that approach and people immediately
> > start to complain that the BRBE driver rejects events that the LBR
> > driver accepts.
> > 
> > See the last time this came up.
> 
> Thanks for the shared links.  Based on the info, let's say we can have two
> cases:
> 
>   Case 1: set different branch filters in a single perf session:
> 
>     perf record -e armv8_pmuv3_0/r03,branch_type=any_call/u \
>                 -e armv8_pmuv3_0/r04,branch_type=any_ret/k ...
> 
>   Case 2: set different branch filters in multiple perf sessions:
> 
>     perf record -e armv8_pmuv3_0/r03,branch_type=any_call/u ...
> 
>     perf record -e armv8_pmuv3_0/r04,branch_type=any_ret/k ...
> 
> In my previous reply, I was suggesting that we should reject the case 1.

Do you mean that the kernel should reject that, or that userspace should
reject that.

As mentioned earlier, I am ok with the idea that we reject *scheduling*
events with mismatched filters, as we do for other resource conflicts.
That would necessarily mean rejecting *groups* of events with
inconsistent filters at open time.

However, I do not think that we should reject indepenent events which
happen to have mismatched filters, regardless of whether they're opened
by the same "session".

> IMO, it is not quite useful to configure different filters for events in
> the same session, especially if this leads complexity in the driver due
> to the hardware limitation.

I generally agree, but IIRC userspace does this today.

> For case 2, when create a new session, if the perf tool can read out the
> current branch filter setting (e.g. via sysfs node) and give suggestion
> what branch filter is compabile with existed sessions, seems to me, this
> is a feasible solution.  My understanding this is a rare case, and a
> clear guidance for users would be sufficient if this happens.  (Maybe
> we can give recommendation for how to use BRBE in the perf doc).

No. We are not going to expose *dynamic* information about the PMU
hardware via sysfs. This is not as simple as you make it out to be.
At any point in time there can be an arbitrary number of events open,
and some arbitrary subset currently scheduled.

I agree that ideally this should be rare, though, and hence either of
the two options I have suggested thus far should handle that acceptably.

> To be clear, an important factor is the trace modes with modifier 'u'
> (user) and 'k' (kernel) should be supported for different events and for
> different sessions.  In a mixed cases (some events are userspace only
> and some are kernel only), the BRBE driver needs to filter out branch
> records for specific mode when taking a sample.

I hate to repeat myself, but the driver has *no concept whatsoever* of a
"session". It only knows:

* Which events are currently active in hardware.

* Which events have been grouped together.

* Which events have been opened for a given task or CPU context.

... and *NONE* of those correspond directly to a "session" managed by
the userspace perf tool.

> > > If we can unify branch filter within a perf session, would this be
> > > much easier for handling?
> > 
> > Do you mean if the perf tool ensured that all events in a given session
> > had the same filter? From the kernel's PoV there's no such thing as a
> > "perf session", and I'm not sure whether you're suggesting doing that in
> > userspace or withing the kernel.
> 
> My understanding is this would be not difficult to do such kind checking
> in the tool.  E.g., the perf tool can iterate every event and check the
> branch filter and detect incompabile issue.

Cool, sounds like we could do that then!

> > Doing that in the perf tool would certianly make a stronger argument for
> > the kernel taking the "reject conflicting branch filters" option.
> > 
> > Doing that within the kernel isn't really possible.
> 
> As said above, if the BRBE driver can provide a knob in sysfs to indicate
> what is the current branch filter in the existed sessions, this would be
> helpful for the tool to do the checking and remind users.

Sorry, as above, that is not going to happen. It is not practical and
will cause many other problems.

> I haven't done any experiments for this. If you think this is the way
> to move forward, I might do a prototype and get back to you to ensure we
> don't run into any unexpected issues.

What specifically are you proposing to prototype?

Mark.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v20 11/11] perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)
  2025-02-25 19:58                 ` Mark Rutland
@ 2025-02-26 13:48                   ` Leo Yan
  2025-02-26 14:26                     ` Rob Herring
  0 siblings, 1 reply; 33+ messages in thread
From: Leo Yan @ 2025-02-26 13:48 UTC (permalink / raw)
  To: Mark Rutland
  Cc: Rob Herring, Will Deacon, Catalin Marinas, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, James Clark, Anshuman Khandual, linux-arm-kernel,
	linux-perf-users, linux-kernel, linux-doc, kvmarm

On Tue, Feb 25, 2025 at 07:58:18PM +0000, Mark Rutland wrote:
> On Tue, Feb 25, 2025 at 05:48:03PM +0000, Leo Yan wrote:
> > On Tue, Feb 25, 2025 at 12:01:52PM +0000, Mark Rutland wrote:
> > 
> > [...]
> > 
> > > > > Critically, the brbe_enable() function merges the filters of all
> > > > > *active* events which have been installed into hardware. It does not
> > > > > track all events which can be rotated, and the resulting filter is not
> > > > > the same -- it can change as a result of rotation.
> > > > 
> > > > In a perf session has multiple events, and events have different branch
> > > > filters, seems to me, a simple way is to return error for this case.
> > > 
> > > FWIW, I'd generally prefer to do that since it avoids a number of
> > > horrible edge-cases and gets rid of the need to do SW filtering, which
> > > falls somewhere between "tricky" and "not entirely possible". However,
> > > that's not what LBR and others do, which is why we went with filter
> > > merging.
> > > 
> > > If folk on the tools side are happy with the kernel rejecting
> > > conflicting events, then I'd be more than happy to do that. What I don't
> > > want is that we start off with that approach and people immediately
> > > start to complain that the BRBE driver rejects events that the LBR
> > > driver accepts.
> > > 
> > > See the last time this came up.
> > 
> > Thanks for the shared links.  Based on the info, let's say we can have two
> > cases:
> > 
> >   Case 1: set different branch filters in a single perf session:
> > 
> >     perf record -e armv8_pmuv3_0/r03,branch_type=any_call/u \
> >                 -e armv8_pmuv3_0/r04,branch_type=any_ret/k ...
> > 
> >   Case 2: set different branch filters in multiple perf sessions:
> > 
> >     perf record -e armv8_pmuv3_0/r03,branch_type=any_call/u ...
> > 
> >     perf record -e armv8_pmuv3_0/r04,branch_type=any_ret/k ...
> > 
> > In my previous reply, I was suggesting that we should reject the case 1.
> 
> Do you mean that the kernel should reject that, or that userspace should
> reject that.

For the case 1, the userspace can reject it as it has sufficient info to
detect mismatched filters by iterating event list.

> As mentioned earlier, I am ok with the idea that we reject *scheduling*
> events with mismatched filters, as we do for other resource conflicts.
> That would necessarily mean rejecting *groups* of events with
> inconsistent filters at open time.

This rejects the case 2 in the PMU driver.

> However, I do not think that we should reject indepenent events which
> happen to have mismatched filters, regardless of whether they're opened
> by the same "session".

Can I understand "independent events" are events that are not in a event
group?

If independent events can have mismatched filters, the PMU driver can
activate them simultaneously, this means the BRBE driver needs to merge
the branch filters.  If so, I still see the complexity caused by this
case.

> > IMO, it is not quite useful to configure different filters for events in
> > the same session, especially if this leads complexity in the driver due
> > to the hardware limitation.
> 
> I generally agree, but IIRC userspace does this today.
> 
> > For case 2, when create a new session, if the perf tool can read out the
> > current branch filter setting (e.g. via sysfs node) and give suggestion
> > what branch filter is compabile with existed sessions, seems to me, this
> > is a feasible solution.  My understanding this is a rare case, and a
> > clear guidance for users would be sufficient if this happens.  (Maybe
> > we can give recommendation for how to use BRBE in the perf doc).
> 
> No. We are not going to expose *dynamic* information about the PMU
> hardware via sysfs. This is not as simple as you make it out to be.
> At any point in time there can be an arbitrary number of events open,
> and some arbitrary subset currently scheduled.

If so, the perf tool will miss info for checking the existed branch
filter and defer to handle the mismatched filter error until the
kernel reports the issue.

> I agree that ideally this should be rare, though, and hence either of
> the two options I have suggested thus far should handle that acceptably.
> 
> > To be clear, an important factor is the trace modes with modifier 'u'
> > (user) and 'k' (kernel) should be supported for different events and for
> > different sessions.  In a mixed cases (some events are userspace only
> > and some are kernel only), the BRBE driver needs to filter out branch
> > records for specific mode when taking a sample.
> 
> I hate to repeat myself, but the driver has *no concept whatsoever* of a
> "session". It only knows:
> 
> * Which events are currently active in hardware.
> 
> * Which events have been grouped together.
> 
> * Which events have been opened for a given task or CPU context.
> 
> ... and *NONE* of those correspond directly to a "session" managed by
> the userspace perf tool.

Sorry for that.  After your reminder, I understood that the PMU driver
has no knowledge for perf session.

The description above does not refer to perf session.  As you said, the
PMU driver has its own context to track active events, and it is
possible that some active events might trace only user mode while others
trace only kernel mode.  In this case, BRBE needs to be enabled for both
user and kernel modes.  When capturing samples, the BRBE driver needs to
filter out branch records based on a specific event for its
corresponding mode.

For better understanding, though we don't allow mismatched filters, I
still think we should set both E0BRE and ExBRE bits if there are
multiple active events with different modes.

[...]

> > I haven't done any experiments for this. If you think this is the way
> > to move forward, I might do a prototype and get back to you to ensure we
> > don't run into any unexpected issues.
> 
> What specifically are you proposing to prototype?

I mean, in Linux perf, we can add code to check branch filters for
opened events when creating a session and report an error for
mismatching filters.  I would suggest we implement it specifically
for Arm64 to avoid dispute with other archs.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v20 11/11] perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)
  2025-02-26 13:48                   ` Leo Yan
@ 2025-02-26 14:26                     ` Rob Herring
  0 siblings, 0 replies; 33+ messages in thread
From: Rob Herring @ 2025-02-26 14:26 UTC (permalink / raw)
  To: Leo Yan
  Cc: Mark Rutland, Will Deacon, Catalin Marinas, Jonathan Corbet,
	Marc Zyngier, Oliver Upton, Joey Gouly, Suzuki K Poulose,
	Zenghui Yu, James Clark, Anshuman Khandual, linux-arm-kernel,
	linux-perf-users, linux-kernel, linux-doc, kvmarm

On Wed, Feb 26, 2025 at 7:48 AM Leo Yan <leo.yan@arm.com> wrote:
>
> On Tue, Feb 25, 2025 at 07:58:18PM +0000, Mark Rutland wrote:
> > On Tue, Feb 25, 2025 at 05:48:03PM +0000, Leo Yan wrote:
> > > On Tue, Feb 25, 2025 at 12:01:52PM +0000, Mark Rutland wrote:
> > >
> > > [...]
> > >
> > > > > > Critically, the brbe_enable() function merges the filters of all
> > > > > > *active* events which have been installed into hardware. It does not
> > > > > > track all events which can be rotated, and the resulting filter is not
> > > > > > the same -- it can change as a result of rotation.
> > > > >
> > > > > In a perf session has multiple events, and events have different branch
> > > > > filters, seems to me, a simple way is to return error for this case.
> > > >
> > > > FWIW, I'd generally prefer to do that since it avoids a number of
> > > > horrible edge-cases and gets rid of the need to do SW filtering, which
> > > > falls somewhere between "tricky" and "not entirely possible". However,
> > > > that's not what LBR and others do, which is why we went with filter
> > > > merging.
> > > >
> > > > If folk on the tools side are happy with the kernel rejecting
> > > > conflicting events, then I'd be more than happy to do that. What I don't
> > > > want is that we start off with that approach and people immediately
> > > > start to complain that the BRBE driver rejects events that the LBR
> > > > driver accepts.
> > > >
> > > > See the last time this came up.
> > >
> > > Thanks for the shared links.  Based on the info, let's say we can have two
> > > cases:
> > >
> > >   Case 1: set different branch filters in a single perf session:
> > >
> > >     perf record -e armv8_pmuv3_0/r03,branch_type=any_call/u \
> > >                 -e armv8_pmuv3_0/r04,branch_type=any_ret/k ...
> > >
> > >   Case 2: set different branch filters in multiple perf sessions:
> > >
> > >     perf record -e armv8_pmuv3_0/r03,branch_type=any_call/u ...
> > >
> > >     perf record -e armv8_pmuv3_0/r04,branch_type=any_ret/k ...
> > >
> > > In my previous reply, I was suggesting that we should reject the case 1.
> >
> > Do you mean that the kernel should reject that, or that userspace should
> > reject that.
>
> For the case 1, the userspace can reject it as it has sufficient info to
> detect mismatched filters by iterating event list.
>
> > As mentioned earlier, I am ok with the idea that we reject *scheduling*
> > events with mismatched filters, as we do for other resource conflicts.
> > That would necessarily mean rejecting *groups* of events with
> > inconsistent filters at open time.
>
> This rejects the case 2 in the PMU driver.
>
> > However, I do not think that we should reject indepenent events which
> > happen to have mismatched filters, regardless of whether they're opened
> > by the same "session".
>
> Can I understand "independent events" are events that are not in a event
> group?

Yes. The BRBE code has no concept of groups either.

> If independent events can have mismatched filters, the PMU driver can
> activate them simultaneously, this means the BRBE driver needs to merge
> the branch filters.  If so, I still see the complexity caused by this
> case.

That's exactly what the driver does. It's not even that complex. We
have the BRBFCR and BRBCR register values for each event and just have
to OR them together for the enabled events. For filtering we just
compare each branch record to a mask of the enabled branch types for
the event.

> > > IMO, it is not quite useful to configure different filters for events in
> > > the same session, especially if this leads complexity in the driver due
> > > to the hardware limitation.
> >
> > I generally agree, but IIRC userspace does this today.
> >
> > > For case 2, when create a new session, if the perf tool can read out the
> > > current branch filter setting (e.g. via sysfs node) and give suggestion
> > > what branch filter is compabile with existed sessions, seems to me, this
> > > is a feasible solution.  My understanding this is a rare case, and a
> > > clear guidance for users would be sufficient if this happens.  (Maybe
> > > we can give recommendation for how to use BRBE in the perf doc).
> >
> > No. We are not going to expose *dynamic* information about the PMU
> > hardware via sysfs. This is not as simple as you make it out to be.
> > At any point in time there can be an arbitrary number of events open,
> > and some arbitrary subset currently scheduled.
>
> If so, the perf tool will miss info for checking the existed branch
> filter and defer to handle the mismatched filter error until the
> kernel reports the issue.
>
> > I agree that ideally this should be rare, though, and hence either of
> > the two options I have suggested thus far should handle that acceptably.
> >
> > > To be clear, an important factor is the trace modes with modifier 'u'
> > > (user) and 'k' (kernel) should be supported for different events and for
> > > different sessions.  In a mixed cases (some events are userspace only
> > > and some are kernel only), the BRBE driver needs to filter out branch
> > > records for specific mode when taking a sample.
> >
> > I hate to repeat myself, but the driver has *no concept whatsoever* of a
> > "session". It only knows:
> >
> > * Which events are currently active in hardware.
> >
> > * Which events have been grouped together.
> >
> > * Which events have been opened for a given task or CPU context.
> >
> > ... and *NONE* of those correspond directly to a "session" managed by
> > the userspace perf tool.
>
> Sorry for that.  After your reminder, I understood that the PMU driver
> has no knowledge for perf session.
>
> The description above does not refer to perf session.  As you said, the
> PMU driver has its own context to track active events, and it is
> possible that some active events might trace only user mode while others
> trace only kernel mode.  In this case, BRBE needs to be enabled for both
> user and kernel modes.  When capturing samples, the BRBE driver needs to
> filter out branch records based on a specific event for its
> corresponding mode.

That's exactly what the driver does.

> For better understanding, though we don't allow mismatched filters, I
> still think we should set both E0BRE and ExBRE bits if there are
> multiple active events with different modes.

That's exactly what the driver does.

> [...]
>
> > > I haven't done any experiments for this. If you think this is the way
> > > to move forward, I might do a prototype and get back to you to ensure we
> > > don't run into any unexpected issues.
> >
> > What specifically are you proposing to prototype?
>
> I mean, in Linux perf, we can add code to check branch filters for
> opened events when creating a session and report an error for
> mismatching filters.  I would suggest we implement it specifically
> for Arm64 to avoid dispute with other archs.

There's not really any such thing as mismatching filters. There's no
mutually exclusive filters. The only effect having multiple events
enabling branch stack is you might run out of branch records sooner. A
CPU with a minimal number of branch records is going to have a similar
problem as will just enabling all branches on a single event. There's
not any way to fix that other than increasing the number of records or
adding a mechanism to the architecture to avoid dropping records.

Rob

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v20 00/11] arm64/perf: Enable branch stack sampling
  2025-02-18 20:39 [PATCH v20 00/11] arm64/perf: Enable branch stack sampling Rob Herring (Arm)
                   ` (11 preceding siblings ...)
  2025-02-19 16:09 ` [PATCH v20 00/11] arm64/perf: Enable branch stack sampling James Clark
@ 2025-03-01  7:05 ` Will Deacon
  2025-03-03 16:44   ` Rob Herring
  12 siblings, 1 reply; 33+ messages in thread
From: Will Deacon @ 2025-03-01  7:05 UTC (permalink / raw)
  To: Mark Rutland, Catalin Marinas, Jonathan Corbet, Marc Zyngier,
	Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	James Clark, Anshuman Khandual, Leo Yan, Rob Herring (Arm)
  Cc: kernel-team, Will Deacon, linux-arm-kernel, linux-perf-users,
	linux-kernel, linux-doc, kvmarm, Mark Brown

On Tue, 18 Feb 2025 14:39:55 -0600, Rob Herring (Arm) wrote:
> This series enables perf branch stack sampling support on arm64 via a
> v9.2 arch feature called Branch Record Buffer Extension (BRBE). Details
> on BRBE can be found in the Arm ARM[1] chapter D18.
> 
> I've picked up this series from Anshuman. v19 and v20 versions have been
> reworked quite a bit by Mark and myself. The bulk of those changes are
> in patch 11.
> 
> [...]

Applied cleanups to will (for-next/perf), thanks!

[01/11] perf: arm_pmuv3: Call kvm_vcpu_pmu_resync_el0() before enabling counters
        https://git.kernel.org/will/c/04bd15c4cbc3
[02/11] perf: arm_pmu: Don't disable counter in armpmu_add()
        https://git.kernel.org/will/c/dcca27bc1ecc
[03/11] perf: arm_pmuv3: Don't disable counter in armv8pmu_enable_event()
        https://git.kernel.org/will/c/4b0567ad0be5
[04/11] perf: arm_v7_pmu: Drop obvious comments for enabling/disabling counters and interrupts
        https://git.kernel.org/will/c/7a5387748215
[05/11] perf: arm_v7_pmu: Don't disable counter in (armv7|krait_|scorpion_)pmu_enable_event()
        https://git.kernel.org/will/c/7bf1001e0d91
[06/11] perf: apple_m1: Don't disable counter in m1_pmu_enable_event()
        https://git.kernel.org/will/c/c2e793da59fc
[07/11] perf: arm_pmu: Move PMUv3-specific data
        https://git.kernel.org/will/c/dc4d58a752ea

Cheers,
-- 
Will

https://fixes.arm64.dev
https://next.arm64.dev
https://will.arm64.dev

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v20 00/11] arm64/perf: Enable branch stack sampling
  2025-03-01  7:05 ` Will Deacon
@ 2025-03-03 16:44   ` Rob Herring
  2025-03-04 11:25     ` Catalin Marinas
  0 siblings, 1 reply; 33+ messages in thread
From: Rob Herring @ 2025-03-03 16:44 UTC (permalink / raw)
  To: Will Deacon
  Cc: Mark Rutland, Catalin Marinas, Jonathan Corbet, Marc Zyngier,
	Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	James Clark, Anshuman Khandual, Leo Yan, kernel-team,
	linux-arm-kernel, linux-perf-users, linux-kernel, linux-doc,
	kvmarm, Mark Brown

On Sat, Mar 1, 2025 at 1:05 AM Will Deacon <will@kernel.org> wrote:
>
> On Tue, 18 Feb 2025 14:39:55 -0600, Rob Herring (Arm) wrote:
> > This series enables perf branch stack sampling support on arm64 via a
> > v9.2 arch feature called Branch Record Buffer Extension (BRBE). Details
> > on BRBE can be found in the Arm ARM[1] chapter D18.
> >
> > I've picked up this series from Anshuman. v19 and v20 versions have been
> > reworked quite a bit by Mark and myself. The bulk of those changes are
> > in patch 11.
> >
> > [...]
>
> Applied cleanups to will (for-next/perf), thanks!
>
> [01/11] perf: arm_pmuv3: Call kvm_vcpu_pmu_resync_el0() before enabling counters
>         https://git.kernel.org/will/c/04bd15c4cbc3
> [02/11] perf: arm_pmu: Don't disable counter in armpmu_add()
>         https://git.kernel.org/will/c/dcca27bc1ecc
> [03/11] perf: arm_pmuv3: Don't disable counter in armv8pmu_enable_event()
>         https://git.kernel.org/will/c/4b0567ad0be5
> [04/11] perf: arm_v7_pmu: Drop obvious comments for enabling/disabling counters and interrupts
>         https://git.kernel.org/will/c/7a5387748215
> [05/11] perf: arm_v7_pmu: Don't disable counter in (armv7|krait_|scorpion_)pmu_enable_event()
>         https://git.kernel.org/will/c/7bf1001e0d91
> [06/11] perf: apple_m1: Don't disable counter in m1_pmu_enable_event()
>         https://git.kernel.org/will/c/c2e793da59fc
> [07/11] perf: arm_pmu: Move PMUv3-specific data
>         https://git.kernel.org/will/c/dc4d58a752ea

I don't know if you looked at the thread on patch 11 and said "long
discussion, I'll assume a new version is coming. Next!" because that's
what I would do. In this case though, there's not any changes.

Rob

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v20 00/11] arm64/perf: Enable branch stack sampling
  2025-03-03 16:44   ` Rob Herring
@ 2025-03-04 11:25     ` Catalin Marinas
  2025-03-04 16:25       ` Mark Rutland
  0 siblings, 1 reply; 33+ messages in thread
From: Catalin Marinas @ 2025-03-04 11:25 UTC (permalink / raw)
  To: Rob Herring
  Cc: Will Deacon, Mark Rutland, Jonathan Corbet, Marc Zyngier,
	Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	James Clark, Anshuman Khandual, Leo Yan, kernel-team,
	linux-arm-kernel, linux-perf-users, linux-kernel, linux-doc,
	kvmarm, Mark Brown

On Mon, Mar 03, 2025 at 10:44:21AM -0600, Rob Herring wrote:
> On Sat, Mar 1, 2025 at 1:05 AM Will Deacon <will@kernel.org> wrote:
> > On Tue, 18 Feb 2025 14:39:55 -0600, Rob Herring (Arm) wrote:
> > > This series enables perf branch stack sampling support on arm64 via a
> > > v9.2 arch feature called Branch Record Buffer Extension (BRBE). Details
> > > on BRBE can be found in the Arm ARM[1] chapter D18.
> > >
> > > I've picked up this series from Anshuman. v19 and v20 versions have been
> > > reworked quite a bit by Mark and myself. The bulk of those changes are
> > > in patch 11.
> > >
> > > [...]
> >
> > Applied cleanups to will (for-next/perf), thanks!
> >
> > [01/11] perf: arm_pmuv3: Call kvm_vcpu_pmu_resync_el0() before enabling counters
> >         https://git.kernel.org/will/c/04bd15c4cbc3
> > [02/11] perf: arm_pmu: Don't disable counter in armpmu_add()
> >         https://git.kernel.org/will/c/dcca27bc1ecc
> > [03/11] perf: arm_pmuv3: Don't disable counter in armv8pmu_enable_event()
> >         https://git.kernel.org/will/c/4b0567ad0be5
> > [04/11] perf: arm_v7_pmu: Drop obvious comments for enabling/disabling counters and interrupts
> >         https://git.kernel.org/will/c/7a5387748215
> > [05/11] perf: arm_v7_pmu: Don't disable counter in (armv7|krait_|scorpion_)pmu_enable_event()
> >         https://git.kernel.org/will/c/7bf1001e0d91
> > [06/11] perf: apple_m1: Don't disable counter in m1_pmu_enable_event()
> >         https://git.kernel.org/will/c/c2e793da59fc
> > [07/11] perf: arm_pmu: Move PMUv3-specific data
> >         https://git.kernel.org/will/c/dc4d58a752ea
> 
> I don't know if you looked at the thread on patch 11 and said "long
> discussion, I'll assume a new version is coming. Next!" because that's
> what I would do. In this case though, there's not any changes.

I do this as well ;). But I think Will is waiting for Mark R to look at
the rest of the series.

-- 
Catalin

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH v20 00/11] arm64/perf: Enable branch stack sampling
  2025-03-04 11:25     ` Catalin Marinas
@ 2025-03-04 16:25       ` Mark Rutland
  0 siblings, 0 replies; 33+ messages in thread
From: Mark Rutland @ 2025-03-04 16:25 UTC (permalink / raw)
  To: Catalin Marinas
  Cc: Rob Herring, Will Deacon, Jonathan Corbet, Marc Zyngier,
	Oliver Upton, Joey Gouly, Suzuki K Poulose, Zenghui Yu,
	James Clark, Anshuman Khandual, Leo Yan, kernel-team,
	linux-arm-kernel, linux-perf-users, linux-kernel, linux-doc,
	kvmarm, Mark Brown

On Tue, Mar 04, 2025 at 11:25:06AM +0000, Catalin Marinas wrote:
> On Mon, Mar 03, 2025 at 10:44:21AM -0600, Rob Herring wrote:
> > On Sat, Mar 1, 2025 at 1:05 AM Will Deacon <will@kernel.org> wrote:
> > > On Tue, 18 Feb 2025 14:39:55 -0600, Rob Herring (Arm) wrote:
> > > > This series enables perf branch stack sampling support on arm64 via a
> > > > v9.2 arch feature called Branch Record Buffer Extension (BRBE). Details
> > > > on BRBE can be found in the Arm ARM[1] chapter D18.
> > > >
> > > > I've picked up this series from Anshuman. v19 and v20 versions have been
> > > > reworked quite a bit by Mark and myself. The bulk of those changes are
> > > > in patch 11.
> > > >
> > > > [...]
> > >
> > > Applied cleanups to will (for-next/perf), thanks!
> > >
> > > [01/11] perf: arm_pmuv3: Call kvm_vcpu_pmu_resync_el0() before enabling counters
> > >         https://git.kernel.org/will/c/04bd15c4cbc3
> > > [02/11] perf: arm_pmu: Don't disable counter in armpmu_add()
> > >         https://git.kernel.org/will/c/dcca27bc1ecc
> > > [03/11] perf: arm_pmuv3: Don't disable counter in armv8pmu_enable_event()
> > >         https://git.kernel.org/will/c/4b0567ad0be5
> > > [04/11] perf: arm_v7_pmu: Drop obvious comments for enabling/disabling counters and interrupts
> > >         https://git.kernel.org/will/c/7a5387748215
> > > [05/11] perf: arm_v7_pmu: Don't disable counter in (armv7|krait_|scorpion_)pmu_enable_event()
> > >         https://git.kernel.org/will/c/7bf1001e0d91
> > > [06/11] perf: apple_m1: Don't disable counter in m1_pmu_enable_event()
> > >         https://git.kernel.org/will/c/c2e793da59fc
> > > [07/11] perf: arm_pmu: Move PMUv3-specific data
> > >         https://git.kernel.org/will/c/dc4d58a752ea
> > 
> > I don't know if you looked at the thread on patch 11 and said "long
> > discussion, I'll assume a new version is coming. Next!" because that's
> > what I would do. In this case though, there's not any changes.
> 
> I do this as well ;). But I think Will is waiting for Mark R to look at
> the rest of the series.

Sorry for the delay there.

I'm happy with the structure of this; my only remaining concern is the
filtering logic, as it is surprisingly subtle, and when I was last
working on that it wasn't clear to me whether we could alwways filter
appropriately (or whether it'd be better to reject scheduling of events
with mismatched filters).

I'll try to page that in and properly attack that soon, but aside from
that concern the series as a whole looks good to me.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2025-03-04 16:25 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-02-18 20:39 [PATCH v20 00/11] arm64/perf: Enable branch stack sampling Rob Herring (Arm)
2025-02-18 20:39 ` [PATCH v20 01/11] perf: arm_pmuv3: Call kvm_vcpu_pmu_resync_el0() before enabling counters Rob Herring (Arm)
2025-02-18 20:39 ` [PATCH v20 02/11] perf: arm_pmu: Don't disable counter in armpmu_add() Rob Herring (Arm)
2025-02-18 20:39 ` [PATCH v20 03/11] perf: arm_pmuv3: Don't disable counter in armv8pmu_enable_event() Rob Herring (Arm)
2025-02-18 20:39 ` [PATCH v20 04/11] perf: arm_v7_pmu: Drop obvious comments for enabling/disabling counters and interrupts Rob Herring (Arm)
2025-02-18 20:40 ` [PATCH v20 05/11] perf: arm_v7_pmu: Don't disable counter in (armv7|krait_|scorpion_)pmu_enable_event() Rob Herring (Arm)
2025-02-18 20:40 ` [PATCH v20 06/11] perf: apple_m1: Don't disable counter in m1_pmu_enable_event() Rob Herring (Arm)
2025-02-18 20:40 ` [PATCH v20 07/11] perf: arm_pmu: Move PMUv3-specific data Rob Herring (Arm)
2025-02-18 20:40 ` [PATCH v20 08/11] arm64/sysreg: Add BRBE registers and fields Rob Herring (Arm)
2025-02-18 20:40 ` [PATCH v20 09/11] arm64: Handle BRBE booting requirements Rob Herring (Arm)
2025-02-18 20:40 ` [PATCH v20 10/11] KVM: arm64: nvhe: Disable branch generation in nVHE guests Rob Herring (Arm)
2025-02-24 10:41   ` Leo Yan
2025-02-18 20:40 ` [PATCH v20 11/11] perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE) Rob Herring (Arm)
2025-02-24 12:25   ` Leo Yan
2025-02-24 12:46     ` Rob Herring
2025-02-24 14:03       ` Leo Yan
2025-02-24 16:05         ` Mark Rutland
2025-02-24 18:03           ` Leo Yan
2025-02-25  1:31             ` Rob Herring
2025-02-25 12:38               ` Leo Yan
2025-02-25 15:35                 ` Rob Herring
2025-02-25 19:46                 ` Mark Rutland
2025-02-25 12:01             ` Mark Rutland
2025-02-25 17:48               ` Leo Yan
2025-02-25 19:04                 ` Rob Herring
2025-02-25 19:58                 ` Mark Rutland
2025-02-26 13:48                   ` Leo Yan
2025-02-26 14:26                     ` Rob Herring
2025-02-19 16:09 ` [PATCH v20 00/11] arm64/perf: Enable branch stack sampling James Clark
2025-03-01  7:05 ` Will Deacon
2025-03-03 16:44   ` Rob Herring
2025-03-04 11:25     ` Catalin Marinas
2025-03-04 16:25       ` Mark Rutland

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).