* [RESEND PATCH V3 1/6] perf: Add branch stack extra
@ 2023-09-11 15:48 kan.liang
From: kan.liang @ 2023-09-11 15:48 UTC (permalink / raw)
To: peterz, mingo, acme, linux-kernel
Cc: mark.rutland, alexander.shishkin, jolsa, namhyung, irogers,
adrian.hunter, ak, eranian, alexey.v.bayduraev, tinghao.zhang,
Kan Liang, Sandipan Das, Ravi Bangoria, Athira Rajeev
From: Kan Liang <kan.liang@linux.intel.com>
Currently, the additional information of a branch entry is stored in a
single u64. As more and more information is added, that space is running
out. For example, the occurrences of events will be added for each
branch.
Add a new branch sample type, PERF_SAMPLE_BRANCH_EXTRA, to indicate
whether an extra space is appended for the branch stack.
Two places were suggested for appending the extra space.
https://lore.kernel.org/lkml/20230802215814.GH231007@hirez.programming.kicks-ass.net/
One place is right after the flags of each branch entry. It changes the
existing struct perf_branch_entry. In the later Intel-specific
implementation, two separate spaces have to be created in the
struct cpu_hw_events to store different branch entry structures. That
duplicates space.
The other place is right after the entire struct perf_branch_stack.
Only adding the new extra space in the struct cpu_hw_events is necessary.
The disadvantage is that the pointer of the extra space has to be
recorded. The common interface perf_sample_save_brstack() has to be
updated as well.
The latter requires less space and is more straightforward, so this
patch implements it.
Also, add a new branch sample type, PERF_SAMPLE_BRANCH_EVT_CNTRS, to
indicate whether to include the occurrences of events in the branch
info. The information will be stored in the extra space.
Add two helper functions for the new branch sample types.
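With both sample types enabled, the payload a consumer has to walk is: nr, an optional hw_idx, nr three-u64 branch entries, then nr extra words. A minimal user-space parsing sketch (branch_stack_extra() and struct sample_branch_entry are illustrative names, not part of the kernel or perf APIs):

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* One record in the PERF_SAMPLE_BRANCH_STACK body */
struct sample_branch_entry {
	uint64_t from, to, flags;
};

/*
 * Return a pointer to the extra[] words that the new
 * PERF_SAMPLE_BRANCH_EXTRA layout appends after the branch entries:
 *
 *   u64 nr;
 *   u64 hw_idx;                          // if PERF_SAMPLE_BRANCH_HW_INDEX
 *   struct sample_branch_entry lbr[nr];
 *   u64 extra[nr];                       // if PERF_SAMPLE_BRANCH_EXTRA
 */
static const uint64_t *branch_stack_extra(const uint64_t *payload,
					  int has_hw_idx)
{
	uint64_t nr = payload[0];
	const uint64_t *p = payload + 1 + (has_hw_idx ? 1 : 0);

	/* skip nr branch entries (3 u64 each) to reach the extra space */
	return p + nr * (sizeof(struct sample_branch_entry) / sizeof(uint64_t));
}
```

The same arithmetic is what the perf_output_sample() change relies on: the extra words are copied immediately after the branch entries.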
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
---
Changes since V2:
- Drop the new bit in struct perf_branch_entry
- Introduce a new sample type PERF_SAMPLE_BRANCH_EXTRA
arch/powerpc/perf/core-book3s.c | 2 +-
arch/x86/events/amd/core.c | 2 +-
arch/x86/events/core.c | 2 +-
arch/x86/events/intel/core.c | 2 +-
arch/x86/events/intel/ds.c | 4 ++--
include/linux/perf_event.h | 22 +++++++++++++++++++++-
include/uapi/linux/perf_event.h | 9 +++++++++
kernel/events/core.c | 8 ++++++++
8 files changed, 44 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 8c1f7def596e..3c14596bbfaf 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -2313,7 +2313,7 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
struct cpu_hw_events *cpuhw;
cpuhw = this_cpu_ptr(&cpu_hw_events);
power_pmu_bhrb_read(event, cpuhw);
- perf_sample_save_brstack(&data, event, &cpuhw->bhrb_stack);
+ perf_sample_save_brstack(&data, event, &cpuhw->bhrb_stack, NULL);
}
if (event->attr.sample_type & PERF_SAMPLE_DATA_SRC &&
diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
index abadd5f23425..01c0619d12d5 100644
--- a/arch/x86/events/amd/core.c
+++ b/arch/x86/events/amd/core.c
@@ -930,7 +930,7 @@ static int amd_pmu_v2_handle_irq(struct pt_regs *regs)
continue;
if (has_branch_stack(event))
- perf_sample_save_brstack(&data, event, &cpuc->lbr_stack);
+ perf_sample_save_brstack(&data, event, &cpuc->lbr_stack, NULL);
if (perf_event_overflow(event, &data, regs))
x86_pmu_stop(event, 0);
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 185f902e5f28..2919bb5a53a0 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1702,7 +1702,7 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
perf_sample_data_init(&data, 0, event->hw.last_period);
if (has_branch_stack(event))
- perf_sample_save_brstack(&data, event, &cpuc->lbr_stack);
+ perf_sample_save_brstack(&data, event, &cpuc->lbr_stack, NULL);
if (perf_event_overflow(event, &data, regs))
x86_pmu_stop(event, 0);
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 64a3533997e1..0f4ce79de4f8 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3058,7 +3058,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
perf_sample_data_init(&data, 0, event->hw.last_period);
if (has_branch_stack(event))
- perf_sample_save_brstack(&data, event, &cpuc->lbr_stack);
+ perf_sample_save_brstack(&data, event, &cpuc->lbr_stack, NULL);
if (perf_event_overflow(event, &data, regs))
x86_pmu_stop(event, 0);
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index eb8dd8b8a1e8..7566190389f0 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1755,7 +1755,7 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
setup_pebs_time(event, data, pebs->tsc);
if (has_branch_stack(event))
- perf_sample_save_brstack(data, event, &cpuc->lbr_stack);
+ perf_sample_save_brstack(data, event, &cpuc->lbr_stack, NULL);
}
static void adaptive_pebs_save_regs(struct pt_regs *regs,
@@ -1912,7 +1912,7 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
if (has_branch_stack(event)) {
intel_pmu_store_pebs_lbrs(lbr);
- perf_sample_save_brstack(data, event, &cpuc->lbr_stack);
+ perf_sample_save_brstack(data, event, &cpuc->lbr_stack, NULL);
}
}
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e83f13ce4a9f..c4877924d43c 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1137,6 +1137,15 @@ static inline bool branch_sample_priv(const struct perf_event *event)
return event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_PRIV_SAVE;
}
+static inline bool branch_sample_extra(const struct perf_event *event)
+{
+ return event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_EXTRA;
+}
+
+static inline bool branch_sample_evt_cntrs(const struct perf_event *event)
+{
+ return event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_EVT_CNTRS;
+}
struct perf_sample_data {
/*
@@ -1171,6 +1180,7 @@ struct perf_sample_data {
struct perf_callchain_entry *callchain;
struct perf_raw_record *raw;
struct perf_branch_stack *br_stack;
+ u64 *br_stack_ext;
union perf_sample_weight weight;
union perf_mem_data_src data_src;
u64 txn;
@@ -1248,7 +1258,8 @@ static inline void perf_sample_save_raw_data(struct perf_sample_data *data,
static inline void perf_sample_save_brstack(struct perf_sample_data *data,
struct perf_event *event,
- struct perf_branch_stack *brs)
+ struct perf_branch_stack *brs,
+ u64 *brs_ext)
{
int size = sizeof(u64); /* nr */
@@ -1256,7 +1267,16 @@ static inline void perf_sample_save_brstack(struct perf_sample_data *data,
size += sizeof(u64);
size += brs->nr * sizeof(struct perf_branch_entry);
+ /*
+ * The extension space is appended after the struct perf_branch_stack.
+ * It is used to store the extra data of each branch, e.g.,
+ * the occurrences of events since the last branch entry for Intel LBR.
+ */
+ if (branch_sample_extra(event))
+ size += brs->nr * sizeof(u64);
+
data->br_stack = brs;
+ data->br_stack_ext = brs_ext;
data->dyn_size += size;
data->sample_flags |= PERF_SAMPLE_BRANCH_STACK;
}
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 39c6a250dd1b..252066579dae 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -204,6 +204,10 @@ enum perf_branch_sample_type_shift {
PERF_SAMPLE_BRANCH_PRIV_SAVE_SHIFT = 18, /* save privilege mode */
+ PERF_SAMPLE_BRANCH_EXTRA_SHIFT = 19, /* support extra space */
+
+ PERF_SAMPLE_BRANCH_EVT_CNTRS_SHIFT = 20, /* save occurrences of events on a branch */
+
PERF_SAMPLE_BRANCH_MAX_SHIFT /* non-ABI */
};
@@ -235,6 +239,10 @@ enum perf_branch_sample_type {
PERF_SAMPLE_BRANCH_PRIV_SAVE = 1U << PERF_SAMPLE_BRANCH_PRIV_SAVE_SHIFT,
+ PERF_SAMPLE_BRANCH_EXTRA = 1U << PERF_SAMPLE_BRANCH_EXTRA_SHIFT,
+
+ PERF_SAMPLE_BRANCH_EVT_CNTRS = 1U << PERF_SAMPLE_BRANCH_EVT_CNTRS_SHIFT,
+
PERF_SAMPLE_BRANCH_MAX = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
};
@@ -982,6 +990,7 @@ enum perf_event_type {
* { u64 nr;
* { u64 hw_idx; } && PERF_SAMPLE_BRANCH_HW_INDEX
* { u64 from, to, flags } lbr[nr];
+ * { u64 extra; } ext[nr] && PERF_SAMPLE_BRANCH_EXTRA
* } && PERF_SAMPLE_BRANCH_STACK
*
* { u64 abi; # enum perf_sample_regs_abi
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f84e2640ea2f..482e7efe365c 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7324,6 +7324,14 @@ void perf_output_sample(struct perf_output_handle *handle,
if (branch_sample_hw_index(event))
perf_output_put(handle, data->br_stack->hw_idx);
perf_output_copy(handle, data->br_stack->entries, size);
+ /*
+ * Add the extension space which is appended
+ * right after the struct perf_branch_stack.
+ */
+ if (branch_sample_extra(event) && data->br_stack_ext) {
+ size = data->br_stack->nr * sizeof(u64);
+ perf_output_copy(handle, data->br_stack_ext, size);
+ }
} else {
/*
* we always store at least the value of nr
--
2.35.1
* [RESEND PATCH V3 2/6] perf/x86: Add PERF_X86_EVENT_NEEDS_BRANCH_STACK flag
From: kan.liang @ 2023-09-11 15:48 UTC (permalink / raw)
To: peterz, mingo, acme, linux-kernel
Cc: mark.rutland, alexander.shishkin, jolsa, namhyung, irogers,
adrian.hunter, ak, eranian, alexey.v.bayduraev, tinghao.zhang,
Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
Currently, branch_sample_type != 0 is used to check whether a branch
stack setup is required. But it doesn't check the sample type, so an
unnecessary branch stack setup may be done for a counting event. E.g.,
perf record -e "{branch-instructions,branch-misses}:S" -j any
Also, the event with the new PERF_SAMPLE_BRANCH_EVT_CNTRS branch sample
type may not require a branch stack setup either.
Add a new flag, NEEDS_BRANCH_STACK, to indicate whether the event
requires a branch stack setup. Replace needs_branch_stack() with a
check of the new flag.
The counting event check is implemented here. A later patch will take
the new PERF_SAMPLE_BRANCH_EVT_CNTRS into account.
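The setup decision can be modeled in a few lines (a simplified user-space sketch; struct fake_event and the helper functions are illustrative stand-ins for the kernel's perf_event structures, not real APIs):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PERF_X86_EVENT_NEEDS_BRANCH_STACK 0x40000ULL

/* hypothetical, simplified stand-in for struct perf_event */
struct fake_event {
	uint64_t branch_sample_type;
	uint64_t sample_period;	/* 0 == pure counting event */
	uint64_t hw_flags;
};

static bool needs_branch_stack(const struct fake_event *e)
{
	return e->branch_sample_type != 0;
}

static bool is_sampling_event(const struct fake_event *e)
{
	return e->sample_period != 0;
}

/*
 * Mirror of the hw_config hunk: only a sampling event with a
 * branch_sample_type gets the NEEDS_BRANCH_STACK flag, so a pure
 * counting event no longer triggers an LBR setup.
 */
static void hw_config(struct fake_event *e)
{
	if (needs_branch_stack(e) && is_sampling_event(e))
		e->hw_flags |= PERF_X86_EVENT_NEEDS_BRANCH_STACK;
}
```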
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
New patch
arch/x86/events/intel/core.c | 14 +++++++++++---
arch/x86/events/perf_event_flags.h | 1 +
2 files changed, 12 insertions(+), 3 deletions(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 0f4ce79de4f8..8434315b765f 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2518,9 +2518,14 @@ static void intel_pmu_assign_event(struct perf_event *event, int idx)
perf_report_aux_output_id(event, idx);
}
+static __always_inline bool intel_pmu_needs_branch_stack(struct perf_event *event)
+{
+ return event->hw.flags & PERF_X86_EVENT_NEEDS_BRANCH_STACK;
+}
+
static void intel_pmu_del_event(struct perf_event *event)
{
- if (needs_branch_stack(event))
+ if (intel_pmu_needs_branch_stack(event))
intel_pmu_lbr_del(event);
if (event->attr.precise_ip)
intel_pmu_pebs_del(event);
@@ -2831,7 +2836,7 @@ static void intel_pmu_add_event(struct perf_event *event)
{
if (event->attr.precise_ip)
intel_pmu_pebs_add(event);
- if (needs_branch_stack(event))
+ if (intel_pmu_needs_branch_stack(event))
intel_pmu_lbr_add(event);
}
@@ -3908,7 +3913,10 @@ static int intel_pmu_hw_config(struct perf_event *event)
x86_pmu.pebs_aliases(event);
}
- if (needs_branch_stack(event)) {
+ if (needs_branch_stack(event) && is_sampling_event(event))
+ event->hw.flags |= PERF_X86_EVENT_NEEDS_BRANCH_STACK;
+
+ if (intel_pmu_needs_branch_stack(event)) {
ret = intel_pmu_setup_lbr_filter(event);
if (ret)
return ret;
diff --git a/arch/x86/events/perf_event_flags.h b/arch/x86/events/perf_event_flags.h
index 1dc19b9b4426..a1685981c520 100644
--- a/arch/x86/events/perf_event_flags.h
+++ b/arch/x86/events/perf_event_flags.h
@@ -20,3 +20,4 @@ PERF_ARCH(TOPDOWN, 0x04000) /* Count Topdown slots/metrics events */
PERF_ARCH(PEBS_STLAT, 0x08000) /* st+stlat data address sampling */
PERF_ARCH(AMD_BRS, 0x10000) /* AMD Branch Sampling */
PERF_ARCH(PEBS_LAT_HYBRID, 0x20000) /* ld and st lat for hybrid */
+PERF_ARCH(NEEDS_BRANCH_STACK, 0x40000) /* require branch stack setup */
--
2.35.1
* [RESEND PATCH V3 3/6] perf: Add branch_sample_call_stack
From: kan.liang @ 2023-09-11 15:48 UTC (permalink / raw)
To: peterz, mingo, acme, linux-kernel
Cc: mark.rutland, alexander.shishkin, jolsa, namhyung, irogers,
adrian.hunter, ak, eranian, alexey.v.bayduraev, tinghao.zhang,
Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
Add a helper function to check the call stack sample type.
Later patches will invoke the function in several places.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
New patch
arch/x86/events/core.c | 2 +-
include/linux/perf_event.h | 5 +++++
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 2919bb5a53a0..b56e9c3a0cc8 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -601,7 +601,7 @@ int x86_pmu_hw_config(struct perf_event *event)
}
}
- if (event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_CALL_STACK)
+ if (branch_sample_call_stack(event))
event->attach_state |= PERF_ATTACH_TASK_DATA;
/*
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index c4877924d43c..58ad6745cdda 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -1147,6 +1147,11 @@ static inline bool branch_sample_evt_cntrs(const struct perf_event *event)
return event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_EVT_CNTRS;
}
+static inline bool branch_sample_call_stack(const struct perf_event *event)
+{
+ return event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_CALL_STACK;
+}
+
struct perf_sample_data {
/*
* Fields set by perf_sample_data_init() unconditionally,
--
2.35.1
* [RESEND PATCH V3 4/6] perf/x86/intel: Support LBR event logging
From: kan.liang @ 2023-09-11 15:48 UTC (permalink / raw)
To: peterz, mingo, acme, linux-kernel
Cc: mark.rutland, alexander.shishkin, jolsa, namhyung, irogers,
adrian.hunter, ak, eranian, alexey.v.bayduraev, tinghao.zhang,
Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
The LBR event logging introduces a per-counter indication of precise
event occurrences in LBRs. It can provide a means to attribute exposed
retirement latency to combinations of events across a block of
instructions. It also provides a means of attributing Timed LBR
latencies to events.
The feature is first introduced on SRF/GRR. It is an enhancement of the
ARCH LBR. It adds new fields in the LBR_INFO MSRs to log the occurrences
of events on the GP counters. The information is recorded by the HW in
counter order.
The design proposed in this patch requires that the logged events be in
the same group as the event that has the LBR. If there is more than one
LBR group, only the event logging information from the current
(overflowed) group is stored for the perf tool; otherwise the perf tool
cannot know which other groups are scheduled and when, especially when
multiplexing is triggered. The user can make use of the maximum number
of counters that support LBR info (currently 4) by making the group
large enough.
The HW only logs events in counter order, which may differ from the
enabled order that the perf tool understands. When parsing the
information of each branch entry, convert the counter order to the
enabled order, and store the enabled order in the extension space.
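Concretely, each LBR_INFO entry carries a 2-bit occurrence count per GP counter at bits 32-39; the reorder step copies each enabled event's field into the low bits in enabled order, then clears the counter-order area. A sketch mirroring intel_pmu_update_lbr_event() from this patch (a standalone model, not the kernel code itself):

```c
#include <assert.h>
#include <stdint.h>

#define LBR_INFO_EVENTS_OFFSET		32
#define LBR_INFO_EVENTS			(0xffULL << LBR_INFO_EVENTS_OFFSET)
#define ARCH_LBR_EVENT_LOG_WIDTH	2
#define ARCH_LBR_EVENT_LOG_MASK		0x3ULL

/*
 * Move HW counter 'idx's 2-bit occurrence count into enabled-order
 * slot 'pos', accumulated in the low bits of the same word.
 */
static void update_lbr_event(uint64_t *lbr_events, int idx, int pos)
{
	uint64_t logs = *lbr_events >> (LBR_INFO_EVENTS_OFFSET +
					idx * ARCH_LBR_EVENT_LOG_WIDTH);

	logs &= ARCH_LBR_EVENT_LOG_MASK;
	*lbr_events |= logs << (pos * ARCH_LBR_EVENT_LOG_WIDTH);
}
```

After all enabled events are processed, the caller masks off the original counter-order bits with `&= ~LBR_INFO_EVENTS`, leaving only the enabled-order view for the extension space.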
Unconditionally reset the LBRs for an LBR event group when it's deleted.
The logged events' occurrence information is only valid for the current
LBR group. If another LBR group is scheduled later, the information from
the stale LBRs would otherwise be wrongly interpreted.
Add a sanity check in intel_pmu_hw_config(). Disable the feature if other
counter filters (inv, cmask, edge, in_tx) are set or LBR call stack mode
is enabled. (For the LBR call stack mode, we cannot simply flush the
LBR, since it will break the call stack. Also, there is no obvious usage
with the call stack mode for now.)
Applying only PERF_SAMPLE_BRANCH_EVT_CNTRS or PERF_SAMPLE_BRANCH_EXTRA
doesn't require any branch stack setup.
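That "skip the branch stack setup" condition reduces to a single mask test; a sketch with the relevant UAPI bit values (PLM_ALL is the union of the three privilege bits; only_event_counters() is an illustrative name, not a kernel helper):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* UAPI bits used in the check (values from perf_event.h) */
#define PERF_SAMPLE_BRANCH_USER		(1U << 0)
#define PERF_SAMPLE_BRANCH_KERNEL	(1U << 1)
#define PERF_SAMPLE_BRANCH_HV		(1U << 2)
#define PERF_SAMPLE_BRANCH_PLM_ALL	(PERF_SAMPLE_BRANCH_USER | \
					 PERF_SAMPLE_BRANCH_KERNEL | \
					 PERF_SAMPLE_BRANCH_HV)
#define PERF_SAMPLE_BRANCH_ANY		(1U << 3)
#define PERF_SAMPLE_BRANCH_EXTRA	(1U << 19)
#define PERF_SAMPLE_BRANCH_EVT_CNTRS	(1U << 20)

/*
 * True when, besides the privilege bits, only EVT_CNTRS/EXTRA are
 * requested, i.e. no branch records themselves are wanted and the
 * NEEDS_BRANCH_STACK flag can be cleared.
 */
static bool only_event_counters(uint64_t branch_sample_type)
{
	return (branch_sample_type & ~(PERF_SAMPLE_BRANCH_PLM_ALL |
				       PERF_SAMPLE_BRANCH_EVT_CNTRS |
				       PERF_SAMPLE_BRANCH_EXTRA)) == 0;
}
```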
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
Changes since V2:
- Drop the SW bit LBR_EVENT_LOG_BIT and flag PERF_X86_EVENT_LBR_EVENT.
They can be replaced by branch_sample_evt_cntrs() and
branch_sample_extra().
arch/x86/events/intel/core.c | 62 ++++++++++++++++++--
arch/x86/events/intel/ds.c | 2 +-
arch/x86/events/intel/lbr.c | 94 ++++++++++++++++++++++++++++++-
arch/x86/events/perf_event.h | 7 +++
arch/x86/include/asm/msr-index.h | 2 +
arch/x86/include/asm/perf_event.h | 4 ++
6 files changed, 164 insertions(+), 7 deletions(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 8434315b765f..e54f4db89da1 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2803,6 +2803,7 @@ static void intel_pmu_enable_fixed(struct perf_event *event)
static void intel_pmu_enable_event(struct perf_event *event)
{
+ u64 enable_mask = ARCH_PERFMON_EVENTSEL_ENABLE;
struct hw_perf_event *hwc = &event->hw;
int idx = hwc->idx;
@@ -2811,8 +2812,10 @@ static void intel_pmu_enable_event(struct perf_event *event)
switch (idx) {
case 0 ... INTEL_PMC_IDX_FIXED - 1:
+ if (branch_sample_evt_cntrs(event))
+ enable_mask |= ARCH_PERFMON_EVENTSEL_LBR_LOG;
intel_set_masks(event, idx);
- __x86_pmu_enable_event(hwc, ARCH_PERFMON_EVENTSEL_ENABLE);
+ __x86_pmu_enable_event(hwc, enable_mask);
break;
case INTEL_PMC_IDX_FIXED ... INTEL_PMC_IDX_FIXED_BTS - 1:
case INTEL_PMC_IDX_METRIC_BASE ... INTEL_PMC_IDX_METRIC_END:
@@ -3063,7 +3066,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
perf_sample_data_init(&data, 0, event->hw.last_period);
if (has_branch_stack(event))
- perf_sample_save_brstack(&data, event, &cpuc->lbr_stack, NULL);
+ intel_pmu_lbr_save_brstack(&data, cpuc, event);
if (perf_event_overflow(event, &data, regs))
x86_pmu_stop(event, 0);
@@ -3628,6 +3631,13 @@ intel_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
if (cpuc->excl_cntrs)
return intel_get_excl_constraints(cpuc, event, idx, c2);
+ /* The LBR event logging may not be available for all counters. */
+ if (branch_sample_evt_cntrs(event)) {
+ c2 = dyn_constraint(cpuc, c2, idx);
+ c2->idxmsk64 &= x86_pmu.lbr_events;
+ c2->weight = hweight64(c2->idxmsk64);
+ }
+
return c2;
}
@@ -3916,6 +3926,44 @@ static int intel_pmu_hw_config(struct perf_event *event)
if (needs_branch_stack(event) && is_sampling_event(event))
event->hw.flags |= PERF_X86_EVENT_NEEDS_BRANCH_STACK;
+ if (branch_sample_evt_cntrs(event) || branch_sample_extra(event)) {
+ struct perf_event *leader, *sibling;
+
+ if (!(x86_pmu.flags & PMU_FL_LBR_EVENT) ||
+ (event->attr.config & ~INTEL_ARCH_EVENT_MASK))
+ return -EINVAL;
+
+ /*
+ * The event logging is not supported in the call stack mode
+ * yet, since we cannot simply flush the LBR during e.g.,
+ * multiplexing. Also, there is no obvious usage with the call
+ * stack mode. Simply forbid it for now.
+ *
+ * If any events in the group enable the LBR event logging
+ * feature, the group is treated as a LBR event logging group,
+ * which requires the extra space.
+ */
+ leader = event->group_leader;
+ if (branch_sample_call_stack(leader) || !branch_sample_extra(leader))
+ return -EINVAL;
+
+ for_each_sibling_event(sibling, leader) {
+ if (branch_sample_call_stack(sibling) || !branch_sample_extra(sibling))
+ return -EINVAL;
+ }
+
+ /*
+ * Only applying the PERF_SAMPLE_BRANCH_EVT_CNTRS or the
+ * PERF_SAMPLE_BRANCH_EXTRA doesn't require any branch stack setup.
+ * Clear the bit to avoid unnecessary branch stack setup.
+ */
+ if (0 == (event->attr.branch_sample_type &
+ ~(PERF_SAMPLE_BRANCH_PLM_ALL |
+ PERF_SAMPLE_BRANCH_EVT_CNTRS |
+ PERF_SAMPLE_BRANCH_EXTRA)))
+ event->hw.flags &= ~PERF_X86_EVENT_NEEDS_BRANCH_STACK;
+ }
+
if (intel_pmu_needs_branch_stack(event)) {
ret = intel_pmu_setup_lbr_filter(event);
if (ret)
@@ -4387,8 +4435,12 @@ cmt_get_event_constraints(struct cpu_hw_events *cpuc, int idx,
*/
if (event->attr.precise_ip == 3) {
/* Force instruction:ppp on PMC0, 1 and Fixed counter 0 */
- if (constraint_match(&fixed0_constraint, event->hw.config))
- return &fixed0_counter0_1_constraint;
+ if (constraint_match(&fixed0_constraint, event->hw.config)) {
+ if (branch_sample_evt_cntrs(event))
+ return &counter0_1_constraint;
+ else
+ return &fixed0_counter0_1_constraint;
+ }
switch (c->idxmsk64 & 0x3ull) {
case 0x1:
@@ -4567,7 +4619,7 @@ int intel_cpuc_prepare(struct cpu_hw_events *cpuc, int cpu)
goto err;
}
- if (x86_pmu.flags & (PMU_FL_EXCL_CNTRS | PMU_FL_TFA)) {
+ if (x86_pmu.flags & (PMU_FL_EXCL_CNTRS | PMU_FL_TFA | PMU_FL_LBR_EVENT)) {
size_t sz = X86_PMC_IDX_MAX * sizeof(struct event_constraint);
cpuc->constraint_list = kzalloc_node(sz, GFP_KERNEL, cpu_to_node(cpu));
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 7566190389f0..5ff81dbf8aa3 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1912,7 +1912,7 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
if (has_branch_stack(event)) {
intel_pmu_store_pebs_lbrs(lbr);
- perf_sample_save_brstack(data, event, &cpuc->lbr_stack, NULL);
+ intel_pmu_lbr_save_brstack(data, cpuc, event);
}
}
diff --git a/arch/x86/events/intel/lbr.c b/arch/x86/events/intel/lbr.c
index c3b0d15a9841..5b7159b9db92 100644
--- a/arch/x86/events/intel/lbr.c
+++ b/arch/x86/events/intel/lbr.c
@@ -676,6 +676,21 @@ void intel_pmu_lbr_del(struct perf_event *event)
WARN_ON_ONCE(cpuc->lbr_users < 0);
WARN_ON_ONCE(cpuc->lbr_pebs_users < 0);
perf_sched_cb_dec(event->pmu);
+
+ /*
+ * The logged occurrences information is only valid for the
+ * current LBR group. If another LBR group is scheduled in
+ * later, the information from the stale LBRs will be wrongly
+ * interpreted. Reset the LBRs here.
+ * For the context switch, the LBR will be unconditionally
+ * flushed when a new task is scheduled in. If both the new task
+ * and the old task are monitored by an LBR event group, the
+ * reset here is redundant. But the extra reset doesn't impact
+ * the functionality. It's hard to distinguish the above case.
+ * Keep the unconditional reset for an LBR event group for now.
+ */
+ if (branch_sample_extra(event))
+ intel_pmu_lbr_reset();
}
static inline bool vlbr_exclude_host(void)
@@ -866,6 +881,18 @@ static __always_inline u16 get_lbr_cycles(u64 info)
return cycles;
}
+static __always_inline void get_lbr_events(struct cpu_hw_events *cpuc,
+ int i, u64 info)
+{
+ /*
+ * The later code will decide what content can be disclosed
+ * to the perf tool. It's no harmful to unconditionally update
+ * the cpuc->lbr_events.
+ * Pleae see intel_pmu_lbr_event_reorder()
+ */
+ cpuc->lbr_events[i] = info & LBR_INFO_EVENTS;
+}
+
static void intel_pmu_store_lbr(struct cpu_hw_events *cpuc,
struct lbr_entry *entries)
{
@@ -898,11 +925,70 @@ static void intel_pmu_store_lbr(struct cpu_hw_events *cpuc,
e->abort = !!(info & LBR_INFO_ABORT);
e->cycles = get_lbr_cycles(info);
e->type = get_lbr_br_type(info);
+
+ get_lbr_events(cpuc, i, info);
}
cpuc->lbr_stack.nr = i;
}
+#define ARCH_LBR_EVENT_LOG_WIDTH 2
+#define ARCH_LBR_EVENT_LOG_MASK 0x3
+
+static __always_inline void intel_pmu_update_lbr_event(u64 *lbr_events, int idx, int pos)
+{
+ u64 logs = *lbr_events >> (LBR_INFO_EVENTS_OFFSET +
+ idx * ARCH_LBR_EVENT_LOG_WIDTH);
+
+ logs &= ARCH_LBR_EVENT_LOG_MASK;
+ *lbr_events |= logs << (pos * ARCH_LBR_EVENT_LOG_WIDTH);
+}
+
+/*
+ * The enabled order may be different from the counter order.
+ * Update the lbr_events with the enabled order.
+ */
+static void intel_pmu_lbr_event_reorder(struct cpu_hw_events *cpuc,
+ struct perf_event *event)
+{
+ int i, j, pos = 0, enabled[X86_PMC_IDX_MAX];
+ struct perf_event *leader, *sibling;
+
+ leader = event->group_leader;
+ if (branch_sample_evt_cntrs(leader))
+ enabled[pos++] = leader->hw.idx;
+
+ for_each_sibling_event(sibling, leader) {
+ if (!branch_sample_evt_cntrs(sibling))
+ continue;
+ enabled[pos++] = sibling->hw.idx;
+ }
+
+ if (!pos)
+ return;
+
+ for (i = 0; i < cpuc->lbr_stack.nr; i++) {
+ for (j = 0; j < pos; j++)
+ intel_pmu_update_lbr_event(&cpuc->lbr_events[i], enabled[j], j);
+
+ /* Clear the original counter order */
+ cpuc->lbr_events[i] &= ~LBR_INFO_EVENTS;
+ }
+}
+
+void intel_pmu_lbr_save_brstack(struct perf_sample_data *data,
+ struct cpu_hw_events *cpuc,
+ struct perf_event *event)
+{
+ if (!branch_sample_extra(event)) {
+ perf_sample_save_brstack(data, event, &cpuc->lbr_stack, NULL);
+ return;
+ }
+
+ intel_pmu_lbr_event_reorder(cpuc, event);
+ perf_sample_save_brstack(data, event, &cpuc->lbr_stack, cpuc->lbr_events);
+}
+
static void intel_pmu_arch_lbr_read(struct cpu_hw_events *cpuc)
{
intel_pmu_store_lbr(cpuc, NULL);
@@ -1173,8 +1259,10 @@ intel_pmu_lbr_filter(struct cpu_hw_events *cpuc)
for (i = 0; i < cpuc->lbr_stack.nr; ) {
if (!cpuc->lbr_entries[i].from) {
j = i;
- while (++j < cpuc->lbr_stack.nr)
+ while (++j < cpuc->lbr_stack.nr) {
cpuc->lbr_entries[j-1] = cpuc->lbr_entries[j];
+ cpuc->lbr_events[j-1] = cpuc->lbr_events[j];
+ }
cpuc->lbr_stack.nr--;
if (!cpuc->lbr_entries[i].from)
continue;
@@ -1525,8 +1613,12 @@ void __init intel_pmu_arch_lbr_init(void)
x86_pmu.lbr_mispred = ecx.split.lbr_mispred;
x86_pmu.lbr_timed_lbr = ecx.split.lbr_timed_lbr;
x86_pmu.lbr_br_type = ecx.split.lbr_br_type;
+ x86_pmu.lbr_events = ecx.split.lbr_events;
x86_pmu.lbr_nr = lbr_nr;
+ if (!!x86_pmu.lbr_events)
+ x86_pmu.flags |= PMU_FL_LBR_EVENT;
+
if (x86_pmu.lbr_mispred)
static_branch_enable(&x86_lbr_mispred);
if (x86_pmu.lbr_timed_lbr)
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index c8ba2be7585d..4a51253a107c 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -283,6 +283,7 @@ struct cpu_hw_events {
int lbr_pebs_users;
struct perf_branch_stack lbr_stack;
struct perf_branch_entry lbr_entries[MAX_LBR_ENTRIES];
+ u64 lbr_events[MAX_LBR_ENTRIES]; /* branch stack extra */
union {
struct er_account *lbr_sel;
struct er_account *lbr_ctl;
@@ -881,6 +882,7 @@ struct x86_pmu {
unsigned int lbr_mispred:1;
unsigned int lbr_timed_lbr:1;
unsigned int lbr_br_type:1;
+ unsigned int lbr_events:4;
void (*lbr_reset)(void);
void (*lbr_read)(struct cpu_hw_events *cpuc);
@@ -1005,6 +1007,7 @@ do { \
#define PMU_FL_INSTR_LATENCY 0x80 /* Support Instruction Latency in PEBS Memory Info Record */
#define PMU_FL_MEM_LOADS_AUX 0x100 /* Require an auxiliary event for the complete memory info */
#define PMU_FL_RETIRE_LATENCY 0x200 /* Support Retire Latency in PEBS */
+#define PMU_FL_LBR_EVENT 0x400 /* Support LBR event logging */
#define EVENT_VAR(_id) event_attr_##_id
#define EVENT_PTR(_id) &event_attr_##_id.attr.attr
@@ -1545,6 +1548,10 @@ void intel_pmu_store_pebs_lbrs(struct lbr_entry *lbr);
void intel_ds_init(void);
+void intel_pmu_lbr_save_brstack(struct perf_sample_data *data,
+ struct cpu_hw_events *cpuc,
+ struct perf_event *event);
+
void intel_pmu_lbr_swap_task_ctx(struct perf_event_pmu_context *prev_epc,
struct perf_event_pmu_context *next_epc);
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 3aedae61af4f..d38091cc174a 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -224,6 +224,8 @@
#define LBR_INFO_CYCLES 0xffff
#define LBR_INFO_BR_TYPE_OFFSET 56
#define LBR_INFO_BR_TYPE (0xfull << LBR_INFO_BR_TYPE_OFFSET)
+#define LBR_INFO_EVENTS_OFFSET 32
+#define LBR_INFO_EVENTS (0xffull << LBR_INFO_EVENTS_OFFSET)
#define MSR_ARCH_LBR_CTL 0x000014ce
#define ARCH_LBR_CTL_LBREN BIT(0)
diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 85a9fd5a3ec3..7677605a39ef 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -31,6 +31,7 @@
#define ARCH_PERFMON_EVENTSEL_ENABLE (1ULL << 22)
#define ARCH_PERFMON_EVENTSEL_INV (1ULL << 23)
#define ARCH_PERFMON_EVENTSEL_CMASK 0xFF000000ULL
+#define ARCH_PERFMON_EVENTSEL_LBR_LOG (1ULL << 35)
#define INTEL_FIXED_BITS_MASK 0xFULL
#define INTEL_FIXED_BITS_STRIDE 4
@@ -216,6 +217,9 @@ union cpuid28_ecx {
unsigned int lbr_timed_lbr:1;
/* Branch Type Field Supported */
unsigned int lbr_br_type:1;
+ unsigned int reserved:13;
+ /* Event Logging Supported */
+ unsigned int lbr_events:4;
} split;
unsigned int full;
};
--
2.35.1
* [RESEND PATCH V3 5/6] tools headers UAPI: Sync include/uapi/linux/perf_event.h header with the kernel
From: kan.liang @ 2023-09-11 15:48 UTC (permalink / raw)
To: peterz, mingo, acme, linux-kernel
Cc: mark.rutland, alexander.shishkin, jolsa, namhyung, irogers,
adrian.hunter, ak, eranian, alexey.v.bayduraev, tinghao.zhang,
Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
Sync the new sample types and extra space for the branch event feature.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
Changes since V2:
- Drop the new bit in struct perf_branch_entry
- Introduce a new sample type PERF_SAMPLE_BRANCH_EXTRA
tools/include/uapi/linux/perf_event.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 39c6a250dd1b..252066579dae 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -204,6 +204,10 @@ enum perf_branch_sample_type_shift {
PERF_SAMPLE_BRANCH_PRIV_SAVE_SHIFT = 18, /* save privilege mode */
+ PERF_SAMPLE_BRANCH_EXTRA_SHIFT = 19, /* support extra space */
+
+ PERF_SAMPLE_BRANCH_EVT_CNTRS_SHIFT = 20, /* save occurrences of events on a branch */
+
PERF_SAMPLE_BRANCH_MAX_SHIFT /* non-ABI */
};
@@ -235,6 +239,10 @@ enum perf_branch_sample_type {
PERF_SAMPLE_BRANCH_PRIV_SAVE = 1U << PERF_SAMPLE_BRANCH_PRIV_SAVE_SHIFT,
+ PERF_SAMPLE_BRANCH_EXTRA = 1U << PERF_SAMPLE_BRANCH_EXTRA_SHIFT,
+
+ PERF_SAMPLE_BRANCH_EVT_CNTRS = 1U << PERF_SAMPLE_BRANCH_EVT_CNTRS_SHIFT,
+
PERF_SAMPLE_BRANCH_MAX = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
};
@@ -982,6 +990,7 @@ enum perf_event_type {
* { u64 nr;
* { u64 hw_idx; } && PERF_SAMPLE_BRANCH_HW_INDEX
* { u64 from, to, flags } lbr[nr];
+ * { u64 extra; } ext[nr] && PERF_SAMPLE_BRANCH_EXTRA
* } && PERF_SAMPLE_BRANCH_STACK
*
* { u64 abi; # enum perf_sample_regs_abi
--
2.35.1
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [RESEND PATCH V3 6/6] perf tools: Add branch event knob
2023-09-11 15:48 [RESEND PATCH V3 1/6] perf: Add branch stack extra kan.liang
` (3 preceding siblings ...)
2023-09-11 15:48 ` [RESEND PATCH V3 5/6] tools headers UAPI: Sync include/uapi/linux/perf_event.h header with the kernel kan.liang
@ 2023-09-11 15:48 ` kan.liang
2023-10-02 13:49 ` [RESEND PATCH V3 1/6] perf: Add branch stack extra Liang, Kan
2023-10-02 15:45 ` Peter Zijlstra
6 siblings, 0 replies; 17+ messages in thread
From: kan.liang @ 2023-09-11 15:48 UTC (permalink / raw)
To: peterz, mingo, acme, linux-kernel
Cc: mark.rutland, alexander.shishkin, jolsa, namhyung, irogers,
adrian.hunter, ak, eranian, alexey.v.bayduraev, tinghao.zhang,
Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
Add a new branch filter, "event", for the branch event option. It is
used to mark the events which should be logged in the branch. If it is
applied with the -j option, all the events should be logged in the
branch. If the legacy kernel doesn't support the new branch sample type,
the branch event filter is switched off.
The branch event requires extra space. If the "event" branch filter is
detected, the PERF_SAMPLE_BRANCH_EXTRA sample type is applied
automatically to the entire group.
The new extra space of each branch is dumped right after the regular
branch stack information via perf report -D.
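The resulting layout can be sketched in C as follows. This is a
simplified, illustrative model only (not part of the patch): the struct
and helper names are hypothetical, and hw_idx handling plus bounds
checks are omitted for brevity.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of one branch entry as laid out in the sample. */
struct entry { uint64_t from, to, flags; };

/*
 * Given a pointer to the u64 'nr' word of a PERF_SAMPLE_BRANCH_STACK
 * record (without PERF_SAMPLE_BRANCH_HW_INDEX), return the start of
 * the ext[nr] extra space that follows lbr[nr].
 */
static const uint64_t *branch_extra(const uint64_t *array)
{
	uint64_t nr = array[0];

	/* skip the nr word, then nr entries of three u64 each */
	return array + 1 + nr * (sizeof(struct entry) / sizeof(uint64_t));
}
```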
Usage examples:
perf record -e "{branch-instructions,branch-misses}:S" -j any,event
Only the first event, branch-instructions, collects the LBR. Both
branch-instructions and branch-misses are marked as logged events.
Their occurrence information can be found in the branch stack
extension space of each branch.
perf record -e "{cpu/branch-instructions,branch_type=any/,
cpu/branch-misses,branch_type=event/}"
Only the first event, branch-instructions, collects the LBR. Only the
branch-misses event is marked as a logged event.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
Changes since V2:
- Drop the new bit in struct perf_branch_entry
- Support PERF_SAMPLE_BRANCH_EXTRA
tools/perf/Documentation/perf-record.txt | 4 ++++
tools/perf/arch/x86/util/evsel.c | 17 ++++++++++++++-
tools/perf/util/evsel.c | 26 ++++++++++++++++++++++-
tools/perf/util/evsel.h | 12 +++++++++++
tools/perf/util/parse-branch-options.c | 1 +
tools/perf/util/perf_event_attr_fprintf.c | 1 +
tools/perf/util/sample.h | 1 +
tools/perf/util/session.c | 8 +++++++
8 files changed, 68 insertions(+), 2 deletions(-)
diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 680396c56bd1..91a77fae11bb 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -456,6 +456,10 @@ following filters are defined:
4th-Gen Xeon+ server), the save branch type is unconditionally enabled
when the taken branch stack sampling is enabled.
- priv: save privilege state during sampling in case binary is not available later
+ - event: save occurrences of the event since the last branch entry. Currently, the
+ feature is only supported by newer CPUs, e.g., Intel Sierra Forest and
+ later platforms. An error is expected if it's used on an unsupported
+ kernel or CPU.
+
The option requires at least one branch type among any, any_call, any_ret, ind_call, cond.
diff --git a/tools/perf/arch/x86/util/evsel.c b/tools/perf/arch/x86/util/evsel.c
index 81d22657922a..038c3454d76f 100644
--- a/tools/perf/arch/x86/util/evsel.c
+++ b/tools/perf/arch/x86/util/evsel.c
@@ -9,6 +9,7 @@
#include "evsel.h"
#include "util/debug.h"
#include "env.h"
+#include "../../../util/evlist.h"
#define IBS_FETCH_L3MISSONLY (1ULL << 59)
#define IBS_OP_L3MISSONLY (1ULL << 16)
@@ -80,9 +81,23 @@ void arch__post_evsel_config(struct evsel *evsel, struct perf_event_attr *attr)
struct perf_pmu *evsel_pmu, *ibs_fetch_pmu, *ibs_op_pmu;
static int warned_once;
- if (warned_once || !x86__is_amd_cpu())
+ if (warned_once)
return;
+ if (!x86__is_amd_cpu()) {
+ if (evsel__has_branch_evt_cntrs(evsel)) {
+ struct evsel *cur, *leader = evsel__leader(evsel);
+
+ /* The extra space is required for the LBR event group */
+ evlist__for_each_entry(evsel->evlist, cur) {
+ if (leader == evsel__leader(cur))
+ cur->core.attr.branch_sample_type |= PERF_SAMPLE_BRANCH_EXTRA;
+ }
+ }
+
+ return;
+ }
+
evsel_pmu = evsel__find_pmu(evsel);
if (!evsel_pmu)
return;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 762e2b2634a5..a70007beab14 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1828,6 +1828,10 @@ static int __evsel__prepare_open(struct evsel *evsel, struct perf_cpu_map *cpus,
static void evsel__disable_missing_features(struct evsel *evsel)
{
+ if (perf_missing_features.branch_extra)
+ evsel->core.attr.branch_sample_type &= ~PERF_SAMPLE_BRANCH_EXTRA;
+ if (perf_missing_features.branch_event)
+ evsel->core.attr.branch_sample_type &= ~PERF_SAMPLE_BRANCH_EVT_CNTRS;
if (perf_missing_features.read_lost)
evsel->core.attr.read_format &= ~PERF_FORMAT_LOST;
if (perf_missing_features.weight_struct) {
@@ -1881,7 +1885,17 @@ bool evsel__detect_missing_features(struct evsel *evsel)
* Must probe features in the order they were added to the
* perf_event_attr interface.
*/
- if (!perf_missing_features.read_lost &&
+ if (!perf_missing_features.branch_extra &&
+ (evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_EXTRA)) {
+ perf_missing_features.branch_extra = true;
+ pr_debug2("switching off branch extra space support\n");
+ return true;
+ } else if (!perf_missing_features.branch_event &&
+ (evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_EVT_CNTRS)) {
+ perf_missing_features.branch_event = true;
+ pr_debug2("switching off branch event support\n");
+ return true;
+ } else if (!perf_missing_features.read_lost &&
(evsel->core.attr.read_format & PERF_FORMAT_LOST)) {
perf_missing_features.read_lost = true;
pr_debug2("switching off PERF_FORMAT_LOST support\n");
@@ -2574,6 +2588,16 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
OVERFLOW_CHECK(array, sz, max_size);
array = (void *)array + sz;
+
+ if (evsel__has_branch_extra(evsel)) {
+ OVERFLOW_CHECK_u64(array);
+
+ data->branch_stack_ext = (u64 *)array;
+ sz = data->branch_stack->nr * sizeof(u64);
+
+ OVERFLOW_CHECK(array, sz, max_size);
+ array = (void *)array + sz;
+ }
}
if (type & PERF_SAMPLE_REGS_USER) {
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 848534ec74fa..f476dd68bb4c 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -191,6 +191,8 @@ struct perf_missing_features {
bool code_page_size;
bool weight_struct;
bool read_lost;
+ bool branch_event;
+ bool branch_extra;
};
extern struct perf_missing_features perf_missing_features;
@@ -499,6 +501,16 @@ static inline bool evsel__has_branch_hw_idx(const struct evsel *evsel)
return evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_HW_INDEX;
}
+static inline bool evsel__has_branch_evt_cntrs(const struct evsel *evsel)
+{
+ return evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_EVT_CNTRS;
+}
+
+static inline bool evsel__has_branch_extra(const struct evsel *evsel)
+{
+ return evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_EXTRA;
+}
+
static inline bool evsel__has_callchain(const struct evsel *evsel)
{
/*
diff --git a/tools/perf/util/parse-branch-options.c b/tools/perf/util/parse-branch-options.c
index fd67d204d720..ab5d6dabe659 100644
--- a/tools/perf/util/parse-branch-options.c
+++ b/tools/perf/util/parse-branch-options.c
@@ -36,6 +36,7 @@ static const struct branch_mode branch_modes[] = {
BRANCH_OPT("stack", PERF_SAMPLE_BRANCH_CALL_STACK),
BRANCH_OPT("hw_index", PERF_SAMPLE_BRANCH_HW_INDEX),
BRANCH_OPT("priv", PERF_SAMPLE_BRANCH_PRIV_SAVE),
+ BRANCH_OPT("event", PERF_SAMPLE_BRANCH_EVT_CNTRS),
BRANCH_END
};
diff --git a/tools/perf/util/perf_event_attr_fprintf.c b/tools/perf/util/perf_event_attr_fprintf.c
index 2247991451f3..7c9fdd62d920 100644
--- a/tools/perf/util/perf_event_attr_fprintf.c
+++ b/tools/perf/util/perf_event_attr_fprintf.c
@@ -55,6 +55,7 @@ static void __p_branch_sample_type(char *buf, size_t size, u64 value)
bit_name(COND), bit_name(CALL_STACK), bit_name(IND_JUMP),
bit_name(CALL), bit_name(NO_FLAGS), bit_name(NO_CYCLES),
bit_name(TYPE_SAVE), bit_name(HW_INDEX), bit_name(PRIV_SAVE),
+ bit_name(EVT_CNTRS), bit_name(EXTRA),
{ .name = NULL, }
};
#undef bit_name
diff --git a/tools/perf/util/sample.h b/tools/perf/util/sample.h
index c92ad0f51ecd..3abe892aae60 100644
--- a/tools/perf/util/sample.h
+++ b/tools/perf/util/sample.h
@@ -113,6 +113,7 @@ struct perf_sample {
void *raw_data;
struct ip_callchain *callchain;
struct branch_stack *branch_stack;
+ u64 *branch_stack_ext;
struct regs_dump user_regs;
struct regs_dump intr_regs;
struct stack_dump user_stack;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 00d18c74c090..b58cfef5d767 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1153,6 +1153,7 @@ static void callchain__printf(struct evsel *evsel,
static void branch_stack__printf(struct perf_sample *sample, bool callstack)
{
struct branch_entry *entries = perf_sample__branch_entries(sample);
+ u64 *branch_stack_ext = sample->branch_stack_ext;
uint64_t i;
if (!callstack) {
@@ -1194,6 +1195,13 @@ static void branch_stack__printf(struct perf_sample *sample, bool callstack)
}
}
}
+
+ if (branch_stack_ext) {
+ printf("... branch stack extra: nr:%" PRIu64 "\n", sample->branch_stack->nr);
+ for (i = 0; i < sample->branch_stack->nr; i++) {
+ printf("..... %2"PRIu64": %016" PRIx64 "\n", i, branch_stack_ext[i]);
+ }
+ }
}
static void regs_dump__printf(u64 mask, u64 *regs, const char *arch)
--
2.35.1
* Re: [RESEND PATCH V3 1/6] perf: Add branch stack extra
2023-09-11 15:48 [RESEND PATCH V3 1/6] perf: Add branch stack extra kan.liang
` (4 preceding siblings ...)
2023-09-11 15:48 ` [RESEND PATCH V3 6/6] perf tools: Add branch event knob kan.liang
@ 2023-10-02 13:49 ` Liang, Kan
2023-10-02 15:45 ` Peter Zijlstra
6 siblings, 0 replies; 17+ messages in thread
From: Liang, Kan @ 2023-10-02 13:49 UTC (permalink / raw)
To: peterz, mingo, acme, linux-kernel
Cc: mark.rutland, alexander.shishkin, jolsa, namhyung, irogers,
adrian.hunter, ak, eranian, alexey.v.bayduraev, tinghao.zhang,
Sandipan Das, Ravi Bangoria, Athira Rajeev
Hi Peter,
Could you please share your comments for this series?
Thanks,
Kan
On 2023-09-11 11:48 a.m., kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
>
> Currently, the additional information of a branch entry is stored in a
> u64 space. With more and more information added, the space is running
> out. For example, the information of occurrences of events will be added
> for each branch.
>
> Add a new branch sample type, PERF_SAMPLE_BRANCH_EXTRA, to indicate
> whether to support an extra space.
>
> Two places were suggested to append the extra space.
> https://lore.kernel.org/lkml/20230802215814.GH231007@hirez.programming.kicks-ass.net/
> One place is right after the flags of each branch entry. It changes the
> existing struct perf_branch_entry. In the later Intel-specific
> implementation, two separate spaces have to be created in the
> struct cpu_hw_events to store different branch entry structures. That
> duplicates space.
> The other place is right after the entire struct perf_branch_stack.
> Only adding the new extra space in the struct cpu_hw_event is necessary.
> The disadvantage is that the pointer of the extra space has to be
> recorded. The common interface perf_sample_save_brstack() has to be
> updated as well.
>
> The latter requires less space and is much straight forward. It is
> implemented in the patch.
>
> Also, add a new branch sample type, PERF_SAMPLE_BRANCH_EVT_CNTRS, to
> indicate whether include occurrences of events in branch info. The
> information will be stored in the extra space.
>
> Add two helper functions for the new branch sample types.
>
> Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> Cc: Sandipan Das <sandipan.das@amd.com>
> Cc: Ravi Bangoria <ravi.bangoria@amd.com>
> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
> ---
>
> Changes since V2:
> - Drop the new bit in struct perf_branch_entry
> - Introduce a new sample type PERF_SAMPLE_BRANCH_EXTRA
>
> arch/powerpc/perf/core-book3s.c | 2 +-
> arch/x86/events/amd/core.c | 2 +-
> arch/x86/events/core.c | 2 +-
> arch/x86/events/intel/core.c | 2 +-
> arch/x86/events/intel/ds.c | 4 ++--
> include/linux/perf_event.h | 22 +++++++++++++++++++++-
> include/uapi/linux/perf_event.h | 9 +++++++++
> kernel/events/core.c | 8 ++++++++
> 8 files changed, 44 insertions(+), 7 deletions(-)
>
> diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
> index 8c1f7def596e..3c14596bbfaf 100644
> --- a/arch/powerpc/perf/core-book3s.c
> +++ b/arch/powerpc/perf/core-book3s.c
> @@ -2313,7 +2313,7 @@ static void record_and_restart(struct perf_event *event, unsigned long val,
> struct cpu_hw_events *cpuhw;
> cpuhw = this_cpu_ptr(&cpu_hw_events);
> power_pmu_bhrb_read(event, cpuhw);
> - perf_sample_save_brstack(&data, event, &cpuhw->bhrb_stack);
> + perf_sample_save_brstack(&data, event, &cpuhw->bhrb_stack, NULL);
> }
>
> if (event->attr.sample_type & PERF_SAMPLE_DATA_SRC &&
> diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
> index abadd5f23425..01c0619d12d5 100644
> --- a/arch/x86/events/amd/core.c
> +++ b/arch/x86/events/amd/core.c
> @@ -930,7 +930,7 @@ static int amd_pmu_v2_handle_irq(struct pt_regs *regs)
> continue;
>
> if (has_branch_stack(event))
> - perf_sample_save_brstack(&data, event, &cpuc->lbr_stack);
> + perf_sample_save_brstack(&data, event, &cpuc->lbr_stack, NULL);
>
> if (perf_event_overflow(event, &data, regs))
> x86_pmu_stop(event, 0);
> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index 185f902e5f28..2919bb5a53a0 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -1702,7 +1702,7 @@ int x86_pmu_handle_irq(struct pt_regs *regs)
> perf_sample_data_init(&data, 0, event->hw.last_period);
>
> if (has_branch_stack(event))
> - perf_sample_save_brstack(&data, event, &cpuc->lbr_stack);
> + perf_sample_save_brstack(&data, event, &cpuc->lbr_stack, NULL);
>
> if (perf_event_overflow(event, &data, regs))
> x86_pmu_stop(event, 0);
> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index 64a3533997e1..0f4ce79de4f8 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c
> @@ -3058,7 +3058,7 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
> perf_sample_data_init(&data, 0, event->hw.last_period);
>
> if (has_branch_stack(event))
> - perf_sample_save_brstack(&data, event, &cpuc->lbr_stack);
> + perf_sample_save_brstack(&data, event, &cpuc->lbr_stack, NULL);
>
> if (perf_event_overflow(event, &data, regs))
> x86_pmu_stop(event, 0);
> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
> index eb8dd8b8a1e8..7566190389f0 100644
> --- a/arch/x86/events/intel/ds.c
> +++ b/arch/x86/events/intel/ds.c
> @@ -1755,7 +1755,7 @@ static void setup_pebs_fixed_sample_data(struct perf_event *event,
> setup_pebs_time(event, data, pebs->tsc);
>
> if (has_branch_stack(event))
> - perf_sample_save_brstack(data, event, &cpuc->lbr_stack);
> + perf_sample_save_brstack(data, event, &cpuc->lbr_stack, NULL);
> }
>
> static void adaptive_pebs_save_regs(struct pt_regs *regs,
> @@ -1912,7 +1912,7 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
>
> if (has_branch_stack(event)) {
> intel_pmu_store_pebs_lbrs(lbr);
> - perf_sample_save_brstack(data, event, &cpuc->lbr_stack);
> + perf_sample_save_brstack(data, event, &cpuc->lbr_stack, NULL);
> }
> }
>
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index e83f13ce4a9f..c4877924d43c 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -1137,6 +1137,15 @@ static inline bool branch_sample_priv(const struct perf_event *event)
> return event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_PRIV_SAVE;
> }
>
> +static inline bool branch_sample_extra(const struct perf_event *event)
> +{
> + return event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_EXTRA;
> +}
> +
> +static inline bool branch_sample_evt_cntrs(const struct perf_event *event)
> +{
> + return event->attr.branch_sample_type & PERF_SAMPLE_BRANCH_EVT_CNTRS;
> +}
>
> struct perf_sample_data {
> /*
> @@ -1171,6 +1180,7 @@ struct perf_sample_data {
> struct perf_callchain_entry *callchain;
> struct perf_raw_record *raw;
> struct perf_branch_stack *br_stack;
> + u64 *br_stack_ext;
> union perf_sample_weight weight;
> union perf_mem_data_src data_src;
> u64 txn;
> @@ -1248,7 +1258,8 @@ static inline void perf_sample_save_raw_data(struct perf_sample_data *data,
>
> static inline void perf_sample_save_brstack(struct perf_sample_data *data,
> struct perf_event *event,
> - struct perf_branch_stack *brs)
> + struct perf_branch_stack *brs,
> + u64 *brs_ext)
> {
> int size = sizeof(u64); /* nr */
>
> @@ -1256,7 +1267,16 @@ static inline void perf_sample_save_brstack(struct perf_sample_data *data,
> size += sizeof(u64);
> size += brs->nr * sizeof(struct perf_branch_entry);
>
> + /*
> + * The extension space is appended after the struct perf_branch_stack.
> + * It is used to store the extra data of each branch, e.g.,
> + * the occurrences of events since the last branch entry for Intel LBR.
> + */
> + if (branch_sample_extra(event))
> + size += brs->nr * sizeof(u64);
> +
> data->br_stack = brs;
> + data->br_stack_ext = brs_ext;
> data->dyn_size += size;
> data->sample_flags |= PERF_SAMPLE_BRANCH_STACK;
> }
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 39c6a250dd1b..252066579dae 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -204,6 +204,10 @@ enum perf_branch_sample_type_shift {
>
> PERF_SAMPLE_BRANCH_PRIV_SAVE_SHIFT = 18, /* save privilege mode */
>
> + PERF_SAMPLE_BRANCH_EXTRA_SHIFT = 19, /* support extra space */
> +
> + PERF_SAMPLE_BRANCH_EVT_CNTRS_SHIFT = 20, /* save occurrences of events on a branch */
> +
> PERF_SAMPLE_BRANCH_MAX_SHIFT /* non-ABI */
> };
>
> @@ -235,6 +239,10 @@ enum perf_branch_sample_type {
>
> PERF_SAMPLE_BRANCH_PRIV_SAVE = 1U << PERF_SAMPLE_BRANCH_PRIV_SAVE_SHIFT,
>
> + PERF_SAMPLE_BRANCH_EXTRA = 1U << PERF_SAMPLE_BRANCH_EXTRA_SHIFT,
> +
> + PERF_SAMPLE_BRANCH_EVT_CNTRS = 1U << PERF_SAMPLE_BRANCH_EVT_CNTRS_SHIFT,
> +
> PERF_SAMPLE_BRANCH_MAX = 1U << PERF_SAMPLE_BRANCH_MAX_SHIFT,
> };
>
> @@ -982,6 +990,7 @@ enum perf_event_type {
> * { u64 nr;
> * { u64 hw_idx; } && PERF_SAMPLE_BRANCH_HW_INDEX
> * { u64 from, to, flags } lbr[nr];
> + * { u64 extra; } ext[nr] && PERF_SAMPLE_BRANCH_EXTRA
> * } && PERF_SAMPLE_BRANCH_STACK
> *
> * { u64 abi; # enum perf_sample_regs_abi
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index f84e2640ea2f..482e7efe365c 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -7324,6 +7324,14 @@ void perf_output_sample(struct perf_output_handle *handle,
> if (branch_sample_hw_index(event))
> perf_output_put(handle, data->br_stack->hw_idx);
> perf_output_copy(handle, data->br_stack->entries, size);
> + /*
> + * Add the extension space which is appended
> + * right after the struct perf_branch_stack.
> + */
> + if (branch_sample_extra(event) && data->br_stack_ext) {
> + size = data->br_stack->nr * sizeof(u64);
> + perf_output_copy(handle, data->br_stack_ext, size);
> + }
> } else {
> /*
> * we always store at least the value of nr
* Re: [RESEND PATCH V3 1/6] perf: Add branch stack extra
2023-09-11 15:48 [RESEND PATCH V3 1/6] perf: Add branch stack extra kan.liang
` (5 preceding siblings ...)
2023-10-02 13:49 ` [RESEND PATCH V3 1/6] perf: Add branch stack extra Liang, Kan
@ 2023-10-02 15:45 ` Peter Zijlstra
2023-10-02 19:19 ` Liang, Kan
6 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2023-10-02 15:45 UTC (permalink / raw)
To: kan.liang
Cc: mingo, acme, linux-kernel, mark.rutland, alexander.shishkin,
jolsa, namhyung, irogers, adrian.hunter, ak, eranian,
alexey.v.bayduraev, tinghao.zhang, Sandipan Das, Ravi Bangoria,
Athira Rajeev
On Mon, Sep 11, 2023 at 08:48:17AM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
>
> Currently, the additional information of a branch entry is stored in a
> u64 space. With more and more information added, the space is running
> out. For example, the information of occurrences of events will be added
> for each branch.
>
> Add a new branch sample type, PERF_SAMPLE_BRANCH_EXTRA, to indicate
> whether to support an extra space.
>
> Two places were suggested to append the extra space.
> https://lore.kernel.org/lkml/20230802215814.GH231007@hirez.programming.kicks-ass.net/
> One place is right after the flags of each branch entry. It changes the
> existing struct perf_branch_entry. In the later Intel-specific
> implementation, two separate spaces have to be created in the
> struct cpu_hw_events to store different branch entry structures. That
> duplicates space.
Well, something like so:
- struct perf_branch_entry lbr_entries[MAX_LBR_ENTRIES];
+
+ union {
+ struct perf_branch_entry lbr_entries[MAX_LBR_ENTRIES];
+ struct perf_branch_entry_ext lbr_entries_ext[MAX_LBR_ENTRIES];
+ };
would just do... you just have to be really careful to consistently pick
the right one.
Something that might help would be to do make perf_branch_stack::entries
a 'void *' and use:
struct perf_branch_entry_ext *
perf_get_branch_entry(struct perf_sample_data *data, int idx)
{
if (data->sample_flags & PERF_SAMPLE_BRANCH_EXTRA)
return (struct perf_branch_entry_ext *)data->br_stack->entries + idx;
return (struct perf_branch_entry *)data->br_stack->entries + idx;
}
> The other place is right after the entire struct perf_branch_stack.
> Only adding the new extra space in the struct cpu_hw_event is necessary.
> The disadvantage is that the pointer of the extra space has to be
> recorded. The common interface perf_sample_save_brstack() has to be
> updated as well.
Right.. probably easier.
> The latter requires less space and is much straight forward. It is
> implemented in the patch.
Same amount of space either way around. 'n*x+n*y == n*(x+y)' and all that.
> Also, add a new branch sample type, PERF_SAMPLE_BRANCH_EVT_CNTRS, to
> indicate whether include occurrences of events in branch info. The
> information will be stored in the extra space.
This... why do we need two flags?
Also, I can't find this in the SDM, how wide are these counter deltas?
ISTR they're saturating, but not how wide they are.
* Re: [RESEND PATCH V3 1/6] perf: Add branch stack extra
2023-10-02 15:45 ` Peter Zijlstra
@ 2023-10-02 19:19 ` Liang, Kan
2023-10-02 21:37 ` Peter Zijlstra
0 siblings, 1 reply; 17+ messages in thread
From: Liang, Kan @ 2023-10-02 19:19 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, acme, linux-kernel, mark.rutland, alexander.shishkin,
jolsa, namhyung, irogers, adrian.hunter, ak, eranian,
alexey.v.bayduraev, tinghao.zhang, Sandipan Das, Ravi Bangoria,
Athira Rajeev
On 2023-10-02 11:45 a.m., Peter Zijlstra wrote:
> On Mon, Sep 11, 2023 at 08:48:17AM -0700, kan.liang@linux.intel.com wrote:
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> Currently, the additional information of a branch entry is stored in a
>> u64 space. With more and more information added, the space is running
>> out. For example, the information of occurrences of events will be added
>> for each branch.
>>
>> Add a new branch sample type, PERF_SAMPLE_BRANCH_EXTRA, to indicate
>> whether to support an extra space.
>>
>> Two places were suggested to append the extra space.
>> https://lore.kernel.org/lkml/20230802215814.GH231007@hirez.programming.kicks-ass.net/
>> One place is right after the flags of each branch entry. It changes the
>> existing struct perf_branch_entry. In the later Intel-specific
>> implementation, two separate spaces have to be created in the
>> struct cpu_hw_events to store different branch entry structures. That
>> duplicates space.
>
> Well, something like so:
>
> - struct perf_branch_entry lbr_entries[MAX_LBR_ENTRIES];
> +
> + union {
> + struct perf_branch_entry lbr_entries[MAX_LBR_ENTRIES];
> + struct perf_branch_entry_ext lbr_entries_ext[MAX_LBR_ENTRIES];
> + };
>
> would just do... you just have to be really careful to consistently pick
> the right one.
>
> Something that might help would be to do make perf_branch_stack::entries
> a 'void *' and use:
>
> struct perf_branch_entry_ext *
> perf_get_branch_entry(struct perf_sample_data *data, int idx)
> {
> if (data->sample_flags & PERF_SAMPLE_BRANCH_EXTRA)
> return (struct perf_branch_entry_ext *)data->br_stack->entries + idx;
>
> return (struct perf_branch_entry *)data->br_stack->entries + idx;
> }
I tried to avoid the above extra calculation (although it should be
tiny), since it's in an NMI handler. So I once planned to add an extra
struct perf_branch_entry_ext lbr_entries_ext[MAX_LBR_ENTRIES]; which
doubles the space.
But yes, it should be doable.
>
>> The other place is right after the entire struct perf_branch_stack.
>> Only adding the new extra space in the struct cpu_hw_event is necessary.
>> The disadvantage is that the pointer of the extra space has to be
>> recorded. The common interface perf_sample_save_brstack() has to be
>> updated as well.
>
> Right.. probably easier.
I don't see any big drawbacks to it. It's easier to understand and
implement, so it should be easier to maintain as well.
I will stick with the latter, if there's no objection.
>
>> The latter requires less space and is much straight forward. It is
>> implemented in the patch.
>
> Same amount of space either way around. 'n*x+n*y == n*(x+y)' and all that.
>
>> Also, add a new branch sample type, PERF_SAMPLE_BRANCH_EVT_CNTRS, to
>> indicate whether include occurrences of events in branch info. The
>> information will be stored in the extra space.
>
> This... why do we need two flags?
Users may only want to collect the occurrences of some events in a
group. The EVT_CNTRS flag is used to indicate those events. E.g.,
perf record -e "{cpu/branch-instructions,branch_type=call/,
cpu/branch-misses,branch_type=event/}"
Only the occurrences of the branch-misses event are collected in the LBR
and finally dumped into the extra buffer.
The first flag, PERF_SAMPLE_BRANCH_EXTRA, only indicates that the
extra space is required.
>
> Also, I can't find this in the SDM, how wide are these counter deltas?
> ISTR they're saturating, but not how wide they are.
Now, it's documented in the Intel® Architecture Instruction Set
Extensions and Future Features, Chapter 8, 8.6 LBR ENHANCEMENTS. It
should be moved to the SDM later.
https://cdrdv2.intel.com/v1/dl/getContent/671368
Only 2 bits for each counter. Saturating at a value of 3.
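For illustration, the 2-bit deltas could be unpacked like this. This is
a hypothetical helper, not code from the patch; the exact packing of
the extension word (four 2-bit fields in the low byte, PMC0 first) is
an assumption based on the ISE wording, not a documented ABI.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: extract the 2-bit saturating delta of counter
 * 'pmc' (0-3) from the u64 extension word of one branch entry.
 * Assumes the four 2-bit fields are packed in the low byte, PMC0 first.
 */
static inline unsigned int lbr_evt_cnt(uint64_t extra, unsigned int pmc)
{
	return (extra >> (pmc * 2)) & 0x3;
}
```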
Thanks,
Kan
* Re: [RESEND PATCH V3 1/6] perf: Add branch stack extra
2023-10-02 19:19 ` Liang, Kan
@ 2023-10-02 21:37 ` Peter Zijlstra
2023-10-03 0:57 ` Liang, Kan
0 siblings, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2023-10-02 21:37 UTC (permalink / raw)
To: Liang, Kan
Cc: mingo, acme, linux-kernel, mark.rutland, alexander.shishkin,
jolsa, namhyung, irogers, adrian.hunter, ak, eranian,
alexey.v.bayduraev, tinghao.zhang, Sandipan Das, Ravi Bangoria,
Athira Rajeev
On Mon, Oct 02, 2023 at 03:19:04PM -0400, Liang, Kan wrote:
> >> Also, add a new branch sample type, PERF_SAMPLE_BRANCH_EVT_CNTRS, to
> >> indicate whether include occurrences of events in branch info. The
> >> information will be stored in the extra space.
> >
> > This... why do we need two flags?
>
> Users may only collect the occurrences of some events in a group. The
> EVT_CNTRS flag is used to indicate those events. E.g.,
> perf record -e "{cpu/branch-instructions,branch_type=call/,
> cpu/branch-misses,branch_type=event/}"
>
> Only the occurrences of the branch-misses event is collected in LBR and
> finally dumped into the extra buffer.
>
> While the first flag, PERF_SAMPLE_BRANCH_EXTRA, only tells that the
> extra space is required.
Or have it implicit; I really don't see the point of having two bits
here.
> > Also, I can't find this in the SDM, how wide are these counter deltas?
> > ISTR they're saturating, but not how wide they are.
>
> Now, it's documented in the Intel® Architecture Instruction Set
> Extensions and Future Features, Chapter 8, 8.6 LBR ENHANCEMENTS. It
> should be moved to SDM later.
> https://cdrdv2.intel.com/v1/dl/getContent/671368
>
> Only 2 bits for each counter. Saturating at a value of 3.
Urgh, this ISE document is shite; that thing doesn't say how many
IA32_LBR_INFO.PMCx_CNT fields there are, I think your later patch says
4, right? And is this for arch LBR or the other thing?
(Also, what is IA32_LER_x_INFO ?)
This is then a grand total of 8 bits.
And we still have 31 spare bits in perf_branch_entry.
Why again do we need the extra u64 ?!?
More specifically, this interface is pretty crap -- suppose the next
generation of things feels that 2 bits ain't enough and goes and gives
us 4. Then what do we do?
Did I already say that the ISE document raises more questions than it
provides answers?
* Re: [RESEND PATCH V3 1/6] perf: Add branch stack extra
2023-10-02 21:37 ` Peter Zijlstra
@ 2023-10-03 0:57 ` Liang, Kan
2023-10-03 10:27 ` Peter Zijlstra
0 siblings, 1 reply; 17+ messages in thread
From: Liang, Kan @ 2023-10-03 0:57 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, acme, linux-kernel, mark.rutland, alexander.shishkin,
jolsa, namhyung, irogers, adrian.hunter, ak, eranian,
alexey.v.bayduraev, tinghao.zhang, Sandipan Das, Ravi Bangoria,
Athira Rajeev
On 2023-10-02 5:37 p.m., Peter Zijlstra wrote:
> On Mon, Oct 02, 2023 at 03:19:04PM -0400, Liang, Kan wrote:
>
>>>> Also, add a new branch sample type, PERF_SAMPLE_BRANCH_EVT_CNTRS, to
>>>> indicate whether include occurrences of events in branch info. The
>>>> information will be stored in the extra space.
>>>
>>> This... why do we need two flags?
>>
>> Users may only collect the occurrences of some events in a group. The
>> EVT_CNTRS flag is used to indicate those events. E.g.,
>> perf record -e "{cpu/branch-instructions,branch_type=call/,
>> cpu/branch-misses,branch_type=event/}"
>>
>> Only the occurrences of the branch-misses event is collected in LBR and
>> finally dumped into the extra buffer.
>>
>> While the first flag, PERF_SAMPLE_BRANCH_EXTRA, only tells that the
>> extra space is required.
>
> Or have it implicit, I reallt don't see the point of having two bits
> here.
Perf has to traverse the whole group to decide whether to use the extra
space. But it should be possible to use an internal flag to avoid the
traversal every time. Let me have a try.
>
>>> Also, I can't find this in the SDM, how wide are these counter deltas?
>>> ISTR they're saturating, but not how wide they are.
>>
>> Now, it's documented in the Intel® Architecture Instruction Set
>> Extensions and Future Features, Chapter 8, 8.6 LBR ENHANCEMENTS. It
>> should be moved to SDM later.
>> https://cdrdv2.intel.com/v1/dl/getContent/671368
>>
>> Only 2 bits for each counter. Saturating at a value of 3.
>
> Urgh, this ISE document is shite, that thing don't say how many
> IA32_LBR_INFO.PMCx_CNT fields there are, I think your later patch says
> 4, right? And is this for arch LBR or the other thing?
>
It's for Arch LBR. Yes, the current CPUID enumeration implies that there
are only 4 counters.
"Per-counter support for LBR Event Logging is indicated by the “Event
Logging Supported” bitmap in CPUID.(EAX=01CH, ECX=0).ECX[19:16]"
> (Also, what is IA32_LER_x_INFO ?)
Last Event Record (LER). It records the last taken branch preceding the
last exception, hardware interrupt, or software interrupt.
Linux doesn't support it.
>
> This is then a grand total of 8 bits.
>
> And we still have 31 spare bits in perf_branch_entry.
>
> Why again do we need the extra u64 ?!?
>
The first version utilizes the spare bits in perf_branch_entry.
https://lore.kernel.org/lkml/20230414145324.GB761523@hirez.programming.kicks-ass.net/
To address a similar concern (what should we do if more counters or
wider fields are added later), I changed it to the extra-space method
in V2.
Another consideration is that the 'events' field in the
perf_branch_entry from V1 is Intel-specific. The u64 extra space is more
generic. Other architectures can use it to store other extra information
if they want.
Please let me know if I'm overthinking. I can switch back to the
'events' field of V1.
> More specifically, this interface is pretty crap -- suppose the next
> generation of things feels that 2 bits aint' enough and goes and gives
> us 4. Then what do we do?
>
The current LBR is an architectural feature. The existing fields (4
counters of 2 bits) should not change.
But yes, it's possible to add more bits and counters in the reserved
bits. IA32_LBR_x_INFO has only 31 reserved bits now, so the u64 extra
space should be good enough.
If more information is introduced later (e.g., a brand new
LBR_x_INFO_2), we can add an extra_2 space.
But I don't see there is a plan to extend the IA32_LBR_x_INFO again in
the near future.
> Did I already say that the ISE document raises more questions than it
> provides answers?
Yes. Would an improved CPUID enumeration address the questions? For
example, it could report the maximum number of counters and the
supported width. I think we can discuss it with the architect.
Thanks,
Kan
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [RESEND PATCH V3 1/6] perf: Add branch stack extra
2023-10-03 0:57 ` Liang, Kan
@ 2023-10-03 10:27 ` Peter Zijlstra
2023-10-03 12:57 ` Liang, Kan
2023-10-03 15:06 ` Andi Kleen
0 siblings, 2 replies; 17+ messages in thread
From: Peter Zijlstra @ 2023-10-03 10:27 UTC (permalink / raw)
To: Liang, Kan
Cc: mingo, acme, linux-kernel, mark.rutland, alexander.shishkin,
jolsa, namhyung, irogers, adrian.hunter, ak, eranian,
alexey.v.bayduraev, tinghao.zhang, Sandipan Das, Ravi Bangoria,
Athira Rajeev
On Mon, Oct 02, 2023 at 08:57:57PM -0400, Liang, Kan wrote:
> > Did I already say that the ISE document raises more questions than it
> > provides answers?
>
> Yes. Would an improved CPUID enumeration address the questions? For
> example, it could report the maximum number of counters and the
> supported width. I think we can discuss it with the architect.
So.. no. Suppose another arch goes and does the same, but with a
different number and width of counters. They won't have CPUID.
I'm thinking we should do something like expose branch_counter_nr and
branch_counter_width in the sysfs node, and then rename this extra field
to counters.
Then userspace can do something like:
for (i = 0; i < branch_counter_nr; i++) {
	counter[i] = counters & ((1 << branch_counter_width) - 1);
	counters >>= branch_counter_width;
}
to extract the actual counter values.
So then we end up with:
* { u64 nr;
* { u64 hw_idx; } && PERF_SAMPLE_BRANCH_HW_INDEX
* { u64 from, to, flags } lbr[nr];
+ * { u64 counters; } cntr[nr] && PERF_SAMPLE_BRANCH_COUNTERS
* } && PERF_SAMPLE_BRANCH_STACK
Have it explicitly named counters, have only the one flag and have sysfs
files describe how to decode it.
Then for this Intel thing we have 4 counters of 2 bits, but if someone
else were to do something different (both Power and ARM64 have this
branch stack stuff now) they can describe it.
It is a bit wasteful on bits... but at least it's clear I suppose.
* Re: [RESEND PATCH V3 1/6] perf: Add branch stack extra
2023-10-03 10:27 ` Peter Zijlstra
@ 2023-10-03 12:57 ` Liang, Kan
2023-10-03 15:06 ` Andi Kleen
1 sibling, 0 replies; 17+ messages in thread
From: Liang, Kan @ 2023-10-03 12:57 UTC (permalink / raw)
To: Peter Zijlstra
Cc: mingo, acme, linux-kernel, mark.rutland, alexander.shishkin,
jolsa, namhyung, irogers, adrian.hunter, ak, eranian,
alexey.v.bayduraev, tinghao.zhang, Sandipan Das, Ravi Bangoria,
Athira Rajeev
On 2023-10-03 6:27 a.m., Peter Zijlstra wrote:
> On Mon, Oct 02, 2023 at 08:57:57PM -0400, Liang, Kan wrote:
>
>>> Did I already say that the ISE document raises more questions than it
>>> provides answers?
>>
>> Yes. Would an improved CPUID enumeration address the questions? For
>> example, it could report the maximum number of counters and the
>> supported width. I think we can discuss it with the architect.
>
> So.. no. Suppose another arch goes and does the same, but with a
> different number and width of counters. They won't have CPUID.
>
> I'm thinking we should do something like expose branch_counter_nr and
> branch_counter_width in the sysfs node, and then rename this extra field
> to counters.
Sure. I will expose them under the "caps" folder.
>
> Then userspace can do something like:
>
> for (i = 0; i < branch_counter_nr; i++) {
> counter[i] = counters & ((1 << branch_counter_width) - 1);
> counters >>= branch_counter_width;
> }
>
> to extract the actual counter values.
>
>
> So then we end up with:
>
> * { u64 nr;
> * { u64 hw_idx; } && PERF_SAMPLE_BRANCH_HW_INDEX
> * { u64 from, to, flags } lbr[nr];
> + * { u64 counters; } cntr[nr] && PERF_SAMPLE_BRANCH_COUNTERS
> * } && PERF_SAMPLE_BRANCH_STACK
>
> Have it explicitly named counters, have only the one flag and have sysfs
> files describe how to decode it.
>
> Then for this Intel thing we have 4 counters of 2 bits, but if someone
> else were to do something different (both Power and ARM64 have this
> branch stack stuff now) they can describe it.
>
> It is a bit wasteful on bits... but at least its clear I suppose.
Yes. I will send a V4 with the above suggestions.
Thanks,
Kan
* Re: [RESEND PATCH V3 1/6] perf: Add branch stack extra
2023-10-03 10:27 ` Peter Zijlstra
2023-10-03 12:57 ` Liang, Kan
@ 2023-10-03 15:06 ` Andi Kleen
2023-10-03 15:58 ` Liang, Kan
2023-10-03 16:33 ` Peter Zijlstra
1 sibling, 2 replies; 17+ messages in thread
From: Andi Kleen @ 2023-10-03 15:06 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Liang, Kan, mingo, acme, linux-kernel, mark.rutland,
alexander.shishkin, jolsa, namhyung, irogers, adrian.hunter,
eranian, alexey.v.bayduraev, tinghao.zhang, Sandipan Das,
Ravi Bangoria, Athira Rajeev
> I'm thinking we should do something like expose branch_counter_nr and
> branch_counter_width in the sysfs node, and then rename this extra field
> to counters.
>
> Then userspace can do something like:
>
> for (i = 0; i < branch_counter_nr; i++) {
> counter[i] = counters & ((1 << branch_counter_width) - 1);
> counters >>= branch_counter_width;
> }
>
> to extract the actual counter values.
perf script/report won't necessarily have access to the sysfs
values if they run on a different system.
It would need extra PT-style metadata written by perf record to
perf.data and read by the user tools.
Seems complicated. It would be better if it could just be parsed on its
own.
-Andi
* Re: [RESEND PATCH V3 1/6] perf: Add branch stack extra
2023-10-03 15:06 ` Andi Kleen
@ 2023-10-03 15:58 ` Liang, Kan
2023-10-03 16:33 ` Peter Zijlstra
1 sibling, 0 replies; 17+ messages in thread
From: Liang, Kan @ 2023-10-03 15:58 UTC (permalink / raw)
To: Andi Kleen, Peter Zijlstra
Cc: mingo, acme, linux-kernel, mark.rutland, alexander.shishkin,
jolsa, namhyung, irogers, adrian.hunter, eranian,
alexey.v.bayduraev, tinghao.zhang, Sandipan Das, Ravi Bangoria,
Athira Rajeev
On 2023-10-03 11:06 a.m., Andi Kleen wrote:
>> I'm thinking we should do something like expose branch_counter_nr and
>> branch_counter_width in the sysfs node, and then rename this extra field
>> to counters.
>>
>> Then userspace can do something like:
>>
>> for (i = 0; i < branch_counter_nr; i++) {
>> counter[i] = counters & ((1 << branch_counter_width) - 1);
>> counters >>= branch_counter_width;
>> }
>>
>> to extract the actual counter values.
>
> perf script/report won't necessarily have access to the sysfs
> values if they run on a different system
>
> It would need extra PT style metadata written by perf record to
> perf.data and read by the user tools.
>
> Seems complicated. It would be better if it just parsed on its own.
>
If so, perf would probably have to save the information in the
perf_branch_entry itself. At the extreme, perf would have to support up
to 64 counters of 64-bit width, which takes at least 12 bits (6 + 6) of
the info field of the perf_branch_entry to describe. (We only have 31
spare bits now; that's almost half.) That seems a waste.
The perf tool already supports "branches" in the caps folder; it is
saved in the perf.data header. We can use a similar way to support
"branch_counter_nr" and "branch_counter_width". It doesn't seem very
complex. I will try the sysfs method first.
Thanks,
Kan
* Re: [RESEND PATCH V3 1/6] perf: Add branch stack extra
2023-10-03 15:06 ` Andi Kleen
2023-10-03 15:58 ` Liang, Kan
@ 2023-10-03 16:33 ` Peter Zijlstra
2023-10-03 16:55 ` Andi Kleen
1 sibling, 1 reply; 17+ messages in thread
From: Peter Zijlstra @ 2023-10-03 16:33 UTC (permalink / raw)
To: Andi Kleen
Cc: Liang, Kan, mingo, acme, linux-kernel, mark.rutland,
alexander.shishkin, jolsa, namhyung, irogers, adrian.hunter,
eranian, alexey.v.bayduraev, tinghao.zhang, Sandipan Das,
Ravi Bangoria, Athira Rajeev
On Tue, Oct 03, 2023 at 08:06:59AM -0700, Andi Kleen wrote:
> > I'm thinking we should do something like expose branch_counter_nr and
> > branch_counter_width in the sysfs node, and then rename this extra field
> > to counters.
> >
> > Then userspace can do something like:
> >
> > for (i = 0; i < branch_counter_nr; i++) {
> > counter[i] = counters & ((1 << branch_counter_width) - 1);
> > counters >>= branch_counter_width;
> > }
> >
> > to extract the actual counter values.
>
> perf script/report won't necessarily have access to the sysfs
> values if they run on a different system
>
> It would need extra PT style metadata written by perf record to
> perf.data and read by the user tools.
>
> Seems complicated. It would be better if it just parsed on its own.
Well, you really don't want to repeat the 4,2 thing in every event, that
seems daft (and a waste of space, because how large do we want those
fields to be etc..).
We don't really have a better place for these sorts of things. And yeah,
the PT thing sets precedent here.
* Re: [RESEND PATCH V3 1/6] perf: Add branch stack extra
2023-10-03 16:33 ` Peter Zijlstra
@ 2023-10-03 16:55 ` Andi Kleen
0 siblings, 0 replies; 17+ messages in thread
From: Andi Kleen @ 2023-10-03 16:55 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Liang, Kan, mingo, acme, linux-kernel, mark.rutland,
alexander.shishkin, jolsa, namhyung, irogers, adrian.hunter,
eranian, alexey.v.bayduraev, tinghao.zhang, Sandipan Das,
Ravi Bangoria, Athira Rajeev
On Tue, Oct 03, 2023 at 06:33:00PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 03, 2023 at 08:06:59AM -0700, Andi Kleen wrote:
> > > I'm thinking we should do something like expose branch_counter_nr and
> > > branch_counter_width in the sysfs node, and then rename this extra field
> > > to counters.
> > >
> > > Then userspace can do something like:
> > >
> > > for (i = 0; i < branch_counter_nr; i++) {
> > > counter[i] = counters & ((1 << branch_counter_width) - 1);
> > > counters >>= branch_counter_width;
> > > }
> > >
> > > to extract the actual counter values.
> >
> > perf script/report won't necessarily have access to the sysfs
> > values if they run on a different system
> >
> > It would need extra PT style metadata written by perf record to
> > perf.data and read by the user tools.
> >
> > Seems complicated. It would be better if it just parsed on its own.
>
> Well, you really don't want to repeat the 4,2 thing in every event, that
> seems daft (and a waste of space, because how large do we want those
> fields to be etc..).
It's just a few bits? It could be an extra 16-bit field or so per event.
There are probably other self-describing encodings for the numbers
(e.g. some variant of LEB128 at a sub-byte level), but that might be
more expensive to store.
What would worry me is that various users would just hard-code the
layout and then fail later. There are lots of non-perf-tool perf.data
parsers around these days.
-Andi
end of thread, other threads:[~2023-10-03 16:56 UTC | newest]
Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-11 15:48 [RESEND PATCH V3 1/6] perf: Add branch stack extra kan.liang
2023-09-11 15:48 ` [RESEND PATCH V3 2/6] perf/x86: Add PERF_X86_EVENT_NEEDS_BRANCH_STACK flag kan.liang
2023-09-11 15:48 ` [RESEND PATCH V3 3/6] perf: Add branch_sample_call_stack kan.liang
2023-09-11 15:48 ` [RESEND PATCH V3 4/6] perf/x86/intel: Support LBR event logging kan.liang
2023-09-11 15:48 ` [RESEND PATCH V3 5/6] tools headers UAPI: Sync include/uapi/linux/perf_event.h header with the kernel kan.liang
2023-09-11 15:48 ` [RESEND PATCH V3 6/6] perf tools: Add branch event knob kan.liang
2023-10-02 13:49 ` [RESEND PATCH V3 1/6] perf: Add branch stack extra Liang, Kan
2023-10-02 15:45 ` Peter Zijlstra
2023-10-02 19:19 ` Liang, Kan
2023-10-02 21:37 ` Peter Zijlstra
2023-10-03 0:57 ` Liang, Kan
2023-10-03 10:27 ` Peter Zijlstra
2023-10-03 12:57 ` Liang, Kan
2023-10-03 15:06 ` Andi Kleen
2023-10-03 15:58 ` Liang, Kan
2023-10-03 16:33 ` Peter Zijlstra
2023-10-03 16:55 ` Andi Kleen