linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V3 0/9] Support branch counters in block annotation
@ 2024-08-13 16:01 kan.liang
  2024-08-13 16:02 ` [PATCH V3 1/9] perf report: Fix --total-cycles --stdio output error kan.liang
                   ` (9 more replies)
  0 siblings, 10 replies; 13+ messages in thread
From: kan.liang @ 2024-08-13 16:01 UTC (permalink / raw)
  To: acme, namhyung, irogers, peterz, mingo, linux-kernel,
	linux-perf-users
  Cc: adrian.hunter, ak, eranian, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Changes since V2:
- Rebase on top of the tmp.perf-tools-next commit 9da782071202
  ("perf test: Add test for Intel TPEBS")
- Add Acked-by from Namhyung and Reviewed-by from Andi

Changes since V1:
- Add Acked-by from Namhyung for patch 1 and 2.
- Use array to replace pointer to store the abbr name. (Patch 5)
  (Namhyung)
- Add -v to print the value like "A=<n>,B=<n>", rather than histogram.
  (Patch 6) (Namhyung)

The branch counters logging (A.K.A LBR event logging) introduces a
per-counter indication of precise event occurrences in LBRs. It can
provide a means to attribute exposed retirement latency to combinations
of events across a block of instructions. It also provides a means of
attributing Timed LBR latencies to events.

The kernel support and basic perf tool support have been merged.
https://lore.kernel.org/lkml/20231025201626.3000228-1-kan.liang@linux.intel.com/

This series is to provide advanced perf tool support via adding the
branch counters information in block annotation. It can further
facilitate the analysis of branch blocks.

The patch 1 and 2 are to fix two existing issues of --total-cycles and
the branch counters feature.

The patch 3-9 are the advanced perf tool support.

Here are some examples.

perf annotation:

$perf record -e "{branch-instructions:ppp,branch-misses}:S" -j any,counter
$perf report  --total-cycles --stdio

 # To display the perf.data header info, please use --header/--header-only options.
 #
 #
 # Total Lost Samples: 0
 #
 # Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }'
 # Event count (approx.): 1610046
 #
 # Branch counter abbr list:
 # branch-instructions:ppp = A
 # branch-misses = B
 # '-' No event occurs
 # '+' Event occurrences may be lost due to branch counter saturated
 #
 # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles          Branch Counter [Program Block Range]
 # ...............  ..............  ...........  ..........  ......................  ..
 #
           57.55%            2.5M        0.00%           3             |A   |-   |                 ...
           25.27%            1.1M        0.00%           2             |AA  |-   |                 ...
           15.61%          667.2K        0.00%           1             |A   |-   |                 ...
            0.16%            6.9K        0.81%         575             |A   |-   |                 ...
            0.16%            6.8K        1.38%         977             |AA  |-   |                 ...
            0.16%            6.8K        0.04%          28             |AA  |B   |                 ...
            0.15%            6.6K        1.33%         946             |A   |-   |                 ...
            0.11%            4.5K        0.06%          46             |AAA+|-   |                 ...

With -v applied,
$perf report  --total-cycles --stdio -v

 # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles          Branch Counter [Program Block Range]
 # ...............  ..............  ...........  ..........  ......................  ..................
 #
           57.55%            2.5M        0.00%           3               A=1 ,B=-                  ...
           25.27%            1.1M        0.00%           2               A=2 ,B=-                  ...
           15.61%          667.2K        0.00%           1               A=1 ,B=-                  ...
            0.16%            6.9K        0.81%         575               A=1 ,B=-                  ...
            0.16%            6.8K        1.38%         977               A=2 ,B=-                  ...
            0.16%            6.8K        0.04%          28               A=2 ,B=1                  ...
            0.15%            6.6K        1.33%         946               A=1 ,B=-                  ...
            0.11%            4.5K        0.06%          46               A=3+,B=-                  ...

(The below output is in the TUI mode. Users can press 'B' to display
the Branch counter abbr list.)

Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }',
4000 Hz, Event count (approx.):
f3  /home/sdp/test/tchain_edit [Percent: local period]
Percent       │ IPC Cycle       Branch Counter (Average IPC: 1.39, IPC Coverage: 29.4%)
              │                                     0000000000401755 <f3>:
  0.00   0.00 │                                       endbr64
              │                                       push    %rbp
              │                                       mov     %rsp,%rbp
              │                                       movl    $0x0,-0x4(%rbp)
  0.00   0.00 │1.33     3          |A   |-   |      ↓ jmp     25
 11.03  11.03 │                                 11:   mov     -0x4(%rbp),%eax
              │                                       and     $0x1,%eax
              │                                       test    %eax,%eax
 17.13  17.13 │2.41     1          |A   |-   |      ↓ je      21
              │                                       addl    $0x1,-0x4(%rbp)
 21.84  21.84 │2.22     2          |AA  |-   |      ↓ jmp     25
 17.13  17.13 │                                 21:   addl    $0x1,-0x4(%rbp)
 21.84  21.84 │                                 25:   cmpl    $0x270f,-0x4(%rbp)
 11.03  11.03 │0.61     3          |A   |-   |      ↑ jle     11
              │                                       nop
              │                                       pop     %rbp
  0.00   0.00 │0.24    20          |AA  |B   |      ← ret

perf script:

$perf script -F +brstackinsn,+brcntr

 # Branch counter abbr list:
 # branch-instructions:ppp = A
 # branch-misses = B
 # '-' No event occurs
 # '+' Event occurrences may be lost due to branch counter saturated
     tchain_edit  332203 3366329.405674:      53030 branch-instructions:ppp:            401781 f3+0x2c (home/sdp/test/tchain_edit)
        f3+31:
        0000000000401774        insn: eb 04                     br_cntr: AA     # PRED 5 cycles [5]
        000000000040177a        insn: 81 7d fc 0f 27 00 00
        0000000000401781        insn: 7e e3                     br_cntr: A      # PRED 1 cycles [6] 2.00 IPC
        0000000000401766        insn: 8b 45 fc
        0000000000401769        insn: 83 e0 01
        000000000040176c        insn: 85 c0

$perf script -F +brstackinsn,+brcntr -v

     tchain_edit  332203 3366329.405674:      53030 branch-instructions:ppp:            401781 f3+0x2c (/home/sdp/test/tchain_edit)
        f3+31:
        0000000000401774        insn: eb 04                     br_cntr: branch-instructions:ppp 2 branch-misses 0      # PRED 5 cycles [5]
        000000000040177a        insn: 81 7d fc 0f 27 00 00
        0000000000401781        insn: 7e e3                     br_cntr: branch-instructions:ppp 1 branch-misses 0      # PRED 1 cycles [6] 2.00 IPC
        0000000000401766        insn: 8b 45 fc
        0000000000401769        insn: 83 e0 01
        000000000040176c        insn: 85 c0


Kan Liang (9):
  perf report: Fix --total-cycles --stdio output error
  perf report: Remove the first overflow check for branch counters
  perf evlist: Save branch counters information
  perf annotate: Save branch counters for each block
  perf evsel: Assign abbr name for the branch counter events
  perf report: Display the branch counter histogram
  perf annotate: Display the branch counter histogram
  perf script: Add branch counters
  perf test: Add new test cases for the branch counter feature

 tools/perf/Documentation/perf-report.txt |   1 +
 tools/perf/Documentation/perf-script.txt |   2 +-
 tools/perf/builtin-annotate.c            |  13 +-
 tools/perf/builtin-diff.c                |   8 +-
 tools/perf/builtin-report.c              |  25 ++-
 tools/perf/builtin-script.c              |  69 ++++++-
 tools/perf/builtin-top.c                 |   4 +-
 tools/perf/tests/shell/record.sh         |  17 +-
 tools/perf/ui/browsers/annotate.c        |  18 +-
 tools/perf/ui/browsers/hists.c           |  18 +-
 tools/perf/util/annotate.c               | 253 +++++++++++++++++++++--
 tools/perf/util/annotate.h               |  24 ++-
 tools/perf/util/block-info.c             |  66 +++++-
 tools/perf/util/block-info.h             |   8 +-
 tools/perf/util/branch.h                 |   1 +
 tools/perf/util/disasm.c                 |   1 +
 tools/perf/util/evlist.c                 |  68 ++++++
 tools/perf/util/evlist.h                 |   2 +
 tools/perf/util/evsel.c                  |  15 +-
 tools/perf/util/evsel.h                  |  14 ++
 tools/perf/util/hist.c                   |   5 +-
 tools/perf/util/hist.h                   |   2 +-
 tools/perf/util/machine.c                |   3 +
 23 files changed, 567 insertions(+), 70 deletions(-)

-- 
2.38.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [PATCH V3 1/9] perf report: Fix --total-cycles --stdio output error
  2024-08-13 16:01 [PATCH V3 0/9] Support branch counters in block annotation kan.liang
@ 2024-08-13 16:02 ` kan.liang
  2024-08-13 18:44   ` Arnaldo Carvalho de Melo
  2024-08-13 16:02 ` [PATCH V3 2/9] perf report: Remove the first overflow check for branch counters kan.liang
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 13+ messages in thread
From: kan.liang @ 2024-08-13 16:02 UTC (permalink / raw)
  To: acme, namhyung, irogers, peterz, mingo, linux-kernel,
	linux-perf-users
  Cc: adrian.hunter, ak, eranian, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The --total-cycles may output wrong information with the --stdio.

For example,
  perf record -e "{cycles,instructions}",cache-misses -b sleep 1
  perf report --total-cycles --stdio

The total cycles output of {cycles,instructions} and cache-misses are
almost the same.

 # Samples: 938  of events 'anon group { cycles, instructions }'
 # Event count (approx.): 938
 #
 # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles
 # ...............  ..............  ...........  ..........
  ..................................................>
 #
           11.19%            2.6K        0.10%          21
                          [perf_iterate_ctx+48 -> >
            5.79%            1.4K        0.45%          97
            [__intel_pmu_enable_all.constprop.0+80 -> __intel_>
            5.11%            1.2K        0.33%          71
                             [native_write_msr+0 ->>

 # Samples: 293  of event 'cache-misses'
 # Event count (approx.): 293
 #
 # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles
                                                  [>
 # ...............  ..............  ...........  ..........
   ..................................................>
 #
           11.19%            2.6K        0.13%          21
                          [perf_iterate_ctx+48 -> >
            5.79%            1.4K        0.59%          97
[__intel_pmu_enable_all.constprop.0+80 -> __intel_>
            5.11%            1.2K        0.43%          71
                             [native_write_msr+0 ->>

With the symbol_conf.event_group, the perf report should only report the
block information of the leader event in a group.
However, the current implementation retrieves the next event's block
information, rather than the next group leader's block information.

Make sure the index is updated even if the event is skipped.

With the patch,

 # Samples: 293  of event 'cache-misses'
 # Event count (approx.): 293
 #
 # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles
                                                  [>
 # ...............  ..............  ...........  ..........
   ..................................................>
 #
           37.98%            9.0K        4.05%         299
   [perf_event_addr_filters_exec+0 -> perf_event_a>
           11.19%            2.6K        0.28%          21
                          [perf_iterate_ctx+48 -> >
            5.79%            1.4K        1.32%          97
[__intel_pmu_enable_all.constprop.0+80 -> __intel_>

Fixes: 6f7164fa231a ("perf report: Sort by sampled cycles percent per block for stdio")
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/builtin-report.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index dfb47fa85e5c..04b9a5c1bc7e 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -565,6 +565,7 @@ static int evlist__tty_browse_hists(struct evlist *evlist, struct report *rep, c
 		struct hists *hists = evsel__hists(pos);
 		const char *evname = evsel__name(pos);
 
+		i++;
 		if (symbol_conf.event_group && !evsel__is_group_leader(pos))
 			continue;
 
@@ -574,7 +575,7 @@ static int evlist__tty_browse_hists(struct evlist *evlist, struct report *rep, c
 		hists__fprintf_nr_sample_events(hists, rep, evname, stdout);
 
 		if (rep->total_cycles_mode) {
-			report__browse_block_hists(&rep->block_reports[i++].hist,
+			report__browse_block_hists(&rep->block_reports[i - 1].hist,
 						   rep->min_percent, pos, NULL);
 			continue;
 		}
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH V3 2/9] perf report: Remove the first overflow check for branch counters
  2024-08-13 16:01 [PATCH V3 0/9] Support branch counters in block annotation kan.liang
  2024-08-13 16:02 ` [PATCH V3 1/9] perf report: Fix --total-cycles --stdio output error kan.liang
@ 2024-08-13 16:02 ` kan.liang
  2024-08-13 16:02 ` [PATCH V3 3/9] perf evlist: Save branch counters information kan.liang
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: kan.liang @ 2024-08-13 16:02 UTC (permalink / raw)
  To: acme, namhyung, irogers, peterz, mingo, linux-kernel,
	linux-perf-users
  Cc: adrian.hunter, ak, eranian, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

A false overflow warning is triggered if a sample doesn't have any LBRs
recorded and the branch counters feature is enabled.

The current code does OVERFLOW_CHECK_u64() at the very beginning when
reading the information of branch counters. It assumes that there is at
least one LBR in the PEBS record. But it is a valid case that 0 LBR is
recorded especially in a high context switch.

Remove the OVERFLOW_CHECK_u64(). The later OVERFLOW_CHECK() should be
good enough to check the overflow when reading the information of the
branch counters.

Fixes: 9fbb4b02302b ("perf tools: Add branch counter knob")
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/util/evsel.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index d607056b73c9..f22f402d54cc 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2884,8 +2884,6 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
 		array = (void *)array + sz;
 
 		if (evsel__has_branch_counters(evsel)) {
-			OVERFLOW_CHECK_u64(array);
-
 			data->branch_stack_cntr = (u64 *)array;
 			sz = data->branch_stack->nr * sizeof(u64);
 
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH V3 3/9] perf evlist: Save branch counters information
  2024-08-13 16:01 [PATCH V3 0/9] Support branch counters in block annotation kan.liang
  2024-08-13 16:02 ` [PATCH V3 1/9] perf report: Fix --total-cycles --stdio output error kan.liang
  2024-08-13 16:02 ` [PATCH V3 2/9] perf report: Remove the first overflow check for branch counters kan.liang
@ 2024-08-13 16:02 ` kan.liang
  2024-08-13 16:02 ` [PATCH V3 4/9] perf annotate: Save branch counters for each block kan.liang
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: kan.liang @ 2024-08-13 16:02 UTC (permalink / raw)
  To: acme, namhyung, irogers, peterz, mingo, linux-kernel,
	linux-perf-users
  Cc: adrian.hunter, ak, eranian, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The branch counters logging (A.K.A LBR event logging) introduces a
per-counter indication of precise event occurrences in LBRs. The kernel
only dumps the number of occurrences into a record. The perf tool has
to map the number to the corresponding event.

Add evlist__update_br_cntr() to go through the evlist to pick the
events that are configured to be logged. Assign a logical idx to track
them, and add the total number of the events in the leader event.
The total number will be used to allocate the space to save the branch
counters for a block. The logical idx will be used to locate the
corresponding event quickly in the following patches.

It only needs to iterate the evlist once. The
evsel__has_branch_counters() is also optimized.

Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/util/evlist.c | 15 +++++++++++++++
 tools/perf/util/evlist.h |  2 ++
 tools/perf/util/evsel.c  | 13 +++++++------
 tools/perf/util/evsel.h  |  8 ++++++++
 4 files changed, 32 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 4b2538e4f679..68bbd3ea771b 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -79,6 +79,7 @@ void evlist__init(struct evlist *evlist, struct perf_cpu_map *cpus,
 	evlist->ctl_fd.fd = -1;
 	evlist->ctl_fd.ack = -1;
 	evlist->ctl_fd.pos = -1;
+	evlist->nr_br_cntr = -1;
 }
 
 struct evlist *evlist__new(void)
@@ -1264,6 +1265,20 @@ u64 evlist__combined_branch_type(struct evlist *evlist)
 	return branch_type;
 }
 
+void evlist__update_br_cntr(struct evlist *evlist)
+{
+	struct evsel *evsel;
+	int i = 0;
+
+	evlist__for_each_entry(evlist, evsel) {
+		if (evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS) {
+			evsel->br_cntr_idx = i++;
+			evsel__leader(evsel)->br_cntr_nr++;
+		}
+	}
+	evlist->nr_br_cntr = i;
+}
+
 bool evlist__valid_read_format(struct evlist *evlist)
 {
 	struct evsel *first = evlist__first(evlist), *pos = first;
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index cccc34da5a02..b46f1a320783 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -57,6 +57,7 @@ struct evlist {
 	bool		 enabled;
 	int		 id_pos;
 	int		 is_pos;
+	int		 nr_br_cntr;
 	u64		 combined_sample_type;
 	enum bkw_mmap_state bkw_mmap_state;
 	struct {
@@ -219,6 +220,7 @@ int evlist__apply_filters(struct evlist *evlist, struct evsel **err_evsel,
 u64 __evlist__combined_sample_type(struct evlist *evlist);
 u64 evlist__combined_sample_type(struct evlist *evlist);
 u64 evlist__combined_branch_type(struct evlist *evlist);
+void evlist__update_br_cntr(struct evlist *evlist);
 bool evlist__sample_id_all(struct evlist *evlist);
 u16 evlist__id_hdr_size(struct evlist *evlist);
 
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index f22f402d54cc..38a74d6dde49 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2636,17 +2636,18 @@ u64 evsel__bitfield_swap_branch_flags(u64 value)
 
 static inline bool evsel__has_branch_counters(const struct evsel *evsel)
 {
-	struct evsel *cur, *leader = evsel__leader(evsel);
+	struct evsel *leader = evsel__leader(evsel);
 
 	/* The branch counters feature only supports group */
 	if (!leader || !evsel->evlist)
 		return false;
 
-	evlist__for_each_entry(evsel->evlist, cur) {
-		if ((leader == evsel__leader(cur)) &&
-		    (cur->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS))
-			return true;
-	}
+	if (evsel->evlist->nr_br_cntr < 0)
+		evlist__update_br_cntr(evsel->evlist);
+
+	if (leader->br_cntr_nr > 0)
+		return true;
+
 	return false;
 }
 
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index a5da4b03bb1c..a393dae1dc96 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -148,6 +148,14 @@ struct evsel {
 	 */
 	__u64			synth_sample_type;
 
+	/*
+	 * Store the branch counter related information.
+	 * br_cntr_idx: The idx of the branch counter event in the evlist
+	 * br_cntr_nr:  The number of the branch counter event in the group
+	 *	        (Only available for the leader event)
+	 */
+	int			br_cntr_idx;
+	int			br_cntr_nr;
 	/*
 	 * bpf_counter_ops serves two use cases:
 	 *   1. perf-stat -b          counting events used byBPF programs
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH V3 4/9] perf annotate: Save branch counters for each block
  2024-08-13 16:01 [PATCH V3 0/9] Support branch counters in block annotation kan.liang
                   ` (2 preceding siblings ...)
  2024-08-13 16:02 ` [PATCH V3 3/9] perf evlist: Save branch counters information kan.liang
@ 2024-08-13 16:02 ` kan.liang
  2024-08-13 16:02 ` [PATCH V3 5/9] perf evsel: Assign abbr name for the branch counter events kan.liang
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: kan.liang @ 2024-08-13 16:02 UTC (permalink / raw)
  To: acme, namhyung, irogers, peterz, mingo, linux-kernel,
	linux-perf-users
  Cc: adrian.hunter, ak, eranian, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

When annotating a basic block, it's useful to display the occurrences
of other events in the block.

The branch counter feature is only available for newer Intel platforms.
So a dedicated option to display the branch counters is not introduced.
Reuse the existing --total-cycles option, which triggers the annotation
of a basic block and displays the cycle-related annotation. When the
branch counters information is available, the branch counters are
automatically appended after all the cycle-related annotation.
Accounting the branch counters as well when accounting the cycles in
hist__account_cycles().

In struct annotated_branch, introduce a br_cntr array to save the
accumulation of each branch counter.
In a sample, all the branch counters for a branch are saved in a u64
space. Because the saturation of a branch counter is small, e.g., for
Intel Sierra Forest, the saturation is only 3. Add
ANNOTATION__BR_CNTR_SATURATED_FLAG to indicate if a branch counter
once saturated. That can be used to indicate a potential event lost
because of the saturation.

Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/builtin-annotate.c |  3 +-
 tools/perf/builtin-diff.c     |  4 +--
 tools/perf/builtin-report.c   |  2 +-
 tools/perf/builtin-top.c      |  4 +--
 tools/perf/util/annotate.c    | 68 ++++++++++++++++++++++++++++-------
 tools/perf/util/annotate.h    | 10 +++++-
 tools/perf/util/branch.h      |  1 +
 tools/perf/util/hist.c        |  5 +--
 tools/perf/util/hist.h        |  2 +-
 tools/perf/util/machine.c     |  3 ++
 10 files changed, 80 insertions(+), 22 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index d6f6ba5a569d..9b4a8a379b5b 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -221,7 +221,8 @@ static int process_branch_callback(struct evsel *evsel,
 	if (a.map != NULL)
 		dso__set_hit(map__dso(a.map));
 
-	hist__account_cycles(sample->branch_stack, al, sample, false, NULL);
+	hist__account_cycles(sample->branch_stack, al, sample, false,
+			     NULL, evsel);
 
 	ret = hist_entry_iter__add(&iter, &a, PERF_MAX_STACK_DEPTH, ann);
 out:
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 28c5208fcdc9..6983b5373372 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -431,8 +431,8 @@ static int diff__process_sample_event(const struct perf_tool *tool,
 			goto out;
 		}
 
-		hist__account_cycles(sample->branch_stack, &al, sample, false,
-				     NULL);
+		hist__account_cycles(sample->branch_stack, &al, sample,
+				     false, NULL, evsel);
 		break;
 
 	case COMPUTE_STREAM:
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 04b9a5c1bc7e..ab450644e5b3 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -328,7 +328,7 @@ static int process_sample_event(const struct perf_tool *tool,
 	if (ui__has_annotation() || rep->symbol_ipc || rep->total_cycles_mode) {
 		hist__account_cycles(sample->branch_stack, &al, sample,
 				     rep->nonany_branch_mode,
-				     &rep->total_cycles);
+				     &rep->total_cycles, evsel);
 	}
 
 	ret = hist_entry_iter__add(&iter, &al, rep->max_stack, rep);
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 1cdac4e190e9..881b861c35ee 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -735,8 +735,8 @@ static int hist_iter__top_callback(struct hist_entry_iter *iter,
 		perf_top__record_precise_ip(top, iter->he, iter->sample, evsel, al->addr);
 
 	hist__account_cycles(iter->sample->branch_stack, al, iter->sample,
-		     !(top->record_opts.branch_stack & PERF_SAMPLE_BRANCH_ANY),
-		     NULL);
+			     !(top->record_opts.branch_stack & PERF_SAMPLE_BRANCH_ANY),
+			     NULL, evsel);
 	return 0;
 }
 
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index a87d2e97e10d..c269758d47c8 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -266,22 +266,30 @@ struct annotated_branch *annotation__get_branch(struct annotation *notes)
 	return notes->branch;
 }
 
-static struct cyc_hist *symbol__cycles_hist(struct symbol *sym)
+static struct annotated_branch *symbol__find_branch_hist(struct symbol *sym,
+							 unsigned int br_cntr_nr)
 {
 	struct annotation *notes = symbol__annotation(sym);
 	struct annotated_branch *branch;
+	const size_t size = symbol__size(sym);
 
 	branch = annotation__get_branch(notes);
 	if (branch == NULL)
 		return NULL;
 
 	if (branch->cycles_hist == NULL) {
-		const size_t size = symbol__size(sym);
-
 		branch->cycles_hist = calloc(size, sizeof(struct cyc_hist));
+		if (!branch->cycles_hist)
+			return NULL;
+	}
+
+	if (br_cntr_nr && branch->br_cntr == NULL) {
+		branch->br_cntr = calloc(br_cntr_nr * size, sizeof(u64));
+		if (!branch->br_cntr)
+			return NULL;
 	}
 
-	return branch->cycles_hist;
+	return branch;
 }
 
 struct annotated_source *symbol__hists(struct symbol *sym, int nr_hists)
@@ -316,16 +324,44 @@ static int symbol__inc_addr_samples(struct map_symbol *ms,
 	return src ? __symbol__inc_addr_samples(ms, src, evsel->core.idx, addr, sample) : 0;
 }
 
-static int symbol__account_cycles(u64 addr, u64 start,
-				  struct symbol *sym, unsigned cycles)
+static int symbol__account_br_cntr(struct annotated_branch *branch,
+				   struct evsel *evsel,
+				   unsigned offset,
+				   u64 br_cntr)
+{
+	unsigned int br_cntr_nr = evsel__leader(evsel)->br_cntr_nr;
+	unsigned int base = evsel__leader(evsel)->br_cntr_idx;
+	unsigned int width = evsel__env(evsel)->br_cntr_width;
+	unsigned int off = offset * evsel->evlist->nr_br_cntr;
+	unsigned int i, mask = (1L << width) - 1;
+	u64 *branch_br_cntr = branch->br_cntr;
+
+	if (!br_cntr || !branch_br_cntr)
+		return 0;
+
+	for (i = 0; i < br_cntr_nr; i++) {
+		u64 cntr = (br_cntr >> i * width) & mask;
+
+		branch_br_cntr[off + i + base] += cntr;
+		if (cntr == mask)
+			branch_br_cntr[off + i + base] |= ANNOTATION__BR_CNTR_SATURATED_FLAG;
+	}
+
+	return 0;
+}
+
+static int symbol__account_cycles(u64 addr, u64 start, struct symbol *sym,
+				  unsigned cycles, struct evsel *evsel,
+				  u64 br_cntr)
 {
-	struct cyc_hist *cycles_hist;
+	struct annotated_branch *branch;
 	unsigned offset;
+	int ret;
 
 	if (sym == NULL)
 		return 0;
-	cycles_hist = symbol__cycles_hist(sym);
-	if (cycles_hist == NULL)
+	branch = symbol__find_branch_hist(sym, evsel->evlist->nr_br_cntr);
+	if (!branch)
 		return -ENOMEM;
 	if (addr < sym->start || addr >= sym->end)
 		return -ERANGE;
@@ -337,15 +373,22 @@ static int symbol__account_cycles(u64 addr, u64 start,
 			start = 0;
 	}
 	offset = addr - sym->start;
-	return __symbol__account_cycles(cycles_hist,
+	ret = __symbol__account_cycles(branch->cycles_hist,
 					start ? start - sym->start : 0,
 					offset, cycles,
 					!!start);
+
+	if (ret)
+		return ret;
+
+	return symbol__account_br_cntr(branch, evsel, offset, br_cntr);
 }
 
 int addr_map_symbol__account_cycles(struct addr_map_symbol *ams,
 				    struct addr_map_symbol *start,
-				    unsigned cycles)
+				    unsigned cycles,
+				    struct evsel *evsel,
+				    u64 br_cntr)
 {
 	u64 saddr = 0;
 	int err;
@@ -371,7 +414,7 @@ int addr_map_symbol__account_cycles(struct addr_map_symbol *ams,
 			start ? start->addr : 0,
 			ams->ms.sym ? ams->ms.sym->start + map__start(ams->ms.map) : 0,
 			saddr);
-	err = symbol__account_cycles(ams->al_addr, saddr, ams->ms.sym, cycles);
+	err = symbol__account_cycles(ams->al_addr, saddr, ams->ms.sym, cycles, evsel, br_cntr);
 	if (err)
 		pr_debug2("account_cycles failed %d\n", err);
 	return err;
@@ -412,6 +455,7 @@ static void annotated_branch__delete(struct annotated_branch *branch)
 {
 	if (branch) {
 		zfree(&branch->cycles_hist);
+		free(branch->br_cntr);
 		free(branch);
 	}
 }
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index 0e52558baa01..6b3023f3b18f 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -14,6 +14,7 @@
 #include "spark.h"
 #include "hashmap.h"
 #include "disasm.h"
+#include "branch.h"
 
 struct hist_browser_timer;
 struct hist_entry;
@@ -288,6 +289,9 @@ struct annotated_source {
 struct annotation_line *annotated_source__get_line(struct annotated_source *src,
 						   s64 offset);
 
+/* A branch counter once saturated */
+#define ANNOTATION__BR_CNTR_SATURATED_FLAG	(1ULL << 63)
+
 /**
  * struct annotated_branch - basic block and IPC information for a symbol.
  *
@@ -297,6 +301,7 @@ struct annotation_line *annotated_source__get_line(struct annotated_source *src,
  * @cover_insn: Number of distinct, actually executed instructions.
  * @cycles_hist: Array of cyc_hist for each instruction.
  * @max_coverage: Maximum number of covered basic block (used for block-range).
+ * @br_cntr: Array of the occurrences of events (branch counters) during a block.
  *
  * This struct is used by two different codes when the sample has branch stack
  * and cycles information.  annotation__compute_ipc() calculates average IPC
@@ -313,6 +318,7 @@ struct annotated_branch {
 	unsigned int		cover_insn;
 	struct cyc_hist		*cycles_hist;
 	u64			max_coverage;
+	u64			*br_cntr;
 };
 
 struct LOCKABLE annotation {
@@ -383,7 +389,9 @@ struct annotated_branch *annotation__get_branch(struct annotation *notes);
 
 int addr_map_symbol__account_cycles(struct addr_map_symbol *ams,
 				    struct addr_map_symbol *start,
-				    unsigned cycles);
+				    unsigned cycles,
+				    struct evsel *evsel,
+				    u64 br_cntr);
 
 int hist_entry__inc_addr_samples(struct hist_entry *he, struct perf_sample *sample,
 				 struct evsel *evsel, u64 addr);
diff --git a/tools/perf/util/branch.h b/tools/perf/util/branch.h
index 87704d713ff6..b80c12c74bbb 100644
--- a/tools/perf/util/branch.h
+++ b/tools/perf/util/branch.h
@@ -34,6 +34,7 @@ struct branch_info {
 	struct addr_map_symbol from;
 	struct addr_map_symbol to;
 	struct branch_flags    flags;
+	u64		       branch_stack_cntr;
 	char		       *srcline_from;
 	char		       *srcline_to;
 };
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index c8c1b511f8a7..0f554febf9a1 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -2677,7 +2677,7 @@ int hists__unlink(struct hists *hists)
 
 void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
 			  struct perf_sample *sample, bool nonany_branch_mode,
-			  u64 *total_cycles)
+			  u64 *total_cycles, struct evsel *evsel)
 {
 	struct branch_info *bi;
 	struct branch_entry *entries = perf_sample__branch_entries(sample);
@@ -2701,7 +2701,8 @@ void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
 			for (int i = bs->nr - 1; i >= 0; i--) {
 				addr_map_symbol__account_cycles(&bi[i].from,
 					nonany_branch_mode ? NULL : prev,
-					bi[i].flags.cycles);
+					bi[i].flags.cycles, evsel,
+					bi[i].branch_stack_cntr);
 				prev = &bi[i].to;
 
 				if (total_cycles)
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 5273f5c37050..30c13fc8cbe4 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -742,7 +742,7 @@ unsigned int hists__overhead_width(struct hists *hists);
 
 void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
 			  struct perf_sample *sample, bool nonany_branch_mode,
-			  u64 *total_cycles);
+			  u64 *total_cycles, struct evsel *evsel);
 
 struct option;
 int parse_filter_percentage(const struct option *opt, const char *arg, int unset);
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 706be5e4a076..c08774d06d14 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2141,6 +2141,7 @@ struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
 	unsigned int i;
 	const struct branch_stack *bs = sample->branch_stack;
 	struct branch_entry *entries = perf_sample__branch_entries(sample);
+	u64 *branch_stack_cntr = sample->branch_stack_cntr;
 	struct branch_info *bi = calloc(bs->nr, sizeof(struct branch_info));
 
 	if (!bi)
@@ -2150,6 +2151,8 @@ struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
 		ip__resolve_ams(al->thread, &bi[i].to, entries[i].to);
 		ip__resolve_ams(al->thread, &bi[i].from, entries[i].from);
 		bi[i].flags = entries[i].flags;
+		if (branch_stack_cntr)
+			bi[i].branch_stack_cntr  = branch_stack_cntr[i];
 	}
 	return bi;
 }
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH V3 5/9] perf evsel: Assign abbr name for the branch counter events
  2024-08-13 16:01 [PATCH V3 0/9] Support branch counters in block annotation kan.liang
                   ` (3 preceding siblings ...)
  2024-08-13 16:02 ` [PATCH V3 4/9] perf annotate: Save branch counters for each block kan.liang
@ 2024-08-13 16:02 ` kan.liang
  2024-08-13 16:02 ` [PATCH V3 6/9] perf report: Display the branch counter histogram kan.liang
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: kan.liang @ 2024-08-13 16:02 UTC (permalink / raw)
  To: acme, namhyung, irogers, peterz, mingo, linux-kernel,
	linux-perf-users
  Cc: adrian.hunter, ak, eranian, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

There could be several branch counter events. If perf tool output the
result via the format "event name + a number", the line could be very
long and hard to read.

An abbreviation is introduced to replace the full event name in the
display. The abbreviation starts from 'A' to 'Z9', which can support
up to 286 events. The same abbreviation will be assigned if the same
events are found in the evlist. The next patch will utilize the
abbreviation name to show the branch counter events in the output.

Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/util/evlist.c | 55 +++++++++++++++++++++++++++++++++++++++-
 tools/perf/util/evsel.h  |  6 +++++
 2 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 68bbd3ea771b..c0379fa1ccfe 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -33,6 +33,7 @@
 #include "util/bpf-filter.h"
 #include "util/stat.h"
 #include "util/util.h"
+#include "util/env.h"
 #include "util/intel-tpebs.h"
 #include <signal.h>
 #include <unistd.h>
@@ -1265,15 +1266,67 @@ u64 evlist__combined_branch_type(struct evlist *evlist)
 	return branch_type;
 }
 
+static struct evsel *
+evlist__find_dup_event_from_prev(struct evlist *evlist, struct evsel *event)
+{
+	struct evsel *pos;
+
+	evlist__for_each_entry(evlist, pos) {
+		if (event == pos)
+			break;
+		if ((pos->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS) &&
+		    !strcmp(pos->name, event->name))
+			return pos;
+	}
+	return NULL;
+}
+
+#define MAX_NR_ABBR_NAME	(26 * 11)
+
+/*
+ * The abbr name is from A to Z9. If the number of event
+ * which requires the branch counter > MAX_NR_ABBR_NAME,
+ * return NA.
+ */
+static void evlist__new_abbr_name(char *name)
+{
+	static int idx;
+	int i = idx / 26;
+
+	if (idx >= MAX_NR_ABBR_NAME) {
+		name[0] = 'N';
+		name[1] = 'A';
+		name[2] = '\0';
+		return;
+	}
+
+	name[0] = 'A' + (idx % 26);
+
+	if (!i)
+		name[1] = '\0';
+	else {
+		name[1] = '0' + i - 1;
+		name[2] = '\0';
+	}
+
+	idx++;
+}
+
 void evlist__update_br_cntr(struct evlist *evlist)
 {
-	struct evsel *evsel;
+	struct evsel *evsel, *dup;
 	int i = 0;
 
 	evlist__for_each_entry(evlist, evsel) {
 		if (evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS) {
 			evsel->br_cntr_idx = i++;
 			evsel__leader(evsel)->br_cntr_nr++;
+
+			dup = evlist__find_dup_event_from_prev(evlist, evsel);
+			if (dup)
+				memcpy(evsel->abbr_name, dup->abbr_name, 3 * sizeof(char));
+			else
+				evlist__new_abbr_name(evsel->abbr_name);
 		}
 	}
 	evlist->nr_br_cntr = i;
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index a393dae1dc96..4316992f6a69 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -153,9 +153,15 @@ struct evsel {
 	 * br_cntr_idx: The idx of the branch counter event in the evlist
 	 * br_cntr_nr:  The number of the branch counter event in the group
 	 *	        (Only available for the leader event)
+	 * abbr_name:   The abbreviation name assigned to an event which is
+	 *		logged by the branch counter.
+	 *		The abbr name is from A to Z9. NA is applied if out
+	 *		of the range.
 	 */
 	int			br_cntr_idx;
 	int			br_cntr_nr;
+	char			abbr_name[3];
+
 	/*
 	 * bpf_counter_ops serves two use cases:
 	 *   1. perf-stat -b          counting events used byBPF programs
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH V3 6/9] perf report: Display the branch counter histogram
  2024-08-13 16:01 [PATCH V3 0/9] Support branch counters in block annotation kan.liang
                   ` (4 preceding siblings ...)
  2024-08-13 16:02 ` [PATCH V3 5/9] perf evsel: Assign abbr name for the branch counter events kan.liang
@ 2024-08-13 16:02 ` kan.liang
  2024-08-13 16:02 ` [PATCH V3 7/9] perf annotate: " kan.liang
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: kan.liang @ 2024-08-13 16:02 UTC (permalink / raw)
  To: acme, namhyung, irogers, peterz, mingo, linux-kernel,
	linux-perf-users
  Cc: adrian.hunter, ak, eranian, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Reusing the existing --total-cycles option to display the branch
counters. Add a new PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER to display
the logged branch counter events. They are shown right after all the
cycle-related annotations.
Extend the struct block_info to store and pass the branch counter
related information.

The annotation_br_cntr_entry() is to print the histogram of each branch
counter event. If the number of logged events is less than 4, the exact
number of the abbr name is printed. Otherwise, using '+' to stands for
more than 3 events.

Assume the number of logged events is less than 4.

The annotation_br_cntr_abbr_list() prints the branch counter's
abbreviation list. Press 'B' to display the list in the TUI mode.

$perf record -e "{branch-instructions:ppp,branch-misses}:S" -j any,counter
$perf report  --total-cycles --stdio

 # To display the perf.data header info, please use --header/--header-only options.
 #
 #
 # Total Lost Samples: 0
 #
 # Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }'
 # Event count (approx.): 1610046
 #
 # Branch counter abbr list:
 # branch-instructions:ppp = A
 # branch-misses = B
 # '-' No event occurs
 # '+' Event occurrences may be lost due to branch counter saturated
 #
 # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles          Branch Counter [Program Block Range]
 # ...............  ..............  ...........  ..........  ......................  ..................
 #
           57.55%            2.5M        0.00%           3             |A   |-   |                 ...
           25.27%            1.1M        0.00%           2             |AA  |-   |                 ...
           15.61%          667.2K        0.00%           1             |A   |-   |                 ...
            0.16%            6.9K        0.81%         575             |A   |-   |                 ...
            0.16%            6.8K        1.38%         977             |AA  |-   |                 ...
            0.16%            6.8K        0.04%          28             |AA  |B   |                 ...
            0.15%            6.6K        1.33%         946             |A   |-   |                 ...
            0.11%            4.5K        0.06%          46             |AAA+|-   |                 ...
            0.10%            4.4K        0.88%         624             |A   |-   |                 ...
            0.09%            3.7K        0.74%         524             |AAA+|B   |                 ...

With -v applied,

 # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles          Branch Counter [Program Block Range]
 # ...............  ..............  ...........  ..........  ......................  ..................
 #
           57.55%            2.5M        0.00%           3               A=1 ,B=-                  ...
           25.27%            1.1M        0.00%           2               A=2 ,B=-                  ...
           15.61%          667.2K        0.00%           1               A=1 ,B=-                  ...
            0.16%            6.9K        0.81%         575               A=1 ,B=-                  ...
            0.16%            6.8K        1.38%         977               A=2 ,B=-                  ...
            0.16%            6.8K        0.04%          28               A=2 ,B=1                  ...
            0.15%            6.6K        1.33%         946               A=1 ,B=-                  ...
            0.11%            4.5K        0.06%          46               A=3+,B=-                  ...
            0.10%            4.4K        0.88%         624               A=1 ,B=-                  ...
            0.09%            3.7K        0.74%         524               A=3+,B=1                  ...

Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/Documentation/perf-report.txt |   1 +
 tools/perf/builtin-diff.c                |   4 +-
 tools/perf/builtin-report.c              |  20 +++-
 tools/perf/ui/browsers/hists.c           |  17 ++-
 tools/perf/util/annotate.c               | 145 +++++++++++++++++++++++
 tools/perf/util/annotate.h               |   3 +
 tools/perf/util/block-info.c             |  66 +++++++++--
 tools/perf/util/block-info.h             |   8 +-
 8 files changed, 246 insertions(+), 18 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index d2b1593ef700..7c66d81ab978 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -614,6 +614,7 @@ include::itrace.txt[]
 	'Avg Cycles%'     - block average sampled cycles / sum of total block average
 			    sampled cycles
 	'Avg Cycles'      - block average sampled cycles
+	'Branch Counter'  - block branch counter histogram (with -v showing the number)
 
 --skip-empty::
 	Do not print 0 results in the --stat output.
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 6983b5373372..23326dd20333 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -691,7 +691,7 @@ static void hists__precompute(struct hists *hists)
 		if (compute == COMPUTE_CYCLES) {
 			bh = container_of(he, struct block_hist, he);
 			init_block_hist(bh);
-			block_info__process_sym(he, bh, NULL, 0);
+			block_info__process_sym(he, bh, NULL, 0, 0);
 		}
 
 		data__for_each_file_new(i, d) {
@@ -714,7 +714,7 @@ static void hists__precompute(struct hists *hists)
 				pair_bh = container_of(pair, struct block_hist,
 						       he);
 				init_block_hist(pair_bh);
-				block_info__process_sym(pair, pair_bh, NULL, 0);
+				block_info__process_sym(pair, pair_bh, NULL, 0, 0);
 
 				bh = container_of(he, struct block_hist, he);
 
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index ab450644e5b3..1643113616f4 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -575,6 +575,13 @@ static int evlist__tty_browse_hists(struct evlist *evlist, struct report *rep, c
 		hists__fprintf_nr_sample_events(hists, rep, evname, stdout);
 
 		if (rep->total_cycles_mode) {
+			char *buf;
+
+			if (!annotation_br_cntr_abbr_list(&buf, pos, true)) {
+				fprintf(stdout, "%s", buf);
+				fprintf(stdout, "#\n");
+				free(buf);
+			}
 			report__browse_block_hists(&rep->block_reports[i - 1].hist,
 						   rep->min_percent, pos, NULL);
 			continue;
@@ -1119,18 +1126,23 @@ static int __cmd_report(struct report *rep)
 	report__output_resort(rep);
 
 	if (rep->total_cycles_mode) {
-		int block_hpps[6] = {
+		int nr_hpps = 4;
+		int block_hpps[PERF_HPP_REPORT__BLOCK_MAX_INDEX] = {
 			PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_PCT,
 			PERF_HPP_REPORT__BLOCK_LBR_CYCLES,
 			PERF_HPP_REPORT__BLOCK_CYCLES_PCT,
 			PERF_HPP_REPORT__BLOCK_AVG_CYCLES,
-			PERF_HPP_REPORT__BLOCK_RANGE,
-			PERF_HPP_REPORT__BLOCK_DSO,
 		};
 
+		if (session->evlist->nr_br_cntr > 0)
+			block_hpps[nr_hpps++] = PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER;
+
+		block_hpps[nr_hpps++] = PERF_HPP_REPORT__BLOCK_RANGE;
+		block_hpps[nr_hpps++] = PERF_HPP_REPORT__BLOCK_DSO;
+
 		rep->block_reports = block_info__create_report(session->evlist,
 							       rep->total_cycles,
-							       block_hpps, 6,
+							       block_hpps, nr_hpps,
 							       &rep->nr_block_reports);
 		if (!rep->block_reports)
 			return -1;
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index b7219df51236..970f7f349298 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -3684,8 +3684,10 @@ int block_hists_tui_browse(struct block_hist *bh, struct evsel *evsel,
 	struct hist_browser *browser;
 	int key = -1;
 	struct popup_action action;
+	char *br_cntr_text = NULL;
 	static const char help[] =
-	" q             Quit \n";
+	" q             Quit \n"
+	" B             Branch counter abbr list (Optional)\n";
 
 	browser = hist_browser__new(hists);
 	if (!browser)
@@ -3703,6 +3705,8 @@ int block_hists_tui_browse(struct block_hist *bh, struct evsel *evsel,
 
 	memset(&action, 0, sizeof(action));
 
+	annotation_br_cntr_abbr_list(&br_cntr_text, evsel, false);
+
 	while (1) {
 		key = hist_browser__run(browser, "? - help", true, 0);
 
@@ -3723,6 +3727,16 @@ int block_hists_tui_browse(struct block_hist *bh, struct evsel *evsel,
 			action.ms.sym = browser->selection->sym;
 			do_annotate(browser, &action);
 			continue;
+		case 'B':
+			if (br_cntr_text) {
+				ui__question_window("Branch counter abbr list",
+						    br_cntr_text, "Press any key...", 0);
+			} else {
+				ui__question_window("Branch counter abbr list",
+						    "\n The branch counter is not available.\n",
+						    "Press any key...", 0);
+			}
+			continue;
 		default:
 			break;
 		}
@@ -3730,5 +3744,6 @@ int block_hists_tui_browse(struct block_hist *bh, struct evsel *evsel,
 
 out:
 	hist_browser__delete(browser);
+	free(br_cntr_text);
 	return 0;
 }
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index c269758d47c8..5200cafe20e7 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -41,6 +41,7 @@
 #include "namespaces.h"
 #include "thread.h"
 #include "hashmap.h"
+#include "strbuf.h"
 #include <regex.h>
 #include <linux/bitops.h>
 #include <linux/kernel.h>
@@ -48,6 +49,7 @@
 #include <linux/zalloc.h>
 #include <subcmd/parse-options.h>
 #include <subcmd/run-command.h>
+#include <math.h>
 
 /* FIXME: For the HE_COLORSET */
 #include "ui/browser.h"
@@ -1719,6 +1721,149 @@ static void ipc_coverage_string(char *bf, int size, struct annotation *notes)
 		  ipc, coverage);
 }
 
+int annotation_br_cntr_abbr_list(char **str, struct evsel *evsel, bool header)
+{
+	struct evsel *pos;
+	struct strbuf sb;
+
+	if (evsel->evlist->nr_br_cntr <= 0)
+		return -ENOTSUP;
+
+	strbuf_init(&sb, /*hint=*/ 0);
+
+	if (header && strbuf_addf(&sb, "# Branch counter abbr list:\n"))
+		goto err;
+
+	evlist__for_each_entry(evsel->evlist, pos) {
+		if (!(pos->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS))
+			continue;
+		if (header && strbuf_addf(&sb, "#"))
+			goto err;
+
+		if (strbuf_addf(&sb, " %s = %s\n", pos->name, pos->abbr_name))
+			goto err;
+	}
+
+	if (header && strbuf_addf(&sb, "#"))
+		goto err;
+	if (strbuf_addf(&sb, " '-' No event occurs\n"))
+		goto err;
+
+	if (header && strbuf_addf(&sb, "#"))
+		goto err;
+	if (strbuf_addf(&sb, " '+' Event occurrences may be lost due to branch counter saturated\n"))
+		goto err;
+
+	*str = strbuf_detach(&sb, NULL);
+
+	return 0;
+err:
+	strbuf_release(&sb);
+	return -ENOMEM;
+}
+
+/* Assume the branch counter saturated at 3 */
+#define ANNOTATION_BR_CNTR_SATURATION		3
+
+int annotation_br_cntr_entry(char **str, int br_cntr_nr,
+			     u64 *br_cntr, int num_aggr,
+			     struct evsel *evsel)
+{
+	struct evsel *pos = evsel ? evlist__first(evsel->evlist) : NULL;
+	bool saturated = false;
+	int i, j, avg, used;
+	struct strbuf sb;
+
+	strbuf_init(&sb, /*hint=*/ 0);
+	for (i = 0; i < br_cntr_nr; i++) {
+		used = 0;
+		avg = ceil((double)(br_cntr[i] & ~ANNOTATION__BR_CNTR_SATURATED_FLAG) /
+			   (double)num_aggr);
+
+		/*
+		 * A histogram with the abbr name is displayed by default.
+		 * With -v, the exact number of branch counter is displayed.
+		 */
+		if (verbose) {
+			evlist__for_each_entry_from(evsel->evlist, pos) {
+				if ((pos->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS) &&
+				    (pos->br_cntr_idx == i))
+				break;
+			}
+			if (strbuf_addstr(&sb, pos->abbr_name))
+				goto err;
+
+			if (!br_cntr[i]) {
+				if (strbuf_addstr(&sb, "=-"))
+					goto err;
+			} else {
+				if (strbuf_addf(&sb, "=%d", avg))
+					goto err;
+			}
+			if (br_cntr[i] & ANNOTATION__BR_CNTR_SATURATED_FLAG) {
+				if (strbuf_addch(&sb, '+'))
+					goto err;
+			} else {
+				if (strbuf_addch(&sb, ' '))
+					goto err;
+			}
+
+			if ((i < br_cntr_nr - 1) && strbuf_addch(&sb, ','))
+				goto err;
+			continue;
+		}
+
+		if (strbuf_addch(&sb, '|'))
+			goto err;
+
+		if (!br_cntr[i]) {
+			if (strbuf_addch(&sb, '-'))
+				goto err;
+			used++;
+		} else {
+			evlist__for_each_entry_from(evsel->evlist, pos) {
+				if ((pos->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS) &&
+				    (pos->br_cntr_idx == i))
+					break;
+			}
+			if (br_cntr[i] & ANNOTATION__BR_CNTR_SATURATED_FLAG)
+				saturated = true;
+
+			for (j = 0; j < avg; j++, used++) {
+				/* Print + if the number of logged events > 3 */
+				if (j >= ANNOTATION_BR_CNTR_SATURATION) {
+					saturated = true;
+					break;
+				}
+				if (strbuf_addstr(&sb, pos->abbr_name))
+					goto err;
+			}
+
+			if (saturated) {
+				if (strbuf_addch(&sb, '+'))
+					goto err;
+				used++;
+			}
+			pos = list_next_entry(pos, core.node);
+		}
+
+		for (j = used; j < ANNOTATION_BR_CNTR_SATURATION + 1; j++) {
+			if (strbuf_addch(&sb, ' '))
+				goto err;
+		}
+	}
+
+	if (!verbose && strbuf_addch(&sb, br_cntr_nr ? '|' : ' '))
+		goto err;
+
+	*str = strbuf_detach(&sb, NULL);
+
+	return 0;
+err:
+	strbuf_release(&sb);
+	return -ENOMEM;
+}
+
 static void __annotation_line__write(struct annotation_line *al, struct annotation *notes,
 				     bool first_line, bool current_entry, bool change_color, int width,
 				     void *obj, unsigned int percent_type,
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index 6b3023f3b18f..4fdaa3d5e8d2 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -553,4 +553,7 @@ int annotate_get_basic_blocks(struct symbol *sym, s64 src, s64 dst,
 
 void debuginfo_cache__delete(void);
 
+int annotation_br_cntr_entry(char **str, int br_cntr_nr, u64 *br_cntr,
+			     int num_aggr, struct evsel *evsel);
+int annotation_br_cntr_abbr_list(char **str, struct evsel *evsel, bool header);
 #endif	/* __PERF_ANNOTATE_H */
diff --git a/tools/perf/util/block-info.c b/tools/perf/util/block-info.c
index 04068d48683f..649392bee7ed 100644
--- a/tools/perf/util/block-info.c
+++ b/tools/perf/util/block-info.c
@@ -40,16 +40,32 @@ static struct block_header_column {
 	[PERF_HPP_REPORT__BLOCK_DSO] = {
 		.name = "Shared Object",
 		.width = 20,
+	},
+	[PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER] = {
+		.name = "Branch Counter",
+		.width = 30,
 	}
 };
 
-struct block_info *block_info__new(void)
+static struct block_info *block_info__new(unsigned int br_cntr_nr)
 {
-	return zalloc(sizeof(struct block_info));
+	struct block_info *bi = zalloc(sizeof(struct block_info));
+
+	if (bi && br_cntr_nr) {
+		bi->br_cntr = calloc(br_cntr_nr, sizeof(u64));
+		if (!bi->br_cntr) {
+			free(bi);
+			return NULL;
+		}
+	}
+
+	return bi;
 }
 
 void block_info__delete(struct block_info *bi)
 {
+	if (bi)
+		free(bi->br_cntr);
 	free(bi);
 }
 
@@ -86,7 +102,8 @@ int64_t block_info__cmp(struct perf_hpp_fmt *fmt __maybe_unused,
 
 static void init_block_info(struct block_info *bi, struct symbol *sym,
 			    struct cyc_hist *ch, int offset,
-			    u64 total_cycles)
+			    u64 total_cycles, unsigned int br_cntr_nr,
+			    u64 *br_cntr, struct evsel *evsel)
 {
 	bi->sym = sym;
 	bi->start = ch->start;
@@ -99,10 +116,18 @@ static void init_block_info(struct block_info *bi, struct symbol *sym,
 
 	memcpy(bi->cycles_spark, ch->cycles_spark,
 	       NUM_SPARKS * sizeof(u64));
+
+	if (br_cntr && br_cntr_nr) {
+		bi->br_cntr_nr = br_cntr_nr;
+		memcpy(bi->br_cntr, &br_cntr[offset * br_cntr_nr],
+		       br_cntr_nr * sizeof(u64));
+	}
+	bi->evsel = evsel;
 }
 
 int block_info__process_sym(struct hist_entry *he, struct block_hist *bh,
-			    u64 *block_cycles_aggr, u64 total_cycles)
+			    u64 *block_cycles_aggr, u64 total_cycles,
+			    unsigned int br_cntr_nr)
 {
 	struct annotation *notes;
 	struct cyc_hist *ch;
@@ -125,12 +150,14 @@ int block_info__process_sym(struct hist_entry *he, struct block_hist *bh,
 			struct block_info *bi;
 			struct hist_entry *he_block;
 
-			bi = block_info__new();
+			bi = block_info__new(br_cntr_nr);
 			if (!bi)
 				return -1;
 
 			init_block_info(bi, he->ms.sym, &ch[i], i,
-					total_cycles);
+					total_cycles, br_cntr_nr,
+					notes->branch->br_cntr,
+					hists_to_evsel(he->hists));
 			cycles += bi->cycles_aggr / bi->num_aggr;
 
 			he_block = hists__add_entry_block(&bh->block_hists,
@@ -327,6 +354,24 @@ static void init_block_header(struct block_fmt *block_fmt)
 	fmt->width = block_column_width;
 }
 
+static int block_branch_counter_entry(struct perf_hpp_fmt *fmt,
+				      struct perf_hpp *hpp,
+				      struct hist_entry *he)
+{
+	struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
+	struct block_info *bi = he->block_info;
+	char *buf;
+	int ret;
+
+	if (annotation_br_cntr_entry(&buf, bi->br_cntr_nr, bi->br_cntr,
+				     bi->num_aggr, bi->evsel))
+		return 0;
+
+	ret = scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width, buf);
+	free(buf);
+	return ret;
+}
+
 static void hpp_register(struct block_fmt *block_fmt, int idx,
 			 struct perf_hpp_list *hpp_list)
 {
@@ -357,6 +402,9 @@ static void hpp_register(struct block_fmt *block_fmt, int idx,
 	case PERF_HPP_REPORT__BLOCK_DSO:
 		fmt->entry = block_dso_entry;
 		break;
+	case PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER:
+		fmt->entry = block_branch_counter_entry;
+		break;
 	default:
 		return;
 	}
@@ -390,7 +438,7 @@ static void init_block_hist(struct block_hist *bh, struct block_fmt *block_fmts,
 static int process_block_report(struct hists *hists,
 				struct block_report *block_report,
 				u64 total_cycles, int *block_hpps,
-				int nr_hpps)
+				int nr_hpps, unsigned int br_cntr_nr)
 {
 	struct rb_node *next = rb_first_cached(&hists->entries);
 	struct block_hist *bh = &block_report->hist;
@@ -405,7 +453,7 @@ static int process_block_report(struct hists *hists,
 	while (next) {
 		he = rb_entry(next, struct hist_entry, rb_node);
 		block_info__process_sym(he, bh, &block_report->cycles,
-					total_cycles);
+					total_cycles, br_cntr_nr);
 		next = rb_next(&he->rb_node);
 	}
 
@@ -435,7 +483,7 @@ struct block_report *block_info__create_report(struct evlist *evlist,
 		struct hists *hists = evsel__hists(pos);
 
 		process_block_report(hists, &block_reports[i], total_cycles,
-				     block_hpps, nr_hpps);
+				     block_hpps, nr_hpps, evlist->nr_br_cntr);
 		i++;
 	}
 
diff --git a/tools/perf/util/block-info.h b/tools/perf/util/block-info.h
index 0b9e1aad4c55..b9329dc3ab59 100644
--- a/tools/perf/util/block-info.h
+++ b/tools/perf/util/block-info.h
@@ -18,6 +18,9 @@ struct block_info {
 	u64			total_cycles;
 	int			num;
 	int			num_aggr;
+	int			br_cntr_nr;
+	u64			*br_cntr;
+	struct evsel		*evsel;
 };
 
 struct block_fmt {
@@ -36,6 +39,7 @@ enum {
 	PERF_HPP_REPORT__BLOCK_AVG_CYCLES,
 	PERF_HPP_REPORT__BLOCK_RANGE,
 	PERF_HPP_REPORT__BLOCK_DSO,
+	PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER,
 	PERF_HPP_REPORT__BLOCK_MAX_INDEX
 };
 
@@ -46,7 +50,6 @@ struct block_report {
 	int			nr_fmts;
 };
 
-struct block_info *block_info__new(void);
 void block_info__delete(struct block_info *bi);
 
 int64_t __block_info__cmp(struct hist_entry *left, struct hist_entry *right);
@@ -55,7 +58,8 @@ int64_t block_info__cmp(struct perf_hpp_fmt *fmt __maybe_unused,
 			struct hist_entry *left, struct hist_entry *right);
 
 int block_info__process_sym(struct hist_entry *he, struct block_hist *bh,
-			    u64 *block_cycles_aggr, u64 total_cycles);
+			    u64 *block_cycles_aggr, u64 total_cycles,
+			    unsigned int br_cntr_nr);
 
 struct block_report *block_info__create_report(struct evlist *evlist,
 					       u64 total_cycles,
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH V3 7/9] perf annotate: Display the branch counter histogram
  2024-08-13 16:01 [PATCH V3 0/9] Support branch counters in block annotation kan.liang
                   ` (5 preceding siblings ...)
  2024-08-13 16:02 ` [PATCH V3 6/9] perf report: Display the branch counter histogram kan.liang
@ 2024-08-13 16:02 ` kan.liang
  2024-08-13 16:02 ` [PATCH V3 8/9] perf script: Add branch counters kan.liang
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 13+ messages in thread
From: kan.liang @ 2024-08-13 16:02 UTC (permalink / raw)
  To: acme, namhyung, irogers, peterz, mingo, linux-kernel,
	linux-perf-users
  Cc: adrian.hunter, ak, eranian, Kan Liang, Tinghao Zhang

From: Kan Liang <kan.liang@linux.intel.com>

Display the branch counter histogram in the annotation view.

Press 'B' to display the branch counter's abbreviation list as well.

Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }',
4000 Hz, Event count (approx.):
f3  /home/sdp/test/tchain_edit [Percent: local period]
Percent       │ IPC Cycle       Branch Counter (Average IPC: 1.39, IPC Coverage: 29.4%)
              │                                     0000000000401755 <f3>:
  0.00   0.00 │                                       endbr64
              │                                       push    %rbp
              │                                       mov     %rsp,%rbp
              │                                       movl    $0x0,-0x4(%rbp)
  0.00   0.00 │1.33     3          |A   |-   |      ↓ jmp     25
 11.03  11.03 │                                 11:   mov     -0x4(%rbp),%eax
              │                                       and     $0x1,%eax
              │                                       test    %eax,%eax
 17.13  17.13 │2.41     1          |A   |-   |      ↓ je      21
              │                                       addl    $0x1,-0x4(%rbp)
 21.84  21.84 │2.22     2          |AA  |-   |      ↓ jmp     25
 17.13  17.13 │                                 21:   addl    $0x1,-0x4(%rbp)
 21.84  21.84 │                                 25:   cmpl    $0x270f,-0x4(%rbp)
 11.03  11.03 │0.61     3          |A   |-   |      ↑ jle     11
              │                                       nop
              │                                       pop     %rbp
  0.00   0.00 │0.24    20          |AA  |B   |      ← ret

Originally-by: Tinghao Zhang <tinghao.zhang@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/builtin-annotate.c     | 10 +++++---
 tools/perf/ui/browsers/annotate.c | 18 ++++++++++++--
 tools/perf/ui/browsers/hists.c    |  3 ++-
 tools/perf/util/annotate.c        | 40 ++++++++++++++++++++++++++++---
 tools/perf/util/annotate.h        | 11 +++++++++
 tools/perf/util/disasm.c          |  1 +
 6 files changed, 74 insertions(+), 9 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 9b4a8a379b5b..3dc6197ef3fa 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -927,11 +927,15 @@ int cmd_annotate(int argc, const char **argv)
 		sort_order = "dso,symbol";
 
 	/*
-	 * Set SORT_MODE__BRANCH so that annotate display IPC/Cycle
-	 * if branch info is in perf data in TUI mode.
+	 * Set SORT_MODE__BRANCH so that annotate displays IPC/Cycle and
+	 * branch counters, if the corresponding branch info is available
+	 * in the perf data in the TUI mode.
 	 */
-	if ((use_browser == 1 || annotate.use_stdio2) && annotate.has_br_stack)
+	if ((use_browser == 1 || annotate.use_stdio2) && annotate.has_br_stack) {
 		sort__mode = SORT_MODE__BRANCH;
+		if (annotate.session->evlist->nr_br_cntr > 0)
+			annotate_opts.show_br_cntr = true;
+	}
 
 	if (setup_sorting(NULL) < 0)
 		usage_with_options(annotate_usage, options);
diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c
index fe991a81256b..d7e727345dab 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -156,6 +156,7 @@ static void annotate_browser__draw_current_jump(struct ui_browser *browser)
 	struct symbol *sym = ms->sym;
 	struct annotation *notes = symbol__annotation(sym);
 	u8 pcnt_width = annotation__pcnt_width(notes);
+	u8 cntr_width = annotation__br_cntr_width();
 	int width;
 	int diff = 0;
 
@@ -205,13 +206,13 @@ static void annotate_browser__draw_current_jump(struct ui_browser *browser)
 
 	ui_browser__set_color(browser, HE_COLORSET_JUMP_ARROWS);
 	__ui_browser__line_arrow(browser,
-				 pcnt_width + 2 + notes->src->widths.addr + width,
+				 pcnt_width + 2 + notes->src->widths.addr + width + cntr_width,
 				 from, to);
 
 	diff = is_fused(ab, cursor);
 	if (diff > 0) {
 		ui_browser__mark_fused(browser,
-				       pcnt_width + 3 + notes->src->widths.addr + width,
+				       pcnt_width + 3 + notes->src->widths.addr + width + cntr_width,
 				       from - diff, diff, to > from);
 	}
 }
@@ -714,6 +715,7 @@ static int annotate_browser__run(struct annotate_browser *browser,
 	struct annotation *notes = symbol__annotation(ms->sym);
 	const char *help = "Press 'h' for help on key bindings";
 	int delay_secs = hbt ? hbt->refresh : 0;
+	char *br_cntr_text = NULL;
 	char title[256];
 	int key;
 
@@ -730,6 +732,8 @@ static int annotate_browser__run(struct annotate_browser *browser,
 
 	nd = browser->curr_hot;
 
+	annotation_br_cntr_abbr_list(&br_cntr_text, evsel, false);
+
 	while (1) {
 		key = ui_browser__run(&browser->b, delay_secs);
 
@@ -796,6 +800,7 @@ static int annotate_browser__run(struct annotate_browser *browser,
 		"r             Run available scripts\n"
 		"p             Toggle percent type [local/global]\n"
 		"b             Toggle percent base [period/hits]\n"
+		"B             Branch counter abbr list (Optional)\n"
 		"?             Search string backwards\n"
 		"f             Toggle showing offsets to full address\n");
 			continue;
@@ -904,6 +909,14 @@ static int annotate_browser__run(struct annotate_browser *browser,
 			hists__scnprintf_title(hists, title, sizeof(title));
 			annotate_browser__show(&browser->b, title, help);
 			continue;
+		case 'B':
+			if (br_cntr_text)
+				ui_browser__help_window(&browser->b, br_cntr_text);
+			else {
+				ui_browser__help_window(&browser->b,
+							"\n The branch counter is not available.\n");
+			}
+			continue;
 		case 'f':
 			annotation__toggle_full_addr(notes, ms);
 			continue;
@@ -923,6 +936,7 @@ static int annotate_browser__run(struct annotate_browser *browser,
 	}
 out:
 	ui_browser__hide(&browser->b);
+	free(br_cntr_text);
 	return key;
 }
 
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 970f7f349298..49ba82bf3391 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -3705,7 +3705,8 @@ int block_hists_tui_browse(struct block_hist *bh, struct evsel *evsel,
 
 	memset(&action, 0, sizeof(action));
 
-	annotation_br_cntr_abbr_list(&br_cntr_text, evsel, false);
+	if (!annotation_br_cntr_abbr_list(&br_cntr_text, evsel, false))
+		annotate_opts.show_br_cntr = true;
 
 	while (1) {
 		key = hist_browser__run(browser, "? - help", true, 0);
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 5200cafe20e7..4990c70b1794 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -501,8 +501,10 @@ static void annotation__count_and_fill(struct annotation *notes, u64 start, u64
 	}
 }
 
-static int annotation__compute_ipc(struct annotation *notes, size_t size)
+static int annotation__compute_ipc(struct annotation *notes, size_t size,
+				   struct evsel *evsel)
 {
+	unsigned int br_cntr_nr = evsel->evlist->nr_br_cntr;
 	int err = 0;
 	s64 offset;
 
@@ -537,6 +539,20 @@ static int annotation__compute_ipc(struct annotation *notes, size_t size)
 				al->cycles->max = ch->cycles_max;
 				al->cycles->min = ch->cycles_min;
 			}
+			if (al && notes->branch->br_cntr) {
+				if (!al->br_cntr) {
+					al->br_cntr = calloc(br_cntr_nr, sizeof(u64));
+					if (!al->br_cntr) {
+						err = ENOMEM;
+						break;
+					}
+				}
+				al->num_aggr = ch->num_aggr;
+				al->br_cntr_nr = br_cntr_nr;
+				al->evsel = evsel;
+				memcpy(al->br_cntr, &notes->branch->br_cntr[offset * br_cntr_nr],
+				       br_cntr_nr * sizeof(u64));
+			}
 		}
 	}
 
@@ -548,8 +564,10 @@ static int annotation__compute_ipc(struct annotation *notes, size_t size)
 				struct annotation_line *al;
 
 				al = annotated_source__get_line(notes->src, offset);
-				if (al)
+				if (al) {
 					zfree(&al->cycles);
+					zfree(&al->br_cntr);
+				}
 			}
 		}
 	}
@@ -1960,6 +1978,22 @@ static void __annotation_line__write(struct annotation_line *al, struct annotati
 					    "Cycle(min/max)");
 		}
 
+		if (annotate_opts.show_br_cntr) {
+			if (show_title) {
+				obj__printf(obj, "%*s ",
+					    ANNOTATION__BR_CNTR_WIDTH,
+					    "Branch Counter");
+			} else {
+				char *buf;
+
+				if (!annotation_br_cntr_entry(&buf, al->br_cntr_nr, al->br_cntr,
+							      al->num_aggr, al->evsel)) {
+					obj__printf(obj, "%*s ", ANNOTATION__BR_CNTR_WIDTH, buf);
+					free(buf);
+				}
+			}
+		}
+
 		if (show_title && !*al->line) {
 			ipc_coverage_string(bf, sizeof(bf), notes);
 			obj__printf(obj, "%*s", ANNOTATION__AVG_IPC_WIDTH, bf);
@@ -2056,7 +2090,7 @@ int symbol__annotate2(struct map_symbol *ms, struct evsel *evsel,
 	annotation__set_index(notes);
 	annotation__mark_jump_targets(notes, sym);
 
-	err = annotation__compute_ipc(notes, size);
+	err = annotation__compute_ipc(notes, size, evsel);
 	if (err)
 		return err;
 
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index 4fdaa3d5e8d2..8b9e05a1932f 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -31,6 +31,7 @@ struct annotated_data_type;
 #define ANNOTATION__CYCLES_WIDTH 6
 #define ANNOTATION__MINMAX_CYCLES_WIDTH 19
 #define ANNOTATION__AVG_IPC_WIDTH 36
+#define ANNOTATION__BR_CNTR_WIDTH 30
 #define ANNOTATION_DUMMY_LEN	256
 
 struct annotation_options {
@@ -44,6 +45,7 @@ struct annotation_options {
 	     show_nr_jumps,
 	     show_minmax_cycle,
 	     show_asm_raw,
+	     show_br_cntr,
 	     annotate_src,
 	     full_addr;
 	u8   offset_level;
@@ -104,6 +106,10 @@ struct annotation_line {
 	char			*fileloc;
 	char			*path;
 	struct cycles_info	*cycles;
+	int			 num_aggr;
+	int			 br_cntr_nr;
+	u64			*br_cntr;
+	struct evsel		*evsel;
 	int			 jump_sources;
 	u32			 idx;
 	int			 idx_asm;
@@ -353,6 +359,11 @@ static inline bool annotation_line__filter(struct annotation_line *al)
 	return annotate_opts.hide_src_code && al->offset == -1;
 }
 
+static inline u8 annotation__br_cntr_width(void)
+{
+	return annotate_opts.show_br_cntr ? ANNOTATION__BR_CNTR_WIDTH : 0;
+}
+
 void annotation__update_column_widths(struct annotation *notes);
 void annotation__toggle_full_addr(struct annotation *notes, struct map_symbol *ms);
 
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index 22289003e16d..68aae87101bd 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -1014,6 +1014,7 @@ static void annotation_line__exit(struct annotation_line *al)
 	zfree_srcline(&al->path);
 	zfree(&al->line);
 	zfree(&al->cycles);
+	zfree(&al->br_cntr);
 }
 
 static size_t disasm_line_size(int nr)
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH V3 8/9] perf script: Add branch counters
  2024-08-13 16:01 [PATCH V3 0/9] Support branch counters in block annotation kan.liang
                   ` (6 preceding siblings ...)
  2024-08-13 16:02 ` [PATCH V3 7/9] perf annotate: " kan.liang
@ 2024-08-13 16:02 ` kan.liang
  2024-08-13 16:02 ` [PATCH V3 9/9] perf test: Add new test cases for the branch counter feature kan.liang
  2024-08-13 19:00 ` [PATCH V3 0/9] Support branch counters in block annotation Arnaldo Carvalho de Melo
  9 siblings, 0 replies; 13+ messages in thread
From: kan.liang @ 2024-08-13 16:02 UTC (permalink / raw)
  To: acme, namhyung, irogers, peterz, mingo, linux-kernel,
	linux-perf-users
  Cc: adrian.hunter, ak, eranian, Kan Liang, Tinghao Zhang

From: Kan Liang <kan.liang@linux.intel.com>

It's useful to print the branch counter information for each jump in
the brstackinsn when it's available.

Add a new field brcntr to display the branch counter information.

By default, the abbreviation will be used to indicate the branch
counter. In the verbose mode, the real event name is shown.

$perf script -F +brstackinsn,+brcntr

 # Branch counter abbr list:
 # branch-instructions:ppp = A
 # branch-misses = B
 # '-' No event occurs
 # '+' Event occurrences may be lost due to branch counter saturated
     tchain_edit  332203 3366329.405674:      53030 branch-instructions:ppp:            401781 f3+0x2c (home/sdp/test/tchain_edit)
        f3+31:
        0000000000401774        insn: eb 04                     br_cntr: AA     # PRED 5 cycles [5]
        000000000040177a        insn: 81 7d fc 0f 27 00 00
        0000000000401781        insn: 7e e3                     br_cntr: A      # PRED 1 cycles [6] 2.00 IPC
        0000000000401766        insn: 8b 45 fc
        0000000000401769        insn: 83 e0 01
        000000000040176c        insn: 85 c0
        000000000040176e        insn: 74 06                     br_cntr: A      # PRED 1 cycles [7] 4.00 IPC
        0000000000401776        insn: 83 45 fc 01
        000000000040177a        insn: 81 7d fc 0f 27 00 00
        0000000000401781        insn: 7e e3                     br_cntr: A      # PRED 7 cycles [14] 0.43 IPC

$perf script -F +brstackinsn,+brcntr -v

     tchain_edit  332203 3366329.405674:      53030 branch-instructions:ppp:            401781 f3+0x2c (/home/sdp/os.linux.perf.test-suite/kernels/lbr_kernel/tchain_edit)
        f3+31:
        0000000000401774        insn: eb 04                     br_cntr: branch-instructions:ppp 2 branch-misses 0      # PRED 5 cycles [5]
        000000000040177a        insn: 81 7d fc 0f 27 00 00
        0000000000401781        insn: 7e e3                     br_cntr: branch-instructions:ppp 1 branch-misses 0      # PRED 1 cycles [6] 2.00 IPC
        0000000000401766        insn: 8b 45 fc
        0000000000401769        insn: 83 e0 01
        000000000040176c        insn: 85 c0
        000000000040176e        insn: 74 06                     br_cntr: branch-instructions:ppp 1 branch-misses 0      # PRED 1 cycles [7] 4.00 IPC
        0000000000401776        insn: 83 45 fc 01
        000000000040177a        insn: 81 7d fc 0f 27 00 00
        0000000000401781        insn: 7e e3                     br_cntr: branch-instructions:ppp 1 branch-misses 0      # PRED 7 cycles [14] 0.43 IPC

Originally-by: Tinghao Zhang <tinghao.zhang@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/Documentation/perf-script.txt |  2 +-
 tools/perf/builtin-script.c              | 69 +++++++++++++++++++++---
 2 files changed, 63 insertions(+), 8 deletions(-)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 5abb960c4960..b72866ef270b 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -134,7 +134,7 @@ OPTIONS
         srcline, period, iregs, uregs, brstack, brstacksym, flags, bpf-output,
         brstackinsn, brstackinsnlen, brstackdisasm, brstackoff, callindent, insn, disasm,
         insnlen, synth, phys_addr, metric, misc, srccode, ipc, data_page_size,
-        code_page_size, ins_lat, machine_pid, vcpu, cgroup, retire_lat,
+        code_page_size, ins_lat, machine_pid, vcpu, cgroup, retire_lat, brcntr,
 
         Field list can be prepended with the type, trace, sw or hw,
         to indicate to which event type the field list applies.
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 54598d1e815a..206b08426555 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -62,6 +62,7 @@
 #include "util/record.h"
 #include "util/util.h"
 #include "util/cgroup.h"
+#include "util/annotate.h"
 #include "perf.h"
 
 #include <linux/ctype.h>
@@ -138,6 +139,7 @@ enum perf_output_field {
 	PERF_OUTPUT_DSOFF           = 1ULL << 41,
 	PERF_OUTPUT_DISASM          = 1ULL << 42,
 	PERF_OUTPUT_BRSTACKDISASM   = 1ULL << 43,
+	PERF_OUTPUT_BRCNTR          = 1ULL << 44,
 };
 
 struct perf_script {
@@ -213,6 +215,7 @@ struct output_option {
 	{.str = "cgroup", .field = PERF_OUTPUT_CGROUP},
 	{.str = "retire_lat", .field = PERF_OUTPUT_RETIRE_LAT},
 	{.str = "brstackdisasm", .field = PERF_OUTPUT_BRSTACKDISASM},
+	{.str = "brcntr", .field = PERF_OUTPUT_BRCNTR},
 };
 
 enum {
@@ -520,6 +523,12 @@ static int evsel__check_attr(struct evsel *evsel, struct perf_session *session)
 		       "Hint: run 'perf record -b ...'\n");
 		return -EINVAL;
 	}
+	if (PRINT_FIELD(BRCNTR) &&
+	    !(evlist__combined_branch_type(session->evlist) & PERF_SAMPLE_BRANCH_COUNTERS)) {
+		pr_err("Display of branch counter requested but it's not enabled\n"
+		       "Hint: run 'perf record -j any,counter ...'\n");
+		return -EINVAL;
+	}
 	if ((PRINT_FIELD(PID) || PRINT_FIELD(TID)) &&
 	    evsel__check_stype(evsel, PERF_SAMPLE_TID, "TID", PERF_OUTPUT_TID|PERF_OUTPUT_PID))
 		return -EINVAL;
@@ -789,6 +798,19 @@ static int perf_sample__fprintf_start(struct perf_script *script,
 	int printed = 0;
 	char tstr[128];
 
+	/*
+	 * Print the branch counter's abbreviation list,
+	 * if the branch counter is available.
+	 */
+	if (PRINT_FIELD(BRCNTR) && !verbose) {
+		char *buf;
+
+		if (!annotation_br_cntr_abbr_list(&buf, evsel, true)) {
+			printed += fprintf(stdout, "%s", buf);
+			free(buf);
+		}
+	}
+
 	if (PRINT_FIELD(MACHINE_PID) && sample->machine_pid)
 		printed += fprintf(fp, "VM:%5d ", sample->machine_pid);
 
@@ -1195,7 +1217,9 @@ static int ip__fprintf_jump(uint64_t ip, struct branch_entry *en,
 			    struct perf_insn *x, u8 *inbuf, int len,
 			    int insn, FILE *fp, int *total_cycles,
 			    struct perf_event_attr *attr,
-			    struct thread *thread)
+			    struct thread *thread,
+			    struct evsel *evsel,
+			    u64 br_cntr)
 {
 	int ilen = 0;
 	int printed = fprintf(fp, "\t%016" PRIx64 "\t", ip);
@@ -1216,6 +1240,28 @@ static int ip__fprintf_jump(uint64_t ip, struct branch_entry *en,
 		addr_location__exit(&al);
 	}
 
+	if (PRINT_FIELD(BRCNTR)) {
+		unsigned int width = evsel__env(evsel)->br_cntr_width;
+		unsigned int i = 0, j, num, mask = (1L << width) - 1;
+		struct evsel *pos = evsel__leader(evsel);
+
+		printed += fprintf(fp, "br_cntr: ");
+		evlist__for_each_entry_from(evsel->evlist, pos) {
+			if (!(pos->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS))
+				continue;
+			if (evsel__leader(pos) != evsel__leader(evsel))
+				break;
+
+			num = (br_cntr >> (i++ * width)) & mask;
+			if (!verbose) {
+				for (j = 0; j < num; j++)
+					printed += fprintf(fp, "%s", pos->abbr_name);
+			} else
+				printed += fprintf(fp, "%s %d ", pos->name, num);
+		}
+		printed += fprintf(fp, "\t");
+	}
+
 	printed += fprintf(fp, "#%s%s%s%s",
 			      en->flags.predicted ? " PRED" : "",
 			      en->flags.mispred ? " MISPRED" : "",
@@ -1272,6 +1318,7 @@ static int ip__fprintf_sym(uint64_t addr, struct thread *thread,
 }
 
 static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
+					    struct evsel *evsel,
 					    struct thread *thread,
 					    struct perf_event_attr *attr,
 					    struct machine *machine, FILE *fp)
@@ -1285,6 +1332,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
 	unsigned off;
 	struct symbol *lastsym = NULL;
 	int total_cycles = 0;
+	u64 br_cntr = 0;
 
 	if (!(br && br->nr))
 		return 0;
@@ -1296,6 +1344,9 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
 	x.machine = machine;
 	x.cpu = sample->cpu;
 
+	if (PRINT_FIELD(BRCNTR) && sample->branch_stack_cntr)
+		br_cntr = sample->branch_stack_cntr[nr - 1];
+
 	printed += fprintf(fp, "%c", '\n');
 
 	/* Handle first from jump, of which we don't know the entry. */
@@ -1307,7 +1358,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
 					   x.cpumode, x.cpu, &lastsym, attr, fp);
 		printed += ip__fprintf_jump(entries[nr - 1].from, &entries[nr - 1],
 					    &x, buffer, len, 0, fp, &total_cycles,
-					    attr, thread);
+					    attr, thread, evsel, br_cntr);
 		if (PRINT_FIELD(SRCCODE))
 			printed += print_srccode(thread, x.cpumode, entries[nr - 1].from);
 	}
@@ -1337,8 +1388,10 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
 
 			printed += ip__fprintf_sym(ip, thread, x.cpumode, x.cpu, &lastsym, attr, fp);
 			if (ip == end) {
+				if (PRINT_FIELD(BRCNTR) && sample->branch_stack_cntr)
+					br_cntr = sample->branch_stack_cntr[i];
 				printed += ip__fprintf_jump(ip, &entries[i], &x, buffer + off, len - off, ++insn, fp,
-							    &total_cycles, attr, thread);
+							    &total_cycles, attr, thread, evsel, br_cntr);
 				if (PRINT_FIELD(SRCCODE))
 					printed += print_srccode(thread, x.cpumode, ip);
 				break;
@@ -1547,6 +1600,7 @@ void script_fetch_insn(struct perf_sample *sample, struct thread *thread,
 }
 
 static int perf_sample__fprintf_insn(struct perf_sample *sample,
+				     struct evsel *evsel,
 				     struct perf_event_attr *attr,
 				     struct thread *thread,
 				     struct machine *machine, FILE *fp,
@@ -1567,7 +1621,7 @@ static int perf_sample__fprintf_insn(struct perf_sample *sample,
 		printed += sample__fprintf_insn_asm(sample, thread, machine, fp, al);
 	}
 	if (PRINT_FIELD(BRSTACKINSN) || PRINT_FIELD(BRSTACKINSNLEN) || PRINT_FIELD(BRSTACKDISASM))
-		printed += perf_sample__fprintf_brstackinsn(sample, thread, attr, machine, fp);
+		printed += perf_sample__fprintf_brstackinsn(sample, evsel, thread, attr, machine, fp);
 
 	return printed;
 }
@@ -1639,7 +1693,7 @@ static int perf_sample__fprintf_bts(struct perf_sample *sample,
 	if (print_srcline_last)
 		printed += map__fprintf_srcline(al->map, al->addr, "\n  ", fp);
 
-	printed += perf_sample__fprintf_insn(sample, attr, thread, machine, fp, al);
+	printed += perf_sample__fprintf_insn(sample, evsel, attr, thread, machine, fp, al);
 	printed += fprintf(fp, "\n");
 	if (PRINT_FIELD(SRCCODE)) {
 		int ret = map__fprintf_srccode(al->map, al->addr, stdout,
@@ -2297,7 +2351,7 @@ static void process_event(struct perf_script *script,
 
 	if (evsel__is_bpf_output(evsel) && PRINT_FIELD(BPF_OUTPUT))
 		perf_sample__fprintf_bpf_output(sample, fp);
-	perf_sample__fprintf_insn(sample, attr, thread, machine, fp, al);
+	perf_sample__fprintf_insn(sample, evsel, attr, thread, machine, fp, al);
 
 	if (PRINT_FIELD(PHYS_ADDR))
 		fprintf(fp, "%16" PRIx64, sample->phys_addr);
@@ -3947,7 +4001,8 @@ int cmd_script(int argc, const char **argv)
 		     "brstacksym,flags,data_src,weight,bpf-output,brstackinsn,"
 		     "brstackinsnlen,brstackdisasm,brstackoff,callindent,insn,disasm,insnlen,synth,"
 		     "phys_addr,metric,misc,srccode,ipc,tod,data_page_size,"
-		     "code_page_size,ins_lat,machine_pid,vcpu,cgroup,retire_lat",
+		     "code_page_size,ins_lat,machine_pid,vcpu,cgroup,retire_lat,"
+		     "brcntr",
 		     parse_output_fields),
 	OPT_BOOLEAN('a', "all-cpus", &system_wide,
 		    "system-wide collection from all CPUs"),
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH V3 9/9] perf test: Add new test cases for the branch counter feature
  2024-08-13 16:01 [PATCH V3 0/9] Support branch counters in block annotation kan.liang
                   ` (7 preceding siblings ...)
  2024-08-13 16:02 ` [PATCH V3 8/9] perf script: Add branch counters kan.liang
@ 2024-08-13 16:02 ` kan.liang
  2024-08-13 19:00 ` [PATCH V3 0/9] Support branch counters in block annotation Arnaldo Carvalho de Melo
  9 siblings, 0 replies; 13+ messages in thread
From: kan.liang @ 2024-08-13 16:02 UTC (permalink / raw)
  To: acme, namhyung, irogers, peterz, mingo, linux-kernel,
	linux-perf-users
  Cc: adrian.hunter, ak, eranian, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Enhance the test case for the branch counter feature.

Now, the test verifies
- The new filter can be successfully applied on the supported platforms.
- The counter value can be outputted via the perf report -D
- The counter value and the abbr name can be outputted via the
  perf script (New)

Acked-by: Namhyung Kim <namhyung@kernel.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 tools/perf/tests/shell/record.sh | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/tools/perf/tests/shell/record.sh b/tools/perf/tests/shell/record.sh
index 3d1a7759a7b2..7964ebd9007d 100755
--- a/tools/perf/tests/shell/record.sh
+++ b/tools/perf/tests/shell/record.sh
@@ -21,6 +21,7 @@ testprog="perf test -w thloop"
 cpu_pmu_dir="/sys/bus/event_source/devices/cpu*"
 br_cntr_file="/caps/branch_counter_nr"
 br_cntr_output="branch stack counters"
+br_cntr_script_output="br_cntr: A"
 
 cleanup() {
   rm -rf "${perfdata}"
@@ -165,7 +166,7 @@ test_workload() {
 }
 
 test_branch_counter() {
-  echo "Basic branch counter test"
+  echo "Branch counter test"
   # Check if the branch counter feature is supported
   for dir in $cpu_pmu_dir
   do
@@ -175,19 +176,25 @@ test_branch_counter() {
       return
     fi
   done
-  if ! perf record -o "${perfdata}" -j any,counter ${testprog} 2> /dev/null
+  if ! perf record -o "${perfdata}" -e "{branches:p,instructions}" -j any,counter ${testprog} 2> /dev/null
   then
-    echo "Basic branch counter test [Failed record]"
+    echo "Branch counter record test [Failed record]"
     err=1
     return
   fi
   if ! perf report -i "${perfdata}" -D -q | grep -q "$br_cntr_output"
   then
-    echo "Basic branch record test [Failed missing output]"
+    echo "Branch counter report test [Failed missing output]"
     err=1
     return
   fi
-  echo "Basic branch counter test [Success]"
+  if ! perf script -i "${perfdata}" -F +brstackinsn,+brcntr | grep -q "$br_cntr_script_output"
+  then
+    echo " Branch counter script test [Failed missing output]"
+    err=1
+    return
+  fi
+  echo "Branch counter test [Success]"
 }
 
 test_per_thread
-- 
2.38.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH V3 1/9] perf report: Fix --total-cycles --stdio output error
  2024-08-13 16:02 ` [PATCH V3 1/9] perf report: Fix --total-cycles --stdio output error kan.liang
@ 2024-08-13 18:44   ` Arnaldo Carvalho de Melo
  2024-08-13 20:06     ` Liang, Kan
  0 siblings, 1 reply; 13+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-13 18:44 UTC (permalink / raw)
  To: kan.liang
  Cc: namhyung, irogers, peterz, mingo, linux-kernel, linux-perf-users,
	adrian.hunter, ak, eranian

On Tue, Aug 13, 2024 at 09:02:00AM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> The --total-cycles may output wrong information with the --stdio.

Hey, I tried --total-cycles with --group but that didn't work, do you
think that would make sense?

Anyway, all applied, now testing and reviewing the changes,

thanks!

- Arnaldo
 
> For example,
>   perf record -e "{cycles,instructions}",cache-misses -b sleep 1
>   perf report --total-cycles --stdio
> 
> The total cycles output of {cycles,instructions} and cache-misses are
> almost the same.
> 
>  # Samples: 938  of events 'anon group { cycles, instructions }'
>  # Event count (approx.): 938
>  #
>  # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles
>  # ...............  ..............  ...........  ..........
>   ..................................................>
>  #
>            11.19%            2.6K        0.10%          21
>                           [perf_iterate_ctx+48 -> >
>             5.79%            1.4K        0.45%          97
>             [__intel_pmu_enable_all.constprop.0+80 -> __intel_>
>             5.11%            1.2K        0.33%          71
>                              [native_write_msr+0 ->>
> 
>  # Samples: 293  of event 'cache-misses'
>  # Event count (approx.): 293
>  #
>  # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles
>                                                   [>
>  # ...............  ..............  ...........  ..........
>    ..................................................>
>  #
>            11.19%            2.6K        0.13%          21
>                           [perf_iterate_ctx+48 -> >
>             5.79%            1.4K        0.59%          97
> [__intel_pmu_enable_all.constprop.0+80 -> __intel_>
>             5.11%            1.2K        0.43%          71
>                              [native_write_msr+0 ->>
> 
> With the symbol_conf.event_group, the perf report should only report the
> block information of the leader event in a group.
> However, the current implementation retrieves the next event's block
> information, rather than the next group leader's block information.
> 
> Make sure the index is updated even if the event is skipped.
> 
> With the patch,
> 
>  # Samples: 293  of event 'cache-misses'
>  # Event count (approx.): 293
>  #
>  # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles
>                                                   [>
>  # ...............  ..............  ...........  ..........
>    ..................................................>
>  #
>            37.98%            9.0K        4.05%         299
>    [perf_event_addr_filters_exec+0 -> perf_event_a>
>            11.19%            2.6K        0.28%          21
>                           [perf_iterate_ctx+48 -> >
>             5.79%            1.4K        1.32%          97
> [__intel_pmu_enable_all.constprop.0+80 -> __intel_>
> 
> Fixes: 6f7164fa231a ("perf report: Sort by sampled cycles percent per block for stdio")
> Acked-by: Namhyung Kim <namhyung@kernel.org>
> Reviewed-by: Andi Kleen <ak@linux.intel.com>
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> ---
>  tools/perf/builtin-report.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index dfb47fa85e5c..04b9a5c1bc7e 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -565,6 +565,7 @@ static int evlist__tty_browse_hists(struct evlist *evlist, struct report *rep, c
>  		struct hists *hists = evsel__hists(pos);
>  		const char *evname = evsel__name(pos);
>  
> +		i++;
>  		if (symbol_conf.event_group && !evsel__is_group_leader(pos))
>  			continue;
>  
> @@ -574,7 +575,7 @@ static int evlist__tty_browse_hists(struct evlist *evlist, struct report *rep, c
>  		hists__fprintf_nr_sample_events(hists, rep, evname, stdout);
>  
>  		if (rep->total_cycles_mode) {
> -			report__browse_block_hists(&rep->block_reports[i++].hist,
> +			report__browse_block_hists(&rep->block_reports[i - 1].hist,
>  						   rep->min_percent, pos, NULL);
>  			continue;
>  		}
> -- 
> 2.38.1

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V3 0/9] Support branch counters in block annotation
  2024-08-13 16:01 [PATCH V3 0/9] Support branch counters in block annotation kan.liang
                   ` (8 preceding siblings ...)
  2024-08-13 16:02 ` [PATCH V3 9/9] perf test: Add new test cases for the branch counter feature kan.liang
@ 2024-08-13 19:00 ` Arnaldo Carvalho de Melo
  9 siblings, 0 replies; 13+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-13 19:00 UTC (permalink / raw)
  To: kan.liang
  Cc: namhyung, irogers, peterz, mingo, linux-kernel, linux-perf-users,
	adrian.hunter, ak, eranian

On Tue, Aug 13, 2024 at 09:01:59AM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> Changes since V2:
> - Rebase on top of the tmp.perf-tools-next commit 9da782071202
>   ("perf test: Add test for Intel TPEBS")
> - Add Acked-by from Namhyung and Reviewed-by from Andi

Thanks, applied to tmp.perf-tools-next, will move to perf-tools-next
later/tomorrow morning after the full set of container build tests pass.

- Arnaldo

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH V3 1/9] perf report: Fix --total-cycles --stdio output error
  2024-08-13 18:44   ` Arnaldo Carvalho de Melo
@ 2024-08-13 20:06     ` Liang, Kan
  0 siblings, 0 replies; 13+ messages in thread
From: Liang, Kan @ 2024-08-13 20:06 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: namhyung, irogers, peterz, mingo, linux-kernel, linux-perf-users,
	adrian.hunter, ak, eranian



On 2024-08-13 2:44 p.m., Arnaldo Carvalho de Melo wrote:
> On Tue, Aug 13, 2024 at 09:02:00AM -0700, kan.liang@linux.intel.com wrote:
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> The --total-cycles may output wrong information with the --stdio.
> 
> Hey, I tried --total-cycles with --group but that didn't work, do you
> think that would make sense?

The current implementation doesn't handle the symbol_conf.event_group
for the tui mode.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/builtin-report.c#n543

I will send a separate patch to fix it and make it consistent.

Thanks,
Kan

> 
> Anyway, all applied, now testing and reviewing the changes,
> 
> thanks!
> 
> - Arnaldo
>  
>> For example,
>>   perf record -e "{cycles,instructions}",cache-misses -b sleep 1
>>   perf report --total-cycles --stdio
>>
>> The total cycles output of {cycles,instructions} and cache-misses are
>> almost the same.
>>
>>  # Samples: 938  of events 'anon group { cycles, instructions }'
>>  # Event count (approx.): 938
>>  #
>>  # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles
>>  # ...............  ..............  ...........  ..........
>>   ..................................................>
>>  #
>>            11.19%            2.6K        0.10%          21
>>                           [perf_iterate_ctx+48 -> >
>>             5.79%            1.4K        0.45%          97
>>             [__intel_pmu_enable_all.constprop.0+80 -> __intel_>
>>             5.11%            1.2K        0.33%          71
>>                              [native_write_msr+0 ->>
>>
>>  # Samples: 293  of event 'cache-misses'
>>  # Event count (approx.): 293
>>  #
>>  # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles
>>                                                   [>
>>  # ...............  ..............  ...........  ..........
>>    ..................................................>
>>  #
>>            11.19%            2.6K        0.13%          21
>>                           [perf_iterate_ctx+48 -> >
>>             5.79%            1.4K        0.59%          97
>> [__intel_pmu_enable_all.constprop.0+80 -> __intel_>
>>             5.11%            1.2K        0.43%          71
>>                              [native_write_msr+0 ->>
>>
>> With the symbol_conf.event_group, the perf report should only report the
>> block information of the leader event in a group.
>> However, the current implementation retrieves the next event's block
>> information, rather than the next group leader's block information.
>>
>> Make sure the index is updated even if the event is skipped.
>>
>> With the patch,
>>
>>  # Samples: 293  of event 'cache-misses'
>>  # Event count (approx.): 293
>>  #
>>  # Sampled Cycles%  Sampled Cycles  Avg Cycles%  Avg Cycles
>>                                                   [>
>>  # ...............  ..............  ...........  ..........
>>    ..................................................>
>>  #
>>            37.98%            9.0K        4.05%         299
>>    [perf_event_addr_filters_exec+0 -> perf_event_a>
>>            11.19%            2.6K        0.28%          21
>>                           [perf_iterate_ctx+48 -> >
>>             5.79%            1.4K        1.32%          97
>> [__intel_pmu_enable_all.constprop.0+80 -> __intel_>
>>
>> Fixes: 6f7164fa231a ("perf report: Sort by sampled cycles percent per block for stdio")
>> Acked-by: Namhyung Kim <namhyung@kernel.org>
>> Reviewed-by: Andi Kleen <ak@linux.intel.com>
>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>> ---
>>  tools/perf/builtin-report.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
>> index dfb47fa85e5c..04b9a5c1bc7e 100644
>> --- a/tools/perf/builtin-report.c
>> +++ b/tools/perf/builtin-report.c
>> @@ -565,6 +565,7 @@ static int evlist__tty_browse_hists(struct evlist *evlist, struct report *rep, c
>>  		struct hists *hists = evsel__hists(pos);
>>  		const char *evname = evsel__name(pos);
>>  
>> +		i++;
>>  		if (symbol_conf.event_group && !evsel__is_group_leader(pos))
>>  			continue;
>>  
>> @@ -574,7 +575,7 @@ static int evlist__tty_browse_hists(struct evlist *evlist, struct report *rep, c
>>  		hists__fprintf_nr_sample_events(hists, rep, evname, stdout);
>>  
>>  		if (rep->total_cycles_mode) {
>> -			report__browse_block_hists(&rep->block_reports[i++].hist,
>> +			report__browse_block_hists(&rep->block_reports[i - 1].hist,
>>  						   rep->min_percent, pos, NULL);
>>  			continue;
>>  		}
>> -- 
>> 2.38.1
> 

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2024-08-13 20:06 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-08-13 16:01 [PATCH V3 0/9] Support branch counters in block annotation kan.liang
2024-08-13 16:02 ` [PATCH V3 1/9] perf report: Fix --total-cycles --stdio output error kan.liang
2024-08-13 18:44   ` Arnaldo Carvalho de Melo
2024-08-13 20:06     ` Liang, Kan
2024-08-13 16:02 ` [PATCH V3 2/9] perf report: Remove the first overflow check for branch counters kan.liang
2024-08-13 16:02 ` [PATCH V3 3/9] perf evlist: Save branch counters information kan.liang
2024-08-13 16:02 ` [PATCH V3 4/9] perf annotate: Save branch counters for each block kan.liang
2024-08-13 16:02 ` [PATCH V3 5/9] perf evsel: Assign abbr name for the branch counter events kan.liang
2024-08-13 16:02 ` [PATCH V3 6/9] perf report: Display the branch counter histogram kan.liang
2024-08-13 16:02 ` [PATCH V3 7/9] perf annotate: " kan.liang
2024-08-13 16:02 ` [PATCH V3 8/9] perf script: Add branch counters kan.liang
2024-08-13 16:02 ` [PATCH V3 9/9] perf test: Add new test cases for the branch counter feature kan.liang
2024-08-13 19:00 ` [PATCH V3 0/9] Support branch counters in block annotation Arnaldo Carvalho de Melo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).