* [PATCH 1/9] perf report: Fix --total-cycles --stdio output error
2024-07-03 20:03 [PATCH 0/9] Support branch counters in block annotation kan.liang
@ 2024-07-03 20:03 ` kan.liang
2024-08-02 20:25 ` Namhyung Kim
2024-07-03 20:03 ` [PATCH 2/9] perf report: Remove the first overflow check for branch counters kan.liang
` (8 subsequent siblings)
9 siblings, 1 reply; 26+ messages in thread
From: kan.liang @ 2024-07-03 20:03 UTC
To: acme, namhyung, irogers, peterz, mingo, linux-kernel
Cc: adrian.hunter, ak, eranian, Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
The --total-cycles option may output wrong information with --stdio.
For example,
perf record -e "{cycles,instructions}",cache-misses -b sleep 1
perf report --total-cycles --stdio
The total cycles output for {cycles,instructions} and for cache-misses is
almost the same.
# Samples: 938 of events 'anon group { cycles, instructions }'
# Event count (approx.): 938
#
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles
# ............... .............. ........... ..........
..................................................>
#
11.19% 2.6K 0.10% 21
[perf_iterate_ctx+48 -> >
5.79% 1.4K 0.45% 97
[__intel_pmu_enable_all.constprop.0+80 -> __intel_>
5.11% 1.2K 0.33% 71
[native_write_msr+0 ->>
# Samples: 293 of event 'cache-misses'
# Event count (approx.): 293
#
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles
[>
# ............... .............. ........... ..........
..................................................>
#
11.19% 2.6K 0.13% 21
[perf_iterate_ctx+48 -> >
5.79% 1.4K 0.59% 97
[__intel_pmu_enable_all.constprop.0+80 -> __intel_>
5.11% 1.2K 0.43% 71
[native_write_msr+0 ->>
With symbol_conf.event_group set, perf report should only report the
block information of the leader event in a group.
However, the current implementation retrieves the next event's block
information, rather than the next group leader's block information.
Make sure the index is updated even if the event is skipped.
With the patch,
# Samples: 293 of event 'cache-misses'
# Event count (approx.): 293
#
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles
[>
# ............... .............. ........... ..........
..................................................>
#
37.98% 9.0K 4.05% 299
[perf_event_addr_filters_exec+0 -> perf_event_a>
11.19% 2.6K 0.28% 21
[perf_iterate_ctx+48 -> >
5.79% 1.4K 1.32% 97
[__intel_pmu_enable_all.constprop.0+80 -> __intel_>
Fixes: 6f7164fa231a ("perf report: Sort by sampled cycles percent per block for stdio")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
tools/perf/builtin-report.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 9718770facb5..b9f22c5321da 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -565,6 +565,7 @@ static int evlist__tty_browse_hists(struct evlist *evlist, struct report *rep, c
struct hists *hists = evsel__hists(pos);
const char *evname = evsel__name(pos);
+ i++;
if (symbol_conf.event_group && !evsel__is_group_leader(pos))
continue;
@@ -574,7 +575,7 @@ static int evlist__tty_browse_hists(struct evlist *evlist, struct report *rep, c
hists__fprintf_nr_sample_events(hists, rep, evname, stdout);
if (rep->total_cycles_mode) {
- report__browse_block_hists(&rep->block_reports[i++].hist,
+ report__browse_block_hists(&rep->block_reports[i - 1].hist,
rep->min_percent, pos, NULL);
continue;
}
--
2.38.1
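
The indexing problem fixed above can be illustrated outside of perf. The
following is a minimal sketch, not the perf code: struct demo_event and
demo_pick_reports() are hypothetical names, and the only assumption is that
there is one report slot per event in list order. The index advances before
the "skip non-leaders" check, so a skipped group member still consumes its
slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an event; only the leader flag matters here. */
struct demo_event {
	bool is_group_leader;
};

/*
 * For each reported event, store the index of the per-event report slot
 * that belongs to it. Non-leaders are skipped, but the index still
 * advances for them so that it stays in step with the event list.
 * Returns the number of reported events.
 */
static size_t demo_pick_reports(const struct demo_event *events, size_t nr,
				size_t *picked)
{
	size_t i = 0, n = 0;

	assert(events != NULL && picked != NULL);

	for (size_t pos = 0; pos < nr; pos++) {
		i++;			/* advance before any 'continue' */
		if (!events[pos].is_group_leader)
			continue;	/* skipped, but already counted */
		picked[n++] = i - 1;	/* slot of the event at 'pos' */
	}
	return n;
}
```

For a list shaped like {cycles (leader), instructions (member),
cache-misses (leader)}, the sketch selects slots 0 and 2. Incrementing only
when a report is printed would select slots 0 and 1 instead, which is the
symptom described in the commit message.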
* Re: [PATCH 1/9] perf report: Fix --total-cycles --stdio output error
2024-07-03 20:03 ` [PATCH 1/9] perf report: Fix --total-cycles --stdio output error kan.liang
@ 2024-08-02 20:25 ` Namhyung Kim
0 siblings, 0 replies; 26+ messages in thread
From: Namhyung Kim @ 2024-08-02 20:25 UTC
To: kan.liang
Cc: acme, irogers, peterz, mingo, linux-kernel, adrian.hunter, ak,
eranian
On Wed, Jul 3, 2024 at 1:03 PM <kan.liang@linux.intel.com> wrote:
>
> From: Kan Liang <kan.liang@linux.intel.com>
>
> The --total-cycles option may output wrong information with --stdio.
>
> For example,
> perf record -e "{cycles,instructions}",cache-misses -b sleep 1
> perf report --total-cycles --stdio
>
> The total cycles output for {cycles,instructions} and for cache-misses is
> almost the same.
>
> # Samples: 938 of events 'anon group { cycles, instructions }'
> # Event count (approx.): 938
> #
> # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles
> # ............... .............. ........... ..........
> ..................................................>
> #
> 11.19% 2.6K 0.10% 21
> [perf_iterate_ctx+48 -> >
> 5.79% 1.4K 0.45% 97
> [__intel_pmu_enable_all.constprop.0+80 -> __intel_>
> 5.11% 1.2K 0.33% 71
> [native_write_msr+0 ->>
>
> # Samples: 293 of event 'cache-misses'
> # Event count (approx.): 293
> #
> # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles
> [>
> # ............... .............. ........... ..........
> ..................................................>
> #
> 11.19% 2.6K 0.13% 21
> [perf_iterate_ctx+48 -> >
> 5.79% 1.4K 0.59% 97
> [__intel_pmu_enable_all.constprop.0+80 -> __intel_>
> 5.11% 1.2K 0.43% 71
> [native_write_msr+0 ->>
>
> With symbol_conf.event_group set, perf report should only report the
> block information of the leader event in a group.
> However, the current implementation retrieves the next event's block
> information, rather than the next group leader's block information.
>
> Make sure the index is updated even if the event is skipped.
>
> With the patch,
>
> # Samples: 293 of event 'cache-misses'
> # Event count (approx.): 293
> #
> # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles
> [>
> # ............... .............. ........... ..........
> ..................................................>
> #
> 37.98% 9.0K 4.05% 299
> [perf_event_addr_filters_exec+0 -> perf_event_a>
> 11.19% 2.6K 0.28% 21
> [perf_iterate_ctx+48 -> >
> 5.79% 1.4K 1.32% 97
> [__intel_pmu_enable_all.constprop.0+80 -> __intel_>
>
> Fixes: 6f7164fa231a ("perf report: Sort by sampled cycles percent per block for stdio")
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Thanks,
Namhyung
> ---
> tools/perf/builtin-report.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index 9718770facb5..b9f22c5321da 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -565,6 +565,7 @@ static int evlist__tty_browse_hists(struct evlist *evlist, struct report *rep, c
> struct hists *hists = evsel__hists(pos);
> const char *evname = evsel__name(pos);
>
> + i++;
> if (symbol_conf.event_group && !evsel__is_group_leader(pos))
> continue;
>
> @@ -574,7 +575,7 @@ static int evlist__tty_browse_hists(struct evlist *evlist, struct report *rep, c
> hists__fprintf_nr_sample_events(hists, rep, evname, stdout);
>
> if (rep->total_cycles_mode) {
> - report__browse_block_hists(&rep->block_reports[i++].hist,
> + report__browse_block_hists(&rep->block_reports[i - 1].hist,
> rep->min_percent, pos, NULL);
> continue;
> }
> --
> 2.38.1
>
* [PATCH 2/9] perf report: Remove the first overflow check for branch counters
2024-07-03 20:03 [PATCH 0/9] Support branch counters in block annotation kan.liang
2024-07-03 20:03 ` [PATCH 1/9] perf report: Fix --total-cycles --stdio output error kan.liang
@ 2024-07-03 20:03 ` kan.liang
2024-08-02 20:26 ` Namhyung Kim
2024-07-03 20:03 ` [PATCH 3/9] perf evlist: Save branch counters information kan.liang
` (7 subsequent siblings)
9 siblings, 1 reply; 26+ messages in thread
From: kan.liang @ 2024-07-03 20:03 UTC
To: acme, namhyung, irogers, peterz, mingo, linux-kernel
Cc: adrian.hunter, ak, eranian, Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
A false overflow warning is triggered if a sample doesn't have any LBRs
recorded and the branch counters feature is enabled.
The current code does OVERFLOW_CHECK_u64() at the very beginning when
reading the information of branch counters. It assumes that there is at
least one LBR in the PEBS record. However, it is valid for zero LBRs to be
recorded, especially under frequent context switches.
Remove the OVERFLOW_CHECK_u64(). The later OVERFLOW_CHECK() should be
good enough to check the overflow when reading the information of the
branch counters.
Fixes: 9fbb4b02302b ("perf tools: Add branch counter knob")
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
tools/perf/util/evsel.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index bc603193c477..a5dd031c9080 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2810,8 +2810,6 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
array = (void *)array + sz;
if (evsel__has_branch_counters(evsel)) {
- OVERFLOW_CHECK_u64(array);
-
data->branch_stack_cntr = (u64 *)array;
sz = data->branch_stack->nr * sizeof(u64);
--
2.38.1
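
The difference between the two checks can be sketched in isolation. This is
an illustration under stated assumptions, not the perf macros:
demo_has_one_u64() and demo_has_nr_u64() are hypothetical helpers, and 'end'
is assumed to point one past the last valid byte of the sample payload. When
no LBR entries are recorded, the counter region is empty and may start
exactly at 'end'; an up-front "room for one u64" check rejects that, while a
check sized by the entry count accepts it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Up-front check: demands room for at least one u64 at 'p'. */
static bool demo_has_one_u64(const uint8_t *p, const uint8_t *end)
{
	assert(p <= end);
	return (size_t)(end - p) >= sizeof(uint64_t);
}

/* Size-based check: demands room for exactly 'nr' u64 entries at 'p'. */
static bool demo_has_nr_u64(const uint8_t *p, const uint8_t *end, size_t nr)
{
	assert(p <= end);
	return nr <= (size_t)(end - p) / sizeof(uint64_t);
}
```

With zero entries, the size-based check is trivially satisfied, so it alone
is enough to guard the later reads without raising a false warning.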
* Re: [PATCH 2/9] perf report: Remove the first overflow check for branch counters
2024-07-03 20:03 ` [PATCH 2/9] perf report: Remove the first overflow check for branch counters kan.liang
@ 2024-08-02 20:26 ` Namhyung Kim
0 siblings, 0 replies; 26+ messages in thread
From: Namhyung Kim @ 2024-08-02 20:26 UTC
To: kan.liang
Cc: acme, irogers, peterz, mingo, linux-kernel, adrian.hunter, ak,
eranian
On Wed, Jul 3, 2024 at 1:03 PM <kan.liang@linux.intel.com> wrote:
>
> From: Kan Liang <kan.liang@linux.intel.com>
>
> A false overflow warning is triggered if a sample doesn't have any LBRs
> recorded and the branch counters feature is enabled.
>
> The current code does OVERFLOW_CHECK_u64() at the very beginning when
> reading the information of branch counters. It assumes that there is at
> least one LBR in the PEBS record. However, it is valid for zero LBRs to be
> recorded, especially under frequent context switches.
>
> Remove the OVERFLOW_CHECK_u64(). The later OVERFLOW_CHECK() should be
> good enough to check the overflow when reading the information of the
> branch counters.
>
> Fixes: 9fbb4b02302b ("perf tools: Add branch counter knob")
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Thanks,
Namhyung
> ---
> tools/perf/util/evsel.c | 2 --
> 1 file changed, 2 deletions(-)
>
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index bc603193c477..a5dd031c9080 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -2810,8 +2810,6 @@ int evsel__parse_sample(struct evsel *evsel, union perf_event *event,
> array = (void *)array + sz;
>
> if (evsel__has_branch_counters(evsel)) {
> - OVERFLOW_CHECK_u64(array);
> -
> data->branch_stack_cntr = (u64 *)array;
> sz = data->branch_stack->nr * sizeof(u64);
>
> --
> 2.38.1
>
* [PATCH 3/9] perf evlist: Save branch counters information
2024-07-03 20:03 [PATCH 0/9] Support branch counters in block annotation kan.liang
2024-07-03 20:03 ` [PATCH 1/9] perf report: Fix --total-cycles --stdio output error kan.liang
2024-07-03 20:03 ` [PATCH 2/9] perf report: Remove the first overflow check for branch counters kan.liang
@ 2024-07-03 20:03 ` kan.liang
2024-07-03 20:03 ` [PATCH 4/9] perf annotate: Save branch counters for each block kan.liang
` (6 subsequent siblings)
9 siblings, 0 replies; 26+ messages in thread
From: kan.liang @ 2024-07-03 20:03 UTC
To: acme, namhyung, irogers, peterz, mingo, linux-kernel
Cc: adrian.hunter, ak, eranian, Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
The branch counters logging (a.k.a. LBR event logging) introduces a
per-counter indication of precise event occurrences in LBRs. The kernel
only dumps the number of occurrences into a record. The perf tool has
to map the number to the corresponding event.
Add evlist__update_br_cntr() to go through the evlist to pick the
events that are configured to be logged. Assign a logical idx to track
them, and add the total number of the events in the leader event.
The total number will be used to allocate the space to save the branch
counters for a block. The logical idx will be used to locate the
corresponding event quickly in the following patches.
It only needs to iterate the evlist once. The
evsel__has_branch_counters() is also optimized.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
tools/perf/util/evlist.c | 15 +++++++++++++++
tools/perf/util/evlist.h | 2 ++
tools/perf/util/evsel.c | 13 +++++++------
tools/perf/util/evsel.h | 8 ++++++++
4 files changed, 32 insertions(+), 6 deletions(-)
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 3a719edafc7a..6f5311d01a14 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -78,6 +78,7 @@ void evlist__init(struct evlist *evlist, struct perf_cpu_map *cpus,
evlist->ctl_fd.fd = -1;
evlist->ctl_fd.ack = -1;
evlist->ctl_fd.pos = -1;
+ evlist->nr_br_cntr = -1;
}
struct evlist *evlist__new(void)
@@ -1261,6 +1262,20 @@ u64 evlist__combined_branch_type(struct evlist *evlist)
return branch_type;
}
+void evlist__update_br_cntr(struct evlist *evlist)
+{
+ struct evsel *evsel;
+ int i = 0;
+
+ evlist__for_each_entry(evlist, evsel) {
+ if (evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS) {
+ evsel->br_cntr_idx = i++;
+ evsel__leader(evsel)->br_cntr_nr++;
+ }
+ }
+ evlist->nr_br_cntr = i;
+}
+
bool evlist__valid_read_format(struct evlist *evlist)
{
struct evsel *first = evlist__first(evlist), *pos = first;
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index cb91dc9117a2..88206dd554c7 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -56,6 +56,7 @@ struct evlist {
bool enabled;
int id_pos;
int is_pos;
+ int nr_br_cntr;
u64 combined_sample_type;
enum bkw_mmap_state bkw_mmap_state;
struct {
@@ -217,6 +218,7 @@ int evlist__apply_filters(struct evlist *evlist, struct evsel **err_evsel);
u64 __evlist__combined_sample_type(struct evlist *evlist);
u64 evlist__combined_sample_type(struct evlist *evlist);
u64 evlist__combined_branch_type(struct evlist *evlist);
+void evlist__update_br_cntr(struct evlist *evlist);
bool evlist__sample_id_all(struct evlist *evlist);
u16 evlist__id_hdr_size(struct evlist *evlist);
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index a5dd031c9080..89c3baae926e 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -2562,17 +2562,18 @@ u64 evsel__bitfield_swap_branch_flags(u64 value)
static inline bool evsel__has_branch_counters(const struct evsel *evsel)
{
- struct evsel *cur, *leader = evsel__leader(evsel);
+ struct evsel *leader = evsel__leader(evsel);
/* The branch counters feature only supports group */
if (!leader || !evsel->evlist)
return false;
- evlist__for_each_entry(evsel->evlist, cur) {
- if ((leader == evsel__leader(cur)) &&
- (cur->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS))
- return true;
- }
+ if (evsel->evlist->nr_br_cntr < 0)
+ evlist__update_br_cntr(evsel->evlist);
+
+ if (leader->br_cntr_nr > 0)
+ return true;
+
return false;
}
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 80b5f6dd868e..a733d3407b35 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -147,6 +147,14 @@ struct evsel {
*/
__u64 synth_sample_type;
+ /*
+ * Store the branch counter related information.
+ * br_cntr_idx: The idx of the branch counter event in the evlist
+ * br_cntr_nr: The number of the branch counter event in the group
+ * (Only available for the leader event)
+ */
+ int br_cntr_idx;
+ int br_cntr_nr;
/*
* bpf_counter_ops serves two use cases:
* 1. perf-stat -b counting events used byBPF programs
--
2.38.1
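
The single-pass bookkeeping described in this patch can be sketched with a
simplified model. Everything below is hypothetical (struct demo_evsel, the
'leader' array index, the 'logs_br_cntr' flag standing in for
PERF_SAMPLE_BRANCH_COUNTERS); it only mirrors the idea of assigning a logical
idx to each logged event and counting logged events on the group leader.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified event: 'leader' is an index into the same array. */
struct demo_evsel {
	size_t leader;		/* index of the group leader (self if leader) */
	bool logs_br_cntr;	/* stands in for PERF_SAMPLE_BRANCH_COUNTERS */
	int br_cntr_idx;	/* logical idx among the logged events */
	int br_cntr_nr;		/* only meaningful on the leader */
};

/* One pass: number the logged events and count them per group leader. */
static int demo_update_br_cntr(struct demo_evsel *evsels, size_t nr)
{
	int i = 0;

	for (size_t pos = 0; pos < nr; pos++) {
		assert(evsels[pos].leader < nr);
		if (!evsels[pos].logs_br_cntr)
			continue;
		evsels[pos].br_cntr_idx = i++;
		evsels[evsels[pos].leader].br_cntr_nr++;
	}
	return i;	/* total number of logged events in the list */
}
```

Once the pass has run, "does this group log branch counters?" reduces to
checking whether the leader's count is greater than zero, which is the
shortcut the reworked evsel__has_branch_counters() takes.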
* [PATCH 4/9] perf annotate: Save branch counters for each block
2024-07-03 20:03 [PATCH 0/9] Support branch counters in block annotation kan.liang
` (2 preceding siblings ...)
2024-07-03 20:03 ` [PATCH 3/9] perf evlist: Save branch counters information kan.liang
@ 2024-07-03 20:03 ` kan.liang
2024-07-03 20:03 ` [PATCH 5/9] perf evsel: Assign abbr name for the branch counter events kan.liang
` (5 subsequent siblings)
9 siblings, 0 replies; 26+ messages in thread
From: kan.liang @ 2024-07-03 20:03 UTC
To: acme, namhyung, irogers, peterz, mingo, linux-kernel
Cc: adrian.hunter, ak, eranian, Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
When annotating a basic block, it's useful to display the occurrences
of other events in the block.
The branch counter feature is only available for newer Intel platforms.
So a dedicated option to display the branch counters is not introduced.
Reuse the existing --total-cycles option, which triggers the annotation
of a basic block and displays the cycle-related annotation. When the
branch counters information is available, the branch counters are
automatically appended after all the cycle-related annotation.
Account the branch counters as well when accounting the cycles in
hist__account_cycles().
In struct annotated_branch, introduce a br_cntr array to save the
accumulation of each branch counter.
In a sample, all the branch counters for a branch are saved in a u64
space. The saturation value of a branch counter is small, e.g., only 3
for Intel Sierra Forest. Add ANNOTATION__BR_CNTR_SATURATED_FLAG to
indicate that a branch counter has saturated at least once. The flag can
be used to point out a potential loss of events caused by the
saturation.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
tools/perf/builtin-annotate.c | 3 +-
tools/perf/builtin-diff.c | 4 +--
tools/perf/builtin-report.c | 2 +-
tools/perf/builtin-top.c | 4 +--
tools/perf/util/annotate.c | 68 ++++++++++++++++++++++++++++-------
tools/perf/util/annotate.h | 10 +++++-
tools/perf/util/branch.h | 1 +
tools/perf/util/hist.c | 5 +--
tools/perf/util/hist.h | 2 +-
tools/perf/util/machine.c | 3 ++
10 files changed, 80 insertions(+), 22 deletions(-)
diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index b10b7f005658..0aa40588425c 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -221,7 +221,8 @@ static int process_branch_callback(struct evsel *evsel,
if (a.map != NULL)
dso__set_hit(map__dso(a.map));
- hist__account_cycles(sample->branch_stack, al, sample, false, NULL);
+ hist__account_cycles(sample->branch_stack, al, sample, false,
+ NULL, evsel);
ret = hist_entry_iter__add(&iter, &a, PERF_MAX_STACK_DEPTH, ann);
out:
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 57d300d8e570..2d9226b1de52 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -431,8 +431,8 @@ static int diff__process_sample_event(struct perf_tool *tool,
goto out;
}
- hist__account_cycles(sample->branch_stack, &al, sample, false,
- NULL);
+ hist__account_cycles(sample->branch_stack, &al, sample,
+ false, NULL, evsel);
break;
case COMPUTE_STREAM:
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index b9f22c5321da..da8d13bbb500 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -328,7 +328,7 @@ static int process_sample_event(struct perf_tool *tool,
if (ui__has_annotation() || rep->symbol_ipc || rep->total_cycles_mode) {
hist__account_cycles(sample->branch_stack, &al, sample,
rep->nonany_branch_mode,
- &rep->total_cycles);
+ &rep->total_cycles, evsel);
}
ret = hist_entry_iter__add(&iter, &al, rep->max_stack, rep);
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index e8cbbf10d361..040190a64fff 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -735,8 +735,8 @@ static int hist_iter__top_callback(struct hist_entry_iter *iter,
perf_top__record_precise_ip(top, iter->he, iter->sample, evsel, al->addr);
hist__account_cycles(iter->sample->branch_stack, al, iter->sample,
- !(top->record_opts.branch_stack & PERF_SAMPLE_BRANCH_ANY),
- NULL);
+ !(top->record_opts.branch_stack & PERF_SAMPLE_BRANCH_ANY),
+ NULL, evsel);
return 0;
}
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 1451caf25e77..6baa0671598e 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -265,22 +265,30 @@ struct annotated_branch *annotation__get_branch(struct annotation *notes)
return notes->branch;
}
-static struct cyc_hist *symbol__cycles_hist(struct symbol *sym)
+static struct annotated_branch *symbol__find_branch_hist(struct symbol *sym,
+ unsigned int br_cntr_nr)
{
struct annotation *notes = symbol__annotation(sym);
struct annotated_branch *branch;
+ const size_t size = symbol__size(sym);
branch = annotation__get_branch(notes);
if (branch == NULL)
return NULL;
if (branch->cycles_hist == NULL) {
- const size_t size = symbol__size(sym);
-
branch->cycles_hist = calloc(size, sizeof(struct cyc_hist));
+ if (!branch->cycles_hist)
+ return NULL;
+ }
+
+ if (br_cntr_nr && branch->br_cntr == NULL) {
+ branch->br_cntr = calloc(br_cntr_nr * size, sizeof(u64));
+ if (!branch->br_cntr)
+ return NULL;
}
- return branch->cycles_hist;
+ return branch;
}
struct annotated_source *symbol__hists(struct symbol *sym, int nr_hists)
@@ -315,16 +323,44 @@ static int symbol__inc_addr_samples(struct map_symbol *ms,
return src ? __symbol__inc_addr_samples(ms, src, evsel->core.idx, addr, sample) : 0;
}
-static int symbol__account_cycles(u64 addr, u64 start,
- struct symbol *sym, unsigned cycles)
+static int symbol__account_br_cntr(struct annotated_branch *branch,
+ struct evsel *evsel,
+ unsigned offset,
+ u64 br_cntr)
+{
+ unsigned int br_cntr_nr = evsel__leader(evsel)->br_cntr_nr;
+ unsigned int base = evsel__leader(evsel)->br_cntr_idx;
+ unsigned int width = evsel__env(evsel)->br_cntr_width;
+ unsigned int off = offset * evsel->evlist->nr_br_cntr;
+ unsigned int i, mask = (1L << width) - 1;
+ u64 *branch_br_cntr = branch->br_cntr;
+
+ if (!br_cntr || !branch_br_cntr)
+ return 0;
+
+ for (i = 0; i < br_cntr_nr; i++) {
+ u64 cntr = (br_cntr >> i * width) & mask;
+
+ branch_br_cntr[off + i + base] += cntr;
+ if (cntr == mask)
+ branch_br_cntr[off + i + base] |= ANNOTATION__BR_CNTR_SATURATED_FLAG;
+ }
+
+ return 0;
+}
+
+static int symbol__account_cycles(u64 addr, u64 start, struct symbol *sym,
+ unsigned cycles, struct evsel *evsel,
+ u64 br_cntr)
{
- struct cyc_hist *cycles_hist;
+ struct annotated_branch *branch;
unsigned offset;
+ int ret;
if (sym == NULL)
return 0;
- cycles_hist = symbol__cycles_hist(sym);
- if (cycles_hist == NULL)
+ branch = symbol__find_branch_hist(sym, evsel->evlist->nr_br_cntr);
+ if (!branch)
return -ENOMEM;
if (addr < sym->start || addr >= sym->end)
return -ERANGE;
@@ -336,15 +372,22 @@ static int symbol__account_cycles(u64 addr, u64 start,
start = 0;
}
offset = addr - sym->start;
- return __symbol__account_cycles(cycles_hist,
+ ret = __symbol__account_cycles(branch->cycles_hist,
start ? start - sym->start : 0,
offset, cycles,
!!start);
+
+ if (ret)
+ return ret;
+
+ return symbol__account_br_cntr(branch, evsel, offset, br_cntr);
}
int addr_map_symbol__account_cycles(struct addr_map_symbol *ams,
struct addr_map_symbol *start,
- unsigned cycles)
+ unsigned cycles,
+ struct evsel *evsel,
+ u64 br_cntr)
{
u64 saddr = 0;
int err;
@@ -370,7 +413,7 @@ int addr_map_symbol__account_cycles(struct addr_map_symbol *ams,
start ? start->addr : 0,
ams->ms.sym ? ams->ms.sym->start + map__start(ams->ms.map) : 0,
saddr);
- err = symbol__account_cycles(ams->al_addr, saddr, ams->ms.sym, cycles);
+ err = symbol__account_cycles(ams->al_addr, saddr, ams->ms.sym, cycles, evsel, br_cntr);
if (err)
pr_debug2("account_cycles failed %d\n", err);
return err;
@@ -411,6 +454,7 @@ static void annotated_branch__delete(struct annotated_branch *branch)
{
if (branch) {
zfree(&branch->cycles_hist);
+ free(branch->br_cntr);
free(branch);
}
}
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index d5c821c22f79..f39dd5d7b05e 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -14,6 +14,7 @@
#include "spark.h"
#include "hashmap.h"
#include "disasm.h"
+#include "branch.h"
struct hist_browser_timer;
struct hist_entry;
@@ -285,6 +286,9 @@ struct annotated_source {
struct annotation_line *annotated_source__get_line(struct annotated_source *src,
s64 offset);
+/* A branch counter once saturated */
+#define ANNOTATION__BR_CNTR_SATURATED_FLAG (1ULL << 63)
+
/**
* struct annotated_branch - basic block and IPC information for a symbol.
*
@@ -294,6 +298,7 @@ struct annotation_line *annotated_source__get_line(struct annotated_source *src,
* @cover_insn: Number of distinct, actually executed instructions.
* @cycles_hist: Array of cyc_hist for each instruction.
* @max_coverage: Maximum number of covered basic block (used for block-range).
+ * @br_cntr: Array of the occurrences of events (branch counters) during a block.
*
* This struct is used by two different codes when the sample has branch stack
* and cycles information. annotation__compute_ipc() calculates average IPC
@@ -310,6 +315,7 @@ struct annotated_branch {
unsigned int cover_insn;
struct cyc_hist *cycles_hist;
u64 max_coverage;
+ u64 *br_cntr;
};
struct LOCKABLE annotation {
@@ -380,7 +386,9 @@ struct annotated_branch *annotation__get_branch(struct annotation *notes);
int addr_map_symbol__account_cycles(struct addr_map_symbol *ams,
struct addr_map_symbol *start,
- unsigned cycles);
+ unsigned cycles,
+ struct evsel *evsel,
+ u64 br_cntr);
int hist_entry__inc_addr_samples(struct hist_entry *he, struct perf_sample *sample,
struct evsel *evsel, u64 addr);
diff --git a/tools/perf/util/branch.h b/tools/perf/util/branch.h
index 87704d713ff6..b80c12c74bbb 100644
--- a/tools/perf/util/branch.h
+++ b/tools/perf/util/branch.h
@@ -34,6 +34,7 @@ struct branch_info {
struct addr_map_symbol from;
struct addr_map_symbol to;
struct branch_flags flags;
+ u64 branch_stack_cntr;
char *srcline_from;
char *srcline_to;
};
diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c
index f028f113c4fd..c405c7773e15 100644
--- a/tools/perf/util/hist.c
+++ b/tools/perf/util/hist.c
@@ -2667,7 +2667,7 @@ int hists__unlink(struct hists *hists)
void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
struct perf_sample *sample, bool nonany_branch_mode,
- u64 *total_cycles)
+ u64 *total_cycles, struct evsel *evsel)
{
struct branch_info *bi;
struct branch_entry *entries = perf_sample__branch_entries(sample);
@@ -2691,7 +2691,8 @@ void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
for (int i = bs->nr - 1; i >= 0; i--) {
addr_map_symbol__account_cycles(&bi[i].from,
nonany_branch_mode ? NULL : prev,
- bi[i].flags.cycles);
+ bi[i].flags.cycles, evsel,
+ bi[i].branch_stack_cntr);
prev = &bi[i].to;
if (total_cycles)
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 5273f5c37050..30c13fc8cbe4 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -742,7 +742,7 @@ unsigned int hists__overhead_width(struct hists *hists);
void hist__account_cycles(struct branch_stack *bs, struct addr_location *al,
struct perf_sample *sample, bool nonany_branch_mode,
- u64 *total_cycles);
+ u64 *total_cycles, struct evsel *evsel);
struct option;
int parse_filter_percentage(const struct option *opt, const char *arg, int unset);
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 8477edefc299..19fc7979c66b 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -2141,6 +2141,7 @@ struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
unsigned int i;
const struct branch_stack *bs = sample->branch_stack;
struct branch_entry *entries = perf_sample__branch_entries(sample);
+ u64 *branch_stack_cntr = sample->branch_stack_cntr;
struct branch_info *bi = calloc(bs->nr, sizeof(struct branch_info));
if (!bi)
@@ -2150,6 +2151,8 @@ struct branch_info *sample__resolve_bstack(struct perf_sample *sample,
ip__resolve_ams(al->thread, &bi[i].to, entries[i].to);
ip__resolve_ams(al->thread, &bi[i].from, entries[i].from);
bi[i].flags = entries[i].flags;
+ if (branch_stack_cntr)
+ bi[i].branch_stack_cntr = branch_stack_cntr[i];
}
return bi;
}
--
2.38.1
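
The unpacking of per-branch counters from one u64 can be sketched on its
own. This is a simplified illustration, not the perf implementation:
DEMO_SATURATED_FLAG and demo_account_br_cntr() are hypothetical names, the
fields are assumed to be packed lowest-first with a fixed bit width, and the
per-offset and per-group indexing done in the patch is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag mirroring the idea of a "saturated once" marker. */
#define DEMO_SATURATED_FLAG (1ULL << 63)

/*
 * Split 'packed' into 'nr' fields of 'width' bits each (lowest field
 * first), add each field to totals[], and mark a total whose field hit
 * the maximum representable value.
 */
static void demo_account_br_cntr(uint64_t *totals, size_t nr,
				 unsigned int width, uint64_t packed)
{
	uint64_t mask;

	assert(width > 0 && width < 64 && nr * width <= 64);
	mask = (1ULL << width) - 1;

	for (size_t i = 0; i < nr; i++) {
		uint64_t cntr = (packed >> (i * width)) & mask;

		totals[i] += cntr;
		if (cntr == mask)
			totals[i] |= DEMO_SATURATED_FLAG;
	}
}
```

A 2-bit width gives a maximum field value of 3, which matches the saturation
value quoted for Sierra Forest in the commit message. Keeping the flag in the
top bit lets the accumulated total and the "saturated once" marker share one
u64, at the cost of having to mask the flag off before displaying the total.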
* [PATCH 5/9] perf evsel: Assign abbr name for the branch counter events
2024-07-03 20:03 [PATCH 0/9] Support branch counters in block annotation kan.liang
` (3 preceding siblings ...)
2024-07-03 20:03 ` [PATCH 4/9] perf annotate: Save branch counters for each block kan.liang
@ 2024-07-03 20:03 ` kan.liang
2024-08-03 0:14 ` Namhyung Kim
2024-07-03 20:03 ` [PATCH 6/9] perf report: Display the branch counter histogram kan.liang
` (4 subsequent siblings)
9 siblings, 1 reply; 26+ messages in thread
From: kan.liang @ 2024-07-03 20:03 UTC
To: acme, namhyung, irogers, peterz, mingo, linux-kernel
Cc: adrian.hunter, ak, eranian, Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
There could be several branch counter events. If the perf tool outputs the
result in the format "event name + a number", the line could be very
long and hard to read.
An abbreviation is introduced to replace the full event name in the
display. The abbreviations range from 'A' to 'Z9', which supports
up to 286 events. The same abbreviation will be assigned if the same
events are found in the evlist. The next patch will utilize the
abbreviation name to show the branch counter events in the output.
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
tools/perf/util/evlist.c | 53 +++++++++++++++++++++++++++++++++++++++-
tools/perf/util/evsel.h | 4 +++
2 files changed, 56 insertions(+), 1 deletion(-)
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 6f5311d01a14..028169dcb53d 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -33,6 +33,7 @@
#include "util/bpf-filter.h"
#include "util/stat.h"
#include "util/util.h"
+#include "util/env.h"
#include <signal.h>
#include <unistd.h>
#include <sched.h>
@@ -1262,15 +1263,65 @@ u64 evlist__combined_branch_type(struct evlist *evlist)
return branch_type;
}
+static struct evsel *
+evlist__find_dup_event_from_prev(struct evlist *evlist, struct evsel *event)
+{
+ struct evsel *pos;
+
+ evlist__for_each_entry(evlist, pos) {
+ if (event == pos)
+ break;
+ if ((pos->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS) &&
+ !strcmp(pos->name, event->name))
+ return pos;
+ }
+ return NULL;
+}
+
+#define MAX_NR_ABBR_NAME (26 * 11)
+
+/*
+ * The abbr name is from A to Z9. If the number of event
+ * which requires the branch counter > MAX_NR_ABBR_NAME,
+ * return NA.
+ */
+static char *evlist__new_abbr_name(void)
+{
+ static int idx;
+ char str[3];
+ int i = idx / 26;
+
+ if (idx >= MAX_NR_ABBR_NAME)
+ return strdup("NA");
+
+ str[0] = 'A' + (idx % 26);
+
+ if (!i)
+ str[1] = '\0';
+ else {
+ str[1] = '0' + i - 1;
+ str[2] = '\0';
+ }
+
+ idx++;
+ return strdup(str);
+}
+
void evlist__update_br_cntr(struct evlist *evlist)
{
- struct evsel *evsel;
+ struct evsel *evsel, *dup;
int i = 0;
evlist__for_each_entry(evlist, evsel) {
if (evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS) {
evsel->br_cntr_idx = i++;
evsel__leader(evsel)->br_cntr_nr++;
+
+ dup = evlist__find_dup_event_from_prev(evlist, evsel);
+ if (dup)
+ evsel->abbr_name = strdup(dup->abbr_name);
+ else
+ evsel->abbr_name = evlist__new_abbr_name();
}
}
evlist->nr_br_cntr = i;
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index a733d3407b35..bf37442002aa 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -152,9 +152,13 @@ struct evsel {
* br_cntr_idx: The idx of the branch counter event in the evlist
* br_cntr_nr: The number of the branch counter event in the group
* (Only available for the leader event)
+ * abbr_name: The abbreviation name assigned to an event which is
+ * logged by the branch counter.
*/
int br_cntr_idx;
int br_cntr_nr;
+ char *abbr_name;
+
/*
* bpf_counter_ops serves two use cases:
* 1. perf-stat -b counting events used byBPF programs
--
2.38.1
* Re: [PATCH 5/9] perf evsel: Assign abbr name for the branch counter events
2024-07-03 20:03 ` [PATCH 5/9] perf evsel: Assign abbr name for the branch counter events kan.liang
@ 2024-08-03 0:14 ` Namhyung Kim
2024-08-06 14:11 ` Liang, Kan
0 siblings, 1 reply; 26+ messages in thread
From: Namhyung Kim @ 2024-08-03 0:14 UTC (permalink / raw)
To: kan.liang
Cc: acme, irogers, peterz, mingo, linux-kernel, adrian.hunter, ak,
eranian
On Wed, Jul 3, 2024 at 1:03 PM <kan.liang@linux.intel.com> wrote:
>
> From: Kan Liang <kan.liang@linux.intel.com>
>
> There could be several branch counter events. If perf tool output the
> result via the format "event name + a number", the line could be very
> long and hard to read.
>
> An abbreviation is introduced to replace the full event name in the
> display. The abbreviation starts from 'A' to 'Z9', which can support
> up to 286 events. The same abbreviation will be assigned if the same
> events are found in the evlist. The next patch will utilize the
> abbreviation name to show the branch counter events in the output.
>
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> ---
> tools/perf/util/evlist.c | 53 +++++++++++++++++++++++++++++++++++++++-
> tools/perf/util/evsel.h | 4 +++
> 2 files changed, 56 insertions(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index 6f5311d01a14..028169dcb53d 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -33,6 +33,7 @@
> #include "util/bpf-filter.h"
> #include "util/stat.h"
> #include "util/util.h"
> +#include "util/env.h"
> #include <signal.h>
> #include <unistd.h>
> #include <sched.h>
> @@ -1262,15 +1263,65 @@ u64 evlist__combined_branch_type(struct evlist *evlist)
> return branch_type;
> }
>
> +static struct evsel *
> +evlist__find_dup_event_from_prev(struct evlist *evlist, struct evsel *event)
> +{
> + struct evsel *pos;
> +
> + evlist__for_each_entry(evlist, pos) {
> + if (event == pos)
> + break;
> + if ((pos->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS) &&
> + !strcmp(pos->name, event->name))
> + return pos;
> + }
> + return NULL;
> +}
> +
> +#define MAX_NR_ABBR_NAME (26 * 11)
> +
> +/*
> + * The abbr name is from A to Z9. If the number of event
> + * which requires the branch counter > MAX_NR_ABBR_NAME,
> + * return NA.
> + */
> +static char *evlist__new_abbr_name(void)
> +{
> + static int idx;
> + char str[3];
> + int i = idx / 26;
> +
> + if (idx >= MAX_NR_ABBR_NAME)
> + return strdup("NA");
> +
> + str[0] = 'A' + (idx % 26);
> +
> + if (!i)
> + str[1] = '\0';
> + else {
> + str[1] = '0' + i - 1;
> + str[2] = '\0';
> + }
> +
> + idx++;
> + return strdup(str);
> +}
> +
> void evlist__update_br_cntr(struct evlist *evlist)
> {
> - struct evsel *evsel;
> + struct evsel *evsel, *dup;
> int i = 0;
>
> evlist__for_each_entry(evlist, evsel) {
> if (evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS) {
> evsel->br_cntr_idx = i++;
> evsel__leader(evsel)->br_cntr_nr++;
> +
> + dup = evlist__find_dup_event_from_prev(evlist, evsel);
> + if (dup)
> + evsel->abbr_name = strdup(dup->abbr_name);
> + else
> + evsel->abbr_name = evlist__new_abbr_name();
> }
> }
> evlist->nr_br_cntr = i;
> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
> index a733d3407b35..bf37442002aa 100644
> --- a/tools/perf/util/evsel.h
> +++ b/tools/perf/util/evsel.h
> @@ -152,9 +152,13 @@ struct evsel {
> * br_cntr_idx: The idx of the branch counter event in the evlist
> * br_cntr_nr: The number of the branch counter event in the group
> * (Only available for the leader event)
> + * abbr_name: The abbreviation name assigned to an event which is
> + * logged by the branch counter.
> */
> int br_cntr_idx;
> int br_cntr_nr;
> + char *abbr_name;
I think it's better to have an array (of 4 characters?) instead of a
pointer as it's supposed to be a short string.
Thanks,
Namhyung
> +
> /*
> * bpf_counter_ops serves two use cases:
> * 1. perf-stat -b counting events used byBPF programs
> --
> 2.38.1
>
* Re: [PATCH 5/9] perf evsel: Assign abbr name for the branch counter events
2024-08-03 0:14 ` Namhyung Kim
@ 2024-08-06 14:11 ` Liang, Kan
0 siblings, 0 replies; 26+ messages in thread
From: Liang, Kan @ 2024-08-06 14:11 UTC (permalink / raw)
To: Namhyung Kim
Cc: acme, irogers, peterz, mingo, linux-kernel, adrian.hunter, ak,
eranian
On 2024-08-02 8:14 p.m., Namhyung Kim wrote:
> On Wed, Jul 3, 2024 at 1:03 PM <kan.liang@linux.intel.com> wrote:
>>
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> There could be several branch counter events. If perf tool output the
>> result via the format "event name + a number", the line could be very
>> long and hard to read.
>>
>> An abbreviation is introduced to replace the full event name in the
>> display. The abbreviation starts from 'A' to 'Z9', which can support
>> up to 286 events. The same abbreviation will be assigned if the same
>> events are found in the evlist. The next patch will utilize the
>> abbreviation name to show the branch counter events in the output.
>>
>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>> ---
>> tools/perf/util/evlist.c | 53 +++++++++++++++++++++++++++++++++++++++-
>> tools/perf/util/evsel.h | 4 +++
>> 2 files changed, 56 insertions(+), 1 deletion(-)
>>
>> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
>> index 6f5311d01a14..028169dcb53d 100644
>> --- a/tools/perf/util/evlist.c
>> +++ b/tools/perf/util/evlist.c
>> @@ -33,6 +33,7 @@
>> #include "util/bpf-filter.h"
>> #include "util/stat.h"
>> #include "util/util.h"
>> +#include "util/env.h"
>> #include <signal.h>
>> #include <unistd.h>
>> #include <sched.h>
>> @@ -1262,15 +1263,65 @@ u64 evlist__combined_branch_type(struct evlist *evlist)
>> return branch_type;
>> }
>>
>> +static struct evsel *
>> +evlist__find_dup_event_from_prev(struct evlist *evlist, struct evsel *event)
>> +{
>> + struct evsel *pos;
>> +
>> + evlist__for_each_entry(evlist, pos) {
>> + if (event == pos)
>> + break;
>> + if ((pos->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS) &&
>> + !strcmp(pos->name, event->name))
>> + return pos;
>> + }
>> + return NULL;
>> +}
>> +
>> +#define MAX_NR_ABBR_NAME (26 * 11)
>> +
>> +/*
>> + * The abbr name is from A to Z9. If the number of event
>> + * which requires the branch counter > MAX_NR_ABBR_NAME,
>> + * return NA.
>> + */
>> +static char *evlist__new_abbr_name(void)
>> +{
>> + static int idx;
>> + char str[3];
>> + int i = idx / 26;
>> +
>> + if (idx >= MAX_NR_ABBR_NAME)
>> + return strdup("NA");
>> +
>> + str[0] = 'A' + (idx % 26);
>> +
>> + if (!i)
>> + str[1] = '\0';
>> + else {
>> + str[1] = '0' + i - 1;
>> + str[2] = '\0';
>> + }
>> +
>> + idx++;
>> + return strdup(str);
>> +}
>> +
>> void evlist__update_br_cntr(struct evlist *evlist)
>> {
>> - struct evsel *evsel;
>> + struct evsel *evsel, *dup;
>> int i = 0;
>>
>> evlist__for_each_entry(evlist, evsel) {
>> if (evsel->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS) {
>> evsel->br_cntr_idx = i++;
>> evsel__leader(evsel)->br_cntr_nr++;
>> +
>> + dup = evlist__find_dup_event_from_prev(evlist, evsel);
>> + if (dup)
>> + evsel->abbr_name = strdup(dup->abbr_name);
>> + else
>> + evsel->abbr_name = evlist__new_abbr_name();
>> }
>> }
>> evlist->nr_br_cntr = i;
>> diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
>> index a733d3407b35..bf37442002aa 100644
>> --- a/tools/perf/util/evsel.h
>> +++ b/tools/perf/util/evsel.h
>> @@ -152,9 +152,13 @@ struct evsel {
>> * br_cntr_idx: The idx of the branch counter event in the evlist
>> * br_cntr_nr: The number of the branch counter event in the group
>> * (Only available for the leader event)
>> + * abbr_name: The abbreviation name assigned to an event which is
>> + * logged by the branch counter.
>> */
>> int br_cntr_idx;
>> int br_cntr_nr;
>> + char *abbr_name;
>
> I think it's better to have an array (of 4 characters?) instead of a
> pointer as it's supposed to be a short string.
Sure. I think 3 characters should be enough, since the abbr name is only
from A to Z9.
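
For illustration only, a fixed-size variant could look roughly like the
sketch below. It is not the follow-up patch: ABBR_NAME_LEN and
abbr_name_fill() are hypothetical names, and the caller is assumed to pass
the running index instead of relying on a static counter. Three bytes hold
every name the scheme can produce ("A" to "Z9", plus the "NA" fallback)
together with the terminating NUL.

```c
#include <string.h>

#define ABBR_NAME_LEN		3		/* hypothetical: two characters plus the NUL */
#define MAX_NR_ABBR_NAME	(26 * 11)	/* same limit as in the patch */

/*
 * Write the abbreviation for the idx-th distinct event into buf.
 * idx 0..25 gives "A".."Z", idx 26..285 gives "A0".."Z9",
 * and any other idx gives the "NA" fallback.
 */
static void abbr_name_fill(char buf[ABBR_NAME_LEN], int idx)
{
	int round = idx / 26;

	if (idx < 0 || idx >= MAX_NR_ABBR_NAME) {
		/* Copies 'N', 'A' and the terminating NUL. */
		memcpy(buf, "NA", ABBR_NAME_LEN);
		return;
	}

	buf[0] = 'A' + (idx % 26);
	if (round == 0) {
		buf[1] = '\0';
	} else {
		buf[1] = '0' + round - 1;
		buf[2] = '\0';
	}
}
```

With an embedded array the duplicate case would copy the earlier event's
buffer instead of calling strdup(), and nothing would need to be freed.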
Thanks,
Kan
>
> Thanks,
> Namhyung
>
>> +
>> /*
>> * bpf_counter_ops serves two use cases:
>> * 1. perf-stat -b counting events used byBPF programs
>> --
>> 2.38.1
>>
>
* [PATCH 6/9] perf report: Display the branch counter histogram
2024-07-03 20:03 [PATCH 0/9] Support branch counters in block annotation kan.liang
` (4 preceding siblings ...)
2024-07-03 20:03 ` [PATCH 5/9] perf evsel: Assign abbr name for the branch counter events kan.liang
@ 2024-07-03 20:03 ` kan.liang
2024-08-03 0:18 ` Namhyung Kim
2024-07-03 20:03 ` [PATCH 7/9] perf annotate: " kan.liang
` (3 subsequent siblings)
9 siblings, 1 reply; 26+ messages in thread
From: kan.liang @ 2024-07-03 20:03 UTC (permalink / raw)
To: acme, namhyung, irogers, peterz, mingo, linux-kernel
Cc: adrian.hunter, ak, eranian, Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
Reuse the existing --total-cycles option to display the branch
counters. Add a new PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER column to
display the logged branch counter events. They are shown right after all
the cycle-related annotations.
Extend struct block_info to store and pass the branch counter
related information.
annotation_br_cntr_entry() prints the histogram of each branch
counter event.
annotation_br_cntr_abbr_list() prints the branch counter's
abbreviation list. Press 'B' to display the list in the TUI mode.
$ perf record -e "{branch-instructions:ppp,branch-misses}:S" -j any,counter
$ perf report --total-cycles --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }'
# Event count (approx.): 1610046
#
# Branch counter abbr list:
# branch-instructions:ppp = A
# branch-misses = B
# '-' No event occurs
# '+' Event occurrences may be lost due to branch counter saturated
#
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter [Program Block Range]
# ............... .............. ........... .......... ...................... ..................
#
57.55% 2.5M 0.00% 3 |A |- | ...
25.27% 1.1M 0.00% 2 |AA |- | ...
15.61% 667.2K 0.00% 1 |A |- | ...
0.16% 6.9K 0.81% 575 |A |- | ...
0.16% 6.8K 1.38% 977 |AA |- | ...
0.16% 6.8K 0.04% 28 |AA |B | ...
0.15% 6.6K 1.33% 946 |A |- | ...
0.11% 4.5K 0.06% 46 |AAA+|- | ...
0.10% 4.4K 0.88% 624 |A |- | ...
0.09% 3.7K 0.74% 524 |AAA+|B | ...
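
The cell layout in the output above can be reproduced with a small
standalone sketch. This is a simplified model and not the code in the
patch: br_cntr_row(), put() and CELL_WIDTH are hypothetical names, the
saturation state is passed as a separate bool array instead of a flag bit
in the counter value, and num_aggr is assumed to be greater than zero.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define CELL_WIDTH 4	/* each counter cell is padded to 4 symbols */

/* Append s to out, keeping it NUL-terminated; -1 if it does not fit. */
static int put(char *out, size_t size, size_t *len, const char *s)
{
	size_t n = strlen(s);

	if (*len + n + 1 > size)
		return -1;
	memcpy(out + *len, s, n + 1);
	*len += n;
	return 0;
}

/*
 * Render one histogram row: per counter, '|' followed by the abbreviation
 * repeated ceil(count / num_aggr) times, a '+' if the counter saturated,
 * or '-' if the event never occurred, then padding up to CELL_WIDTH.
 */
static int br_cntr_row(char *out, size_t size,
		       const unsigned long long *count, const bool *saturated,
		       const char *const *abbr, int nr, int num_aggr)
{
	size_t len = 0;

	for (int i = 0; i < nr; i++) {
		/* Integer form of ceil(count / num_aggr). */
		unsigned long long avg = (count[i] + num_aggr - 1) / num_aggr;
		int used = 0;

		if (put(out, size, &len, "|"))
			return -1;

		if (!count[i] && !saturated[i]) {
			if (put(out, size, &len, "-"))
				return -1;
			used++;
		} else {
			for (unsigned long long j = 0; j < avg; j++, used++) {
				if (put(out, size, &len, abbr[i]))
					return -1;
			}
			if (saturated[i]) {
				if (put(out, size, &len, "+"))
					return -1;
				used++;
			}
		}

		/* Pad by symbol count, not by character count. */
		for (; used < CELL_WIDTH; used++) {
			if (put(out, size, &len, " "))
				return -1;
		}
	}

	return put(out, size, &len, nr ? "|" : " ");
}
```

Note that the padding counts repetitions, so a two-character abbreviation
such as "A0" makes a cell wider than CELL_WIDTH characters.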
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
tools/perf/Documentation/perf-report.txt | 1 +
tools/perf/builtin-diff.c | 4 +-
tools/perf/builtin-report.c | 20 ++++-
tools/perf/ui/browsers/hists.c | 17 +++-
tools/perf/util/annotate.c | 101 +++++++++++++++++++++++
tools/perf/util/annotate.h | 3 +
tools/perf/util/block-info.c | 66 +++++++++++++--
tools/perf/util/block-info.h | 8 +-
8 files changed, 202 insertions(+), 18 deletions(-)
diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index d2b1593ef700..f35189d5ff1e 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -614,6 +614,7 @@ include::itrace.txt[]
'Avg Cycles%' - block average sampled cycles / sum of total block average
sampled cycles
'Avg Cycles' - block average sampled cycles
+ 'Branch Counter' - block branch counter histogram
--skip-empty::
Do not print 0 results in the --stat output.
diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
index 2d9226b1de52..de24892dc7b8 100644
--- a/tools/perf/builtin-diff.c
+++ b/tools/perf/builtin-diff.c
@@ -705,7 +705,7 @@ static void hists__precompute(struct hists *hists)
if (compute == COMPUTE_CYCLES) {
bh = container_of(he, struct block_hist, he);
init_block_hist(bh);
- block_info__process_sym(he, bh, NULL, 0);
+ block_info__process_sym(he, bh, NULL, 0, 0);
}
data__for_each_file_new(i, d) {
@@ -728,7 +728,7 @@ static void hists__precompute(struct hists *hists)
pair_bh = container_of(pair, struct block_hist,
he);
init_block_hist(pair_bh);
- block_info__process_sym(pair, pair_bh, NULL, 0);
+ block_info__process_sym(pair, pair_bh, NULL, 0, 0);
bh = container_of(he, struct block_hist, he);
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index da8d13bbb500..a0f864f2e996 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -575,6 +575,13 @@ static int evlist__tty_browse_hists(struct evlist *evlist, struct report *rep, c
hists__fprintf_nr_sample_events(hists, rep, evname, stdout);
if (rep->total_cycles_mode) {
+ char *buf;
+
+ if (!annotation_br_cntr_abbr_list(&buf, pos, true)) {
+ fprintf(stdout, "%s", buf);
+ fprintf(stdout, "#\n");
+ free(buf);
+ }
report__browse_block_hists(&rep->block_reports[i - 1].hist,
rep->min_percent, pos, NULL);
continue;
@@ -1121,18 +1128,23 @@ static int __cmd_report(struct report *rep)
report__output_resort(rep);
if (rep->total_cycles_mode) {
- int block_hpps[6] = {
+ int nr_hpps = 4;
+ int block_hpps[PERF_HPP_REPORT__BLOCK_MAX_INDEX] = {
PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_PCT,
PERF_HPP_REPORT__BLOCK_LBR_CYCLES,
PERF_HPP_REPORT__BLOCK_CYCLES_PCT,
PERF_HPP_REPORT__BLOCK_AVG_CYCLES,
- PERF_HPP_REPORT__BLOCK_RANGE,
- PERF_HPP_REPORT__BLOCK_DSO,
};
+ if (session->evlist->nr_br_cntr > 0)
+ block_hpps[nr_hpps++] = PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER;
+
+ block_hpps[nr_hpps++] = PERF_HPP_REPORT__BLOCK_RANGE;
+ block_hpps[nr_hpps++] = PERF_HPP_REPORT__BLOCK_DSO;
+
rep->block_reports = block_info__create_report(session->evlist,
rep->total_cycles,
- block_hpps, 6,
+ block_hpps, nr_hpps,
&rep->nr_block_reports);
if (!rep->block_reports)
return -1;
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index b7219df51236..73d766eac75b 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -3684,8 +3684,10 @@ int block_hists_tui_browse(struct block_hist *bh, struct evsel *evsel,
struct hist_browser *browser;
int key = -1;
struct popup_action action;
+ char *br_cntr_text = NULL;
static const char help[] =
- " q Quit \n";
+ " q Quit \n"
+ " B Branch counter abbr list (Optional)\n";
browser = hist_browser__new(hists);
if (!browser)
@@ -3703,6 +3705,8 @@ int block_hists_tui_browse(struct block_hist *bh, struct evsel *evsel,
memset(&action, 0, sizeof(action));
+ annotation_br_cntr_abbr_list(&br_cntr_text, evsel, false);
+
while (1) {
key = hist_browser__run(browser, "? - help", true, 0);
@@ -3723,6 +3727,16 @@ int block_hists_tui_browse(struct block_hist *bh, struct evsel *evsel,
action.ms.sym = browser->selection->sym;
do_annotate(browser, &action);
continue;
+ case 'B':
+ if (br_cntr_text) {
+ ui__question_window("Branch counter abbr list",
+ br_cntr_text, "Press any key...", 0);
+ } else {
+ ui__question_window("Branch counter abbr list",
+ "\n The branch counter is not available.\n",
+ "Press any key...", 0);
+ }
+ continue;
default:
break;
}
@@ -3730,5 +3744,6 @@ int block_hists_tui_browse(struct block_hist *bh, struct evsel *evsel,
out:
hist_browser__delete(browser);
+ free(br_cntr_text);
return 0;
}
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 6baa0671598e..f20f9e40ef0d 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -40,6 +40,7 @@
#include "namespaces.h"
#include "thread.h"
#include "hashmap.h"
+#include "strbuf.h"
#include <regex.h>
#include <linux/bitops.h>
#include <linux/kernel.h>
@@ -47,6 +48,7 @@
#include <linux/zalloc.h>
#include <subcmd/parse-options.h>
#include <subcmd/run-command.h>
+#include <math.h>
/* FIXME: For the HE_COLORSET */
#include "ui/browser.h"
@@ -1706,6 +1708,105 @@ static void ipc_coverage_string(char *bf, int size, struct annotation *notes)
ipc, coverage);
}
+int annotation_br_cntr_abbr_list(char **str, struct evsel *evsel, bool header)
+{
+ struct evsel *pos;
+ struct strbuf sb;
+
+ if (evsel->evlist->nr_br_cntr <= 0)
+ return -ENOTSUP;
+
+ strbuf_init(&sb, /*hint=*/ 0);
+
+ if (header && strbuf_addf(&sb, "# Branch counter abbr list:\n"))
+ goto err;
+
+ evlist__for_each_entry(evsel->evlist, pos) {
+ if (!(pos->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS))
+ continue;
+ if (header && strbuf_addf(&sb, "#"))
+ goto err;
+
+ if (strbuf_addf(&sb, " %s = %s\n", pos->name, pos->abbr_name))
+ goto err;
+ }
+
+ if (header && strbuf_addf(&sb, "#"))
+ goto err;
+ if (strbuf_addf(&sb, " '-' No event occurs\n"))
+ goto err;
+
+ if (header && strbuf_addf(&sb, "#"))
+ goto err;
+ if (strbuf_addf(&sb, " '+' Event occurrences may be lost due to branch counter saturated\n"))
+ goto err;
+
+ *str = strbuf_detach(&sb, NULL);
+
+ return 0;
+err:
+ strbuf_release(&sb);
+ return -ENOMEM;
+}
+
+int annotation_br_cntr_entry(char **str, int br_cntr_nr,
+ u64 *br_cntr, int num_aggr,
+ struct evsel *evsel)
+{
+ struct evsel *pos = evsel ? evlist__first(evsel->evlist) : NULL;
+ int i, j, avg, used;
+ struct strbuf sb;
+
+ strbuf_init(&sb, /*hint=*/ 0);
+ for (i = 0; i < br_cntr_nr; i++) {
+ used = 0;
+ avg = ceil((double)(br_cntr[i] & ~ANNOTATION__BR_CNTR_SATURATED_FLAG) /
+ (double)num_aggr);
+
+ if (strbuf_addch(&sb, '|'))
+ goto err;
+
+ if (!br_cntr[i]) {
+ if (strbuf_addch(&sb, '-'))
+ goto err;
+ used++;
+ } else {
+ evlist__for_each_entry_from(evsel->evlist, pos) {
+ if ((pos->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS) &&
+ (pos->br_cntr_idx == i))
+ break;
+ }
+ for (j = 0; j < avg; j++, used++) {
+ if (strbuf_addstr(&sb, pos->abbr_name))
+ goto err;
+ }
+
+ if (br_cntr[i] & ANNOTATION__BR_CNTR_SATURATED_FLAG) {
+ if (strbuf_addch(&sb, '+'))
+ goto err;
+ used++;
+ }
+ pos = list_next_entry(pos, core.node);
+ }
+
+ /* Assume the branch counter saturated at 3 */
+ for (j = used; j < 4; j++) {
+ if (strbuf_addch(&sb, ' '))
+ goto err;
+ }
+ }
+
+ if (strbuf_addch(&sb, br_cntr_nr ? '|' : ' '))
+ goto err;
+
+ *str = strbuf_detach(&sb, NULL);
+
+ return 0;
+err:
+ strbuf_release(&sb);
+ return -ENOMEM;
+}
+
static void __annotation_line__write(struct annotation_line *al, struct annotation *notes,
bool first_line, bool current_entry, bool change_color, int width,
void *obj, unsigned int percent_type,
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index f39dd5d7b05e..2ff79a389dc0 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -548,4 +548,7 @@ struct annotated_basic_block {
int annotate_get_basic_blocks(struct symbol *sym, s64 src, s64 dst,
struct list_head *head);
+int annotation_br_cntr_entry(char **str, int br_cntr_nr, u64 *br_cntr,
+ int num_aggr, struct evsel *evsel);
+int annotation_br_cntr_abbr_list(char **str, struct evsel *evsel, bool header);
#endif /* __PERF_ANNOTATE_H */
diff --git a/tools/perf/util/block-info.c b/tools/perf/util/block-info.c
index 04068d48683f..649392bee7ed 100644
--- a/tools/perf/util/block-info.c
+++ b/tools/perf/util/block-info.c
@@ -40,16 +40,32 @@ static struct block_header_column {
[PERF_HPP_REPORT__BLOCK_DSO] = {
.name = "Shared Object",
.width = 20,
+ },
+ [PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER] = {
+ .name = "Branch Counter",
+ .width = 30,
}
};
-struct block_info *block_info__new(void)
+static struct block_info *block_info__new(unsigned int br_cntr_nr)
{
- return zalloc(sizeof(struct block_info));
+ struct block_info *bi = zalloc(sizeof(struct block_info));
+
+ if (bi && br_cntr_nr) {
+ bi->br_cntr = calloc(br_cntr_nr, sizeof(u64));
+ if (!bi->br_cntr) {
+ free(bi);
+ return NULL;
+ }
+ }
+
+ return bi;
}
void block_info__delete(struct block_info *bi)
{
+ if (bi)
+ free(bi->br_cntr);
free(bi);
}
@@ -86,7 +102,8 @@ int64_t block_info__cmp(struct perf_hpp_fmt *fmt __maybe_unused,
static void init_block_info(struct block_info *bi, struct symbol *sym,
struct cyc_hist *ch, int offset,
- u64 total_cycles)
+ u64 total_cycles, unsigned int br_cntr_nr,
+ u64 *br_cntr, struct evsel *evsel)
{
bi->sym = sym;
bi->start = ch->start;
@@ -99,10 +116,18 @@ static void init_block_info(struct block_info *bi, struct symbol *sym,
memcpy(bi->cycles_spark, ch->cycles_spark,
NUM_SPARKS * sizeof(u64));
+
+ if (br_cntr && br_cntr_nr) {
+ bi->br_cntr_nr = br_cntr_nr;
+ memcpy(bi->br_cntr, &br_cntr[offset * br_cntr_nr],
+ br_cntr_nr * sizeof(u64));
+ }
+ bi->evsel = evsel;
}
int block_info__process_sym(struct hist_entry *he, struct block_hist *bh,
- u64 *block_cycles_aggr, u64 total_cycles)
+ u64 *block_cycles_aggr, u64 total_cycles,
+ unsigned int br_cntr_nr)
{
struct annotation *notes;
struct cyc_hist *ch;
@@ -125,12 +150,14 @@ int block_info__process_sym(struct hist_entry *he, struct block_hist *bh,
struct block_info *bi;
struct hist_entry *he_block;
- bi = block_info__new();
+ bi = block_info__new(br_cntr_nr);
if (!bi)
return -1;
init_block_info(bi, he->ms.sym, &ch[i], i,
- total_cycles);
+ total_cycles, br_cntr_nr,
+ notes->branch->br_cntr,
+ hists_to_evsel(he->hists));
cycles += bi->cycles_aggr / bi->num_aggr;
he_block = hists__add_entry_block(&bh->block_hists,
@@ -327,6 +354,24 @@ static void init_block_header(struct block_fmt *block_fmt)
fmt->width = block_column_width;
}
+static int block_branch_counter_entry(struct perf_hpp_fmt *fmt,
+ struct perf_hpp *hpp,
+ struct hist_entry *he)
+{
+ struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
+ struct block_info *bi = he->block_info;
+ char *buf;
+ int ret;
+
+ if (annotation_br_cntr_entry(&buf, bi->br_cntr_nr, bi->br_cntr,
+ bi->num_aggr, bi->evsel))
+ return 0;
+
+ ret = scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width, buf);
+ free(buf);
+ return ret;
+}
+
static void hpp_register(struct block_fmt *block_fmt, int idx,
struct perf_hpp_list *hpp_list)
{
@@ -357,6 +402,9 @@ static void hpp_register(struct block_fmt *block_fmt, int idx,
case PERF_HPP_REPORT__BLOCK_DSO:
fmt->entry = block_dso_entry;
break;
+ case PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER:
+ fmt->entry = block_branch_counter_entry;
+ break;
default:
return;
}
@@ -390,7 +438,7 @@ static void init_block_hist(struct block_hist *bh, struct block_fmt *block_fmts,
static int process_block_report(struct hists *hists,
struct block_report *block_report,
u64 total_cycles, int *block_hpps,
- int nr_hpps)
+ int nr_hpps, unsigned int br_cntr_nr)
{
struct rb_node *next = rb_first_cached(&hists->entries);
struct block_hist *bh = &block_report->hist;
@@ -405,7 +453,7 @@ static int process_block_report(struct hists *hists,
while (next) {
he = rb_entry(next, struct hist_entry, rb_node);
block_info__process_sym(he, bh, &block_report->cycles,
- total_cycles);
+ total_cycles, br_cntr_nr);
next = rb_next(&he->rb_node);
}
@@ -435,7 +483,7 @@ struct block_report *block_info__create_report(struct evlist *evlist,
struct hists *hists = evsel__hists(pos);
process_block_report(hists, &block_reports[i], total_cycles,
- block_hpps, nr_hpps);
+ block_hpps, nr_hpps, evlist->nr_br_cntr);
i++;
}
diff --git a/tools/perf/util/block-info.h b/tools/perf/util/block-info.h
index 0b9e1aad4c55..b9329dc3ab59 100644
--- a/tools/perf/util/block-info.h
+++ b/tools/perf/util/block-info.h
@@ -18,6 +18,9 @@ struct block_info {
u64 total_cycles;
int num;
int num_aggr;
+ int br_cntr_nr;
+ u64 *br_cntr;
+ struct evsel *evsel;
};
struct block_fmt {
@@ -36,6 +39,7 @@ enum {
PERF_HPP_REPORT__BLOCK_AVG_CYCLES,
PERF_HPP_REPORT__BLOCK_RANGE,
PERF_HPP_REPORT__BLOCK_DSO,
+ PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER,
PERF_HPP_REPORT__BLOCK_MAX_INDEX
};
@@ -46,7 +50,6 @@ struct block_report {
int nr_fmts;
};
-struct block_info *block_info__new(void);
void block_info__delete(struct block_info *bi);
int64_t __block_info__cmp(struct hist_entry *left, struct hist_entry *right);
@@ -55,7 +58,8 @@ int64_t block_info__cmp(struct perf_hpp_fmt *fmt __maybe_unused,
struct hist_entry *left, struct hist_entry *right);
int block_info__process_sym(struct hist_entry *he, struct block_hist *bh,
- u64 *block_cycles_aggr, u64 total_cycles);
+ u64 *block_cycles_aggr, u64 total_cycles,
+ unsigned int br_cntr_nr);
struct block_report *block_info__create_report(struct evlist *evlist,
u64 total_cycles,
--
2.38.1
* Re: [PATCH 6/9] perf report: Display the branch counter histogram
2024-07-03 20:03 ` [PATCH 6/9] perf report: Display the branch counter histogram kan.liang
@ 2024-08-03 0:18 ` Namhyung Kim
2024-08-06 14:39 ` Liang, Kan
0 siblings, 1 reply; 26+ messages in thread
From: Namhyung Kim @ 2024-08-03 0:18 UTC (permalink / raw)
To: kan.liang
Cc: acme, irogers, peterz, mingo, linux-kernel, adrian.hunter, ak,
eranian
On Wed, Jul 3, 2024 at 1:03 PM <kan.liang@linux.intel.com> wrote:
>
> From: Kan Liang <kan.liang@linux.intel.com>
>
> Reusing the existing --total-cycles option to display the branch
> counters. Add a new PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER to display
> the logged branch counter events. They are shown right after all the
> cycle-related annotations.
> Extend the struct block_info to store and pass the branch counter
> related information.
>
> The annotation_br_cntr_entry() is to print the histogram of each branch
> counter event.
> The annotation_br_cntr_abbr_list() prints the branch counter's
> abbreviation list. Press 'B' to display the list in the TUI mode.
>
> $perf record -e "{branch-instructions:ppp,branch-misses}:S" -j any,counter
> $perf report --total-cycles --stdio
>
> # To display the perf.data header info, please use --header/--header-only options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }'
> # Event count (approx.): 1610046
> #
> # Branch counter abbr list:
> # branch-instructions:ppp = A
> # branch-misses = B
> # '-' No event occurs
> # '+' Event occurrences may be lost due to branch counter saturated
> #
> # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter [Program Block Range]
> # ............... .............. ........... .......... ...................... ..................
> #
> 57.55% 2.5M 0.00% 3 |A |- | ...
> 25.27% 1.1M 0.00% 2 |AA |- | ...
> 15.61% 667.2K 0.00% 1 |A |- | ...
> 0.16% 6.9K 0.81% 575 |A |- | ...
> 0.16% 6.8K 1.38% 977 |AA |- | ...
> 0.16% 6.8K 0.04% 28 |AA |B | ...
> 0.15% 6.6K 1.33% 946 |A |- | ...
> 0.11% 4.5K 0.06% 46 |AAA+|- | ...
> 0.10% 4.4K 0.88% 624 |A |- | ...
> 0.09% 3.7K 0.74% 524 |AAA+|B | ...
I think this format assumes short width and might not work
well when it has more events with bigger width. Maybe
A=<n>, B=<n> ?
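
One possible reading of that suggestion, as a rough standalone sketch:
br_cntr_compact() is a hypothetical name, the trailing '+' for a saturated
counter is an assumption carried over from the histogram format, and
num_aggr is assumed to be greater than zero.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format the counters as "A=<n>, B=<n>", where <n> is
 * ceil(count / num_aggr). Returns 0 on success, -1 if out is too small.
 */
static int br_cntr_compact(char *out, size_t size,
			   const unsigned long long *count, const bool *saturated,
			   const char *const *abbr, int nr, int num_aggr)
{
	size_t len = 0;

	if (size)
		out[0] = '\0';

	for (int i = 0; i < nr; i++) {
		/* Integer form of ceil(count / num_aggr). */
		unsigned long long avg = (count[i] + num_aggr - 1) / num_aggr;
		int n = snprintf(out + len, size - len, "%s%s=%llu%s",
				 i ? ", " : "", abbr[i], avg,
				 saturated[i] ? "+" : "");

		/* snprintf() reports the untruncated length. */
		if (n < 0 || (size_t)n >= size - len)
			return -1;
		len += n;
	}

	return 0;
}
```

The width then grows with the number of events and the number of digits
instead of with the counter value itself.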
Thanks,
Namhyung
>
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> ---
> tools/perf/Documentation/perf-report.txt | 1 +
> tools/perf/builtin-diff.c | 4 +-
> tools/perf/builtin-report.c | 20 ++++-
> tools/perf/ui/browsers/hists.c | 17 +++-
> tools/perf/util/annotate.c | 101 +++++++++++++++++++++++
> tools/perf/util/annotate.h | 3 +
> tools/perf/util/block-info.c | 66 +++++++++++++--
> tools/perf/util/block-info.h | 8 +-
> 8 files changed, 202 insertions(+), 18 deletions(-)
>
> diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
> index d2b1593ef700..f35189d5ff1e 100644
> --- a/tools/perf/Documentation/perf-report.txt
> +++ b/tools/perf/Documentation/perf-report.txt
> @@ -614,6 +614,7 @@ include::itrace.txt[]
> 'Avg Cycles%' - block average sampled cycles / sum of total block average
> sampled cycles
> 'Avg Cycles' - block average sampled cycles
> + 'Branch Counter' - block branch counter histogram
>
> --skip-empty::
> Do not print 0 results in the --stat output.
> diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
> index 2d9226b1de52..de24892dc7b8 100644
> --- a/tools/perf/builtin-diff.c
> +++ b/tools/perf/builtin-diff.c
> @@ -705,7 +705,7 @@ static void hists__precompute(struct hists *hists)
> if (compute == COMPUTE_CYCLES) {
> bh = container_of(he, struct block_hist, he);
> init_block_hist(bh);
> - block_info__process_sym(he, bh, NULL, 0);
> + block_info__process_sym(he, bh, NULL, 0, 0);
> }
>
> data__for_each_file_new(i, d) {
> @@ -728,7 +728,7 @@ static void hists__precompute(struct hists *hists)
> pair_bh = container_of(pair, struct block_hist,
> he);
> init_block_hist(pair_bh);
> - block_info__process_sym(pair, pair_bh, NULL, 0);
> + block_info__process_sym(pair, pair_bh, NULL, 0, 0);
>
> bh = container_of(he, struct block_hist, he);
>
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index da8d13bbb500..a0f864f2e996 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -575,6 +575,13 @@ static int evlist__tty_browse_hists(struct evlist *evlist, struct report *rep, c
> hists__fprintf_nr_sample_events(hists, rep, evname, stdout);
>
> if (rep->total_cycles_mode) {
> + char *buf;
> +
> + if (!annotation_br_cntr_abbr_list(&buf, pos, true)) {
> + fprintf(stdout, "%s", buf);
> + fprintf(stdout, "#\n");
> + free(buf);
> + }
> report__browse_block_hists(&rep->block_reports[i - 1].hist,
> rep->min_percent, pos, NULL);
> continue;
> @@ -1121,18 +1128,23 @@ static int __cmd_report(struct report *rep)
> report__output_resort(rep);
>
> if (rep->total_cycles_mode) {
> - int block_hpps[6] = {
> + int nr_hpps = 4;
> + int block_hpps[PERF_HPP_REPORT__BLOCK_MAX_INDEX] = {
> PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_PCT,
> PERF_HPP_REPORT__BLOCK_LBR_CYCLES,
> PERF_HPP_REPORT__BLOCK_CYCLES_PCT,
> PERF_HPP_REPORT__BLOCK_AVG_CYCLES,
> - PERF_HPP_REPORT__BLOCK_RANGE,
> - PERF_HPP_REPORT__BLOCK_DSO,
> };
>
> + if (session->evlist->nr_br_cntr > 0)
> + block_hpps[nr_hpps++] = PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER;
> +
> + block_hpps[nr_hpps++] = PERF_HPP_REPORT__BLOCK_RANGE;
> + block_hpps[nr_hpps++] = PERF_HPP_REPORT__BLOCK_DSO;
> +
> rep->block_reports = block_info__create_report(session->evlist,
> rep->total_cycles,
> - block_hpps, 6,
> + block_hpps, nr_hpps,
> &rep->nr_block_reports);
> if (!rep->block_reports)
> return -1;
> diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
> index b7219df51236..73d766eac75b 100644
> --- a/tools/perf/ui/browsers/hists.c
> +++ b/tools/perf/ui/browsers/hists.c
> @@ -3684,8 +3684,10 @@ int block_hists_tui_browse(struct block_hist *bh, struct evsel *evsel,
> struct hist_browser *browser;
> int key = -1;
> struct popup_action action;
> + char *br_cntr_text = NULL;
> static const char help[] =
> - " q Quit \n";
> + " q Quit \n"
> + " B Branch counter abbr list (Optional)\n";
>
> browser = hist_browser__new(hists);
> if (!browser)
> @@ -3703,6 +3705,8 @@ int block_hists_tui_browse(struct block_hist *bh, struct evsel *evsel,
>
> memset(&action, 0, sizeof(action));
>
> + annotation_br_cntr_abbr_list(&br_cntr_text, evsel, false);
> +
> while (1) {
> key = hist_browser__run(browser, "? - help", true, 0);
>
> @@ -3723,6 +3727,16 @@ int block_hists_tui_browse(struct block_hist *bh, struct evsel *evsel,
> action.ms.sym = browser->selection->sym;
> do_annotate(browser, &action);
> continue;
> + case 'B':
> + if (br_cntr_text) {
> + ui__question_window("Branch counter abbr list",
> + br_cntr_text, "Press any key...", 0);
> + } else {
> + ui__question_window("Branch counter abbr list",
> + "\n The branch counter is not available.\n",
> + "Press any key...", 0);
> + }
> + continue;
> default:
> break;
> }
> @@ -3730,5 +3744,6 @@ int block_hists_tui_browse(struct block_hist *bh, struct evsel *evsel,
>
> out:
> hist_browser__delete(browser);
> + free(br_cntr_text);
> return 0;
> }
> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> index 6baa0671598e..f20f9e40ef0d 100644
> --- a/tools/perf/util/annotate.c
> +++ b/tools/perf/util/annotate.c
> @@ -40,6 +40,7 @@
> #include "namespaces.h"
> #include "thread.h"
> #include "hashmap.h"
> +#include "strbuf.h"
> #include <regex.h>
> #include <linux/bitops.h>
> #include <linux/kernel.h>
> @@ -47,6 +48,7 @@
> #include <linux/zalloc.h>
> #include <subcmd/parse-options.h>
> #include <subcmd/run-command.h>
> +#include <math.h>
>
> /* FIXME: For the HE_COLORSET */
> #include "ui/browser.h"
> @@ -1706,6 +1708,105 @@ static void ipc_coverage_string(char *bf, int size, struct annotation *notes)
> ipc, coverage);
> }
>
> +int annotation_br_cntr_abbr_list(char **str, struct evsel *evsel, bool header)
> +{
> + struct evsel *pos;
> + struct strbuf sb;
> +
> + if (evsel->evlist->nr_br_cntr <= 0)
> + return -ENOTSUP;
> +
> + strbuf_init(&sb, /*hint=*/ 0);
> +
> + if (header && strbuf_addf(&sb, "# Branch counter abbr list:\n"))
> + goto err;
> +
> + evlist__for_each_entry(evsel->evlist, pos) {
> + if (!(pos->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS))
> + continue;
> + if (header && strbuf_addf(&sb, "#"))
> + goto err;
> +
> + if (strbuf_addf(&sb, " %s = %s\n", pos->name, pos->abbr_name))
> + goto err;
> + }
> +
> + if (header && strbuf_addf(&sb, "#"))
> + goto err;
> + if (strbuf_addf(&sb, " '-' No event occurs\n"))
> + goto err;
> +
> + if (header && strbuf_addf(&sb, "#"))
> + goto err;
> + if (strbuf_addf(&sb, " '+' Event occurrences may be lost due to branch counter saturated\n"))
> + goto err;
> +
> + *str = strbuf_detach(&sb, NULL);
> +
> + return 0;
> +err:
> + strbuf_release(&sb);
> + return -ENOMEM;
> +}
> +
> +int annotation_br_cntr_entry(char **str, int br_cntr_nr,
> + u64 *br_cntr, int num_aggr,
> + struct evsel *evsel)
> +{
> + struct evsel *pos = evsel ? evlist__first(evsel->evlist) : NULL;
> + int i, j, avg, used;
> + struct strbuf sb;
> +
> + strbuf_init(&sb, /*hint=*/ 0);
> + for (i = 0; i < br_cntr_nr; i++) {
> + used = 0;
> + avg = ceil((double)(br_cntr[i] & ~ANNOTATION__BR_CNTR_SATURATED_FLAG) /
> + (double)num_aggr);
> +
> + if (strbuf_addch(&sb, '|'))
> + goto err;
> +
> + if (!br_cntr[i]) {
> + if (strbuf_addch(&sb, '-'))
> + goto err;
> + used++;
> + } else {
> + evlist__for_each_entry_from(evsel->evlist, pos) {
> + if ((pos->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS) &&
> + (pos->br_cntr_idx == i))
> + break;
> + }
> + for (j = 0; j < avg; j++, used++) {
> + if (strbuf_addstr(&sb, pos->abbr_name))
> + goto err;
> + }
> +
> + if (br_cntr[i] & ANNOTATION__BR_CNTR_SATURATED_FLAG) {
> + if (strbuf_addch(&sb, '+'))
> + goto err;
> + used++;
> + }
> + pos = list_next_entry(pos, core.node);
> + }
> +
> + /* Assume the branch counter saturated at 3 */
> + for (j = used; j < 4; j++) {
> + if (strbuf_addch(&sb, ' '))
> + goto err;
> + }
> + }
> +
> + if (strbuf_addch(&sb, br_cntr_nr ? '|' : ' '))
> + goto err;
> +
> + *str = strbuf_detach(&sb, NULL);
> +
> + return 0;
> +err:
> + strbuf_release(&sb);
> + return -ENOMEM;
> +}
> +
> static void __annotation_line__write(struct annotation_line *al, struct annotation *notes,
> bool first_line, bool current_entry, bool change_color, int width,
> void *obj, unsigned int percent_type,
> diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
> index f39dd5d7b05e..2ff79a389dc0 100644
> --- a/tools/perf/util/annotate.h
> +++ b/tools/perf/util/annotate.h
> @@ -548,4 +548,7 @@ struct annotated_basic_block {
> int annotate_get_basic_blocks(struct symbol *sym, s64 src, s64 dst,
> struct list_head *head);
>
> +int annotation_br_cntr_entry(char **str, int br_cntr_nr, u64 *br_cntr,
> + int num_aggr, struct evsel *evsel);
> +int annotation_br_cntr_abbr_list(char **str, struct evsel *evsel, bool header);
> #endif /* __PERF_ANNOTATE_H */
> diff --git a/tools/perf/util/block-info.c b/tools/perf/util/block-info.c
> index 04068d48683f..649392bee7ed 100644
> --- a/tools/perf/util/block-info.c
> +++ b/tools/perf/util/block-info.c
> @@ -40,16 +40,32 @@ static struct block_header_column {
> [PERF_HPP_REPORT__BLOCK_DSO] = {
> .name = "Shared Object",
> .width = 20,
> + },
> + [PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER] = {
> + .name = "Branch Counter",
> + .width = 30,
> }
> };
>
> -struct block_info *block_info__new(void)
> +static struct block_info *block_info__new(unsigned int br_cntr_nr)
> {
> - return zalloc(sizeof(struct block_info));
> + struct block_info *bi = zalloc(sizeof(struct block_info));
> +
> + if (bi && br_cntr_nr) {
> + bi->br_cntr = calloc(br_cntr_nr, sizeof(u64));
> + if (!bi->br_cntr) {
> + free(bi);
> + return NULL;
> + }
> + }
> +
> + return bi;
> }
>
> void block_info__delete(struct block_info *bi)
> {
> + if (bi)
> + free(bi->br_cntr);
> free(bi);
> }
>
> @@ -86,7 +102,8 @@ int64_t block_info__cmp(struct perf_hpp_fmt *fmt __maybe_unused,
>
> static void init_block_info(struct block_info *bi, struct symbol *sym,
> struct cyc_hist *ch, int offset,
> - u64 total_cycles)
> + u64 total_cycles, unsigned int br_cntr_nr,
> + u64 *br_cntr, struct evsel *evsel)
> {
> bi->sym = sym;
> bi->start = ch->start;
> @@ -99,10 +116,18 @@ static void init_block_info(struct block_info *bi, struct symbol *sym,
>
> memcpy(bi->cycles_spark, ch->cycles_spark,
> NUM_SPARKS * sizeof(u64));
> +
> + if (br_cntr && br_cntr_nr) {
> + bi->br_cntr_nr = br_cntr_nr;
> + memcpy(bi->br_cntr, &br_cntr[offset * br_cntr_nr],
> + br_cntr_nr * sizeof(u64));
> + }
> + bi->evsel = evsel;
> }
>
> int block_info__process_sym(struct hist_entry *he, struct block_hist *bh,
> - u64 *block_cycles_aggr, u64 total_cycles)
> + u64 *block_cycles_aggr, u64 total_cycles,
> + unsigned int br_cntr_nr)
> {
> struct annotation *notes;
> struct cyc_hist *ch;
> @@ -125,12 +150,14 @@ int block_info__process_sym(struct hist_entry *he, struct block_hist *bh,
> struct block_info *bi;
> struct hist_entry *he_block;
>
> - bi = block_info__new();
> + bi = block_info__new(br_cntr_nr);
> if (!bi)
> return -1;
>
> init_block_info(bi, he->ms.sym, &ch[i], i,
> - total_cycles);
> + total_cycles, br_cntr_nr,
> + notes->branch->br_cntr,
> + hists_to_evsel(he->hists));
> cycles += bi->cycles_aggr / bi->num_aggr;
>
> he_block = hists__add_entry_block(&bh->block_hists,
> @@ -327,6 +354,24 @@ static void init_block_header(struct block_fmt *block_fmt)
> fmt->width = block_column_width;
> }
>
> +static int block_branch_counter_entry(struct perf_hpp_fmt *fmt,
> + struct perf_hpp *hpp,
> + struct hist_entry *he)
> +{
> + struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
> + struct block_info *bi = he->block_info;
> + char *buf;
> + int ret;
> +
> + if (annotation_br_cntr_entry(&buf, bi->br_cntr_nr, bi->br_cntr,
> + bi->num_aggr, bi->evsel))
> + return 0;
> +
> + ret = scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width, buf);
> + free(buf);
> + return ret;
> +}
> +
> static void hpp_register(struct block_fmt *block_fmt, int idx,
> struct perf_hpp_list *hpp_list)
> {
> @@ -357,6 +402,9 @@ static void hpp_register(struct block_fmt *block_fmt, int idx,
> case PERF_HPP_REPORT__BLOCK_DSO:
> fmt->entry = block_dso_entry;
> break;
> + case PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER:
> + fmt->entry = block_branch_counter_entry;
> + break;
> default:
> return;
> }
> @@ -390,7 +438,7 @@ static void init_block_hist(struct block_hist *bh, struct block_fmt *block_fmts,
> static int process_block_report(struct hists *hists,
> struct block_report *block_report,
> u64 total_cycles, int *block_hpps,
> - int nr_hpps)
> + int nr_hpps, unsigned int br_cntr_nr)
> {
> struct rb_node *next = rb_first_cached(&hists->entries);
> struct block_hist *bh = &block_report->hist;
> @@ -405,7 +453,7 @@ static int process_block_report(struct hists *hists,
> while (next) {
> he = rb_entry(next, struct hist_entry, rb_node);
> block_info__process_sym(he, bh, &block_report->cycles,
> - total_cycles);
> + total_cycles, br_cntr_nr);
> next = rb_next(&he->rb_node);
> }
>
> @@ -435,7 +483,7 @@ struct block_report *block_info__create_report(struct evlist *evlist,
> struct hists *hists = evsel__hists(pos);
>
> process_block_report(hists, &block_reports[i], total_cycles,
> - block_hpps, nr_hpps);
> + block_hpps, nr_hpps, evlist->nr_br_cntr);
> i++;
> }
>
> diff --git a/tools/perf/util/block-info.h b/tools/perf/util/block-info.h
> index 0b9e1aad4c55..b9329dc3ab59 100644
> --- a/tools/perf/util/block-info.h
> +++ b/tools/perf/util/block-info.h
> @@ -18,6 +18,9 @@ struct block_info {
> u64 total_cycles;
> int num;
> int num_aggr;
> + int br_cntr_nr;
> + u64 *br_cntr;
> + struct evsel *evsel;
> };
>
> struct block_fmt {
> @@ -36,6 +39,7 @@ enum {
> PERF_HPP_REPORT__BLOCK_AVG_CYCLES,
> PERF_HPP_REPORT__BLOCK_RANGE,
> PERF_HPP_REPORT__BLOCK_DSO,
> + PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER,
> PERF_HPP_REPORT__BLOCK_MAX_INDEX
> };
>
> @@ -46,7 +50,6 @@ struct block_report {
> int nr_fmts;
> };
>
> -struct block_info *block_info__new(void);
> void block_info__delete(struct block_info *bi);
>
> int64_t __block_info__cmp(struct hist_entry *left, struct hist_entry *right);
> @@ -55,7 +58,8 @@ int64_t block_info__cmp(struct perf_hpp_fmt *fmt __maybe_unused,
> struct hist_entry *left, struct hist_entry *right);
>
> int block_info__process_sym(struct hist_entry *he, struct block_hist *bh,
> - u64 *block_cycles_aggr, u64 total_cycles);
> + u64 *block_cycles_aggr, u64 total_cycles,
> + unsigned int br_cntr_nr);
>
> struct block_report *block_info__create_report(struct evlist *evlist,
> u64 total_cycles,
> --
> 2.38.1
>
* Re: [PATCH 6/9] perf report: Display the branch counter histogram
2024-08-03 0:18 ` Namhyung Kim
@ 2024-08-06 14:39 ` Liang, Kan
2024-08-06 23:29 ` Namhyung Kim
0 siblings, 1 reply; 26+ messages in thread
From: Liang, Kan @ 2024-08-06 14:39 UTC (permalink / raw)
To: Namhyung Kim
Cc: acme, irogers, peterz, mingo, linux-kernel, adrian.hunter, ak,
eranian
On 2024-08-02 8:18 p.m., Namhyung Kim wrote:
> On Wed, Jul 3, 2024 at 1:03 PM <kan.liang@linux.intel.com> wrote:
>>
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> Reusing the existing --total-cycles option to display the branch
>> counters. Add a new PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER to display
>> the logged branch counter events. They are shown right after all the
>> cycle-related annotations.
>> Extend the struct block_info to store and pass the branch counter
>> related information.
>>
>> The annotation_br_cntr_entry() is to print the histogram of each branch
>> counter event.
>> The annotation_br_cntr_abbr_list() prints the branch counter's
>> abbreviation list. Press 'B' to display the list in the TUI mode.
>>
>> $perf record -e "{branch-instructions:ppp,branch-misses}:S" -j any,counter
>> $perf report --total-cycles --stdio
>>
>> # To display the perf.data header info, please use --header/--header-only options.
>> #
>> #
>> # Total Lost Samples: 0
>> #
>> # Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }'
>> # Event count (approx.): 1610046
>> #
>> # Branch counter abbr list:
>> # branch-instructions:ppp = A
>> # branch-misses = B
>> # '-' No event occurs
>> # '+' Event occurrences may be lost due to branch counter saturated
>> #
>> # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter [Program Block Range]
>> # ............... .............. ........... .......... ...................... ..................
>> #
>> 57.55% 2.5M 0.00% 3 |A |- | ...
>> 25.27% 1.1M 0.00% 2 |AA |- | ...
>> 15.61% 667.2K 0.00% 1 |A |- | ...
>> 0.16% 6.9K 0.81% 575 |A |- | ...
>> 0.16% 6.8K 1.38% 977 |AA |- | ...
>> 0.16% 6.8K 0.04% 28 |AA |B | ...
>> 0.15% 6.6K 1.33% 946 |A |- | ...
>> 0.11% 4.5K 0.06% 46 |AAA+|- | ...
>> 0.10% 4.4K 0.88% 624 |A |- | ...
>> 0.09% 3.7K 0.74% 524 |AAA+|B | ...
>
> I think this format assumes short width and might not work
> well when it has more events with bigger width. Maybe
> A=<n>, B=<n> ?
The purpose of "AAA" is to print a histogram here which can give the end
user a straightforward image of the distribution. The A=<n> may not be
that obvious.
I don't think there is a plan to increase the saturation of the counter.
So 4 bits of width should last for a long time. Other ARCHs don't have
such a feature either. I think I can change the code to force the 4
bits of width now. For more than 3 events, the perf tool can convert it
to a "+". We may update the perf tool for a more specific histogram
later, if the saturation is changed. What do you think?
Thanks,
Kan
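For reference, the fixed-width cell described above could be sketched roughly as follows. This is only an illustration, not the code from the patch: render_cell(), CELL_WIDTH and MAX_MARKS are hypothetical names, and the sketch assumes the averaged count and the saturation state have already been computed by the caller.

```c
#define CELL_WIDTH 4	/* three marks plus an optional '+' */
#define MAX_MARKS  3	/* assumed saturation point of one branch counter */

/*
 * Hypothetical helper: render one event's histogram cell.
 * buf must hold at least CELL_WIDTH + 1 bytes.
 */
static void render_cell(char *buf, char abbr, unsigned int avg, int saturated)
{
	int used = 0;

	if (!avg && !saturated) {
		/* '-' means the event did not occur in this block */
		buf[used++] = '-';
	} else {
		unsigned int marks = avg > MAX_MARKS ? MAX_MARKS : avg;

		/* One abbreviation letter per (averaged) occurrence */
		while (marks--)
			buf[used++] = abbr;

		/* Anything beyond MAX_MARKS collapses into a single '+' */
		if (saturated || avg > MAX_MARKS)
			buf[used++] = '+';
	}

	/* Pad so that every cell is exactly CELL_WIDTH characters wide */
	while (used < CELL_WIDTH)
		buf[used++] = ' ';
	buf[used] = '\0';
}
```

With this layout a row is built by printing '|' before each cell and once more at the end, so an average of 2 for A and no occurrences for B would come out as "|AA  |-   |".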
>
> Thanks,
> Namhyung
>
>>
>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>> ---
>> tools/perf/Documentation/perf-report.txt | 1 +
>> tools/perf/builtin-diff.c | 4 +-
>> tools/perf/builtin-report.c | 20 ++++-
>> tools/perf/ui/browsers/hists.c | 17 +++-
>> tools/perf/util/annotate.c | 101 +++++++++++++++++++++++
>> tools/perf/util/annotate.h | 3 +
>> tools/perf/util/block-info.c | 66 +++++++++++++--
>> tools/perf/util/block-info.h | 8 +-
>> 8 files changed, 202 insertions(+), 18 deletions(-)
>>
>> diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
>> index d2b1593ef700..f35189d5ff1e 100644
>> --- a/tools/perf/Documentation/perf-report.txt
>> +++ b/tools/perf/Documentation/perf-report.txt
>> @@ -614,6 +614,7 @@ include::itrace.txt[]
>> 'Avg Cycles%' - block average sampled cycles / sum of total block average
>> sampled cycles
>> 'Avg Cycles' - block average sampled cycles
>> + 'Branch Counter' - block branch counter histogram
>>
>> --skip-empty::
>> Do not print 0 results in the --stat output.
>> diff --git a/tools/perf/builtin-diff.c b/tools/perf/builtin-diff.c
>> index 2d9226b1de52..de24892dc7b8 100644
>> --- a/tools/perf/builtin-diff.c
>> +++ b/tools/perf/builtin-diff.c
>> @@ -705,7 +705,7 @@ static void hists__precompute(struct hists *hists)
>> if (compute == COMPUTE_CYCLES) {
>> bh = container_of(he, struct block_hist, he);
>> init_block_hist(bh);
>> - block_info__process_sym(he, bh, NULL, 0);
>> + block_info__process_sym(he, bh, NULL, 0, 0);
>> }
>>
>> data__for_each_file_new(i, d) {
>> @@ -728,7 +728,7 @@ static void hists__precompute(struct hists *hists)
>> pair_bh = container_of(pair, struct block_hist,
>> he);
>> init_block_hist(pair_bh);
>> - block_info__process_sym(pair, pair_bh, NULL, 0);
>> + block_info__process_sym(pair, pair_bh, NULL, 0, 0);
>>
>> bh = container_of(he, struct block_hist, he);
>>
>> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
>> index da8d13bbb500..a0f864f2e996 100644
>> --- a/tools/perf/builtin-report.c
>> +++ b/tools/perf/builtin-report.c
>> @@ -575,6 +575,13 @@ static int evlist__tty_browse_hists(struct evlist *evlist, struct report *rep, c
>> hists__fprintf_nr_sample_events(hists, rep, evname, stdout);
>>
>> if (rep->total_cycles_mode) {
>> + char *buf;
>> +
>> + if (!annotation_br_cntr_abbr_list(&buf, pos, true)) {
>> + fprintf(stdout, "%s", buf);
>> + fprintf(stdout, "#\n");
>> + free(buf);
>> + }
>> report__browse_block_hists(&rep->block_reports[i - 1].hist,
>> rep->min_percent, pos, NULL);
>> continue;
>> @@ -1121,18 +1128,23 @@ static int __cmd_report(struct report *rep)
>> report__output_resort(rep);
>>
>> if (rep->total_cycles_mode) {
>> - int block_hpps[6] = {
>> + int nr_hpps = 4;
>> + int block_hpps[PERF_HPP_REPORT__BLOCK_MAX_INDEX] = {
>> PERF_HPP_REPORT__BLOCK_TOTAL_CYCLES_PCT,
>> PERF_HPP_REPORT__BLOCK_LBR_CYCLES,
>> PERF_HPP_REPORT__BLOCK_CYCLES_PCT,
>> PERF_HPP_REPORT__BLOCK_AVG_CYCLES,
>> - PERF_HPP_REPORT__BLOCK_RANGE,
>> - PERF_HPP_REPORT__BLOCK_DSO,
>> };
>>
>> + if (session->evlist->nr_br_cntr > 0)
>> + block_hpps[nr_hpps++] = PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER;
>> +
>> + block_hpps[nr_hpps++] = PERF_HPP_REPORT__BLOCK_RANGE;
>> + block_hpps[nr_hpps++] = PERF_HPP_REPORT__BLOCK_DSO;
>> +
>> rep->block_reports = block_info__create_report(session->evlist,
>> rep->total_cycles,
>> - block_hpps, 6,
>> + block_hpps, nr_hpps,
>> &rep->nr_block_reports);
>> if (!rep->block_reports)
>> return -1;
>> diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
>> index b7219df51236..73d766eac75b 100644
>> --- a/tools/perf/ui/browsers/hists.c
>> +++ b/tools/perf/ui/browsers/hists.c
>> @@ -3684,8 +3684,10 @@ int block_hists_tui_browse(struct block_hist *bh, struct evsel *evsel,
>> struct hist_browser *browser;
>> int key = -1;
>> struct popup_action action;
>> + char *br_cntr_text = NULL;
>> static const char help[] =
>> - " q Quit \n";
>> + " q Quit \n"
>> + " B Branch counter abbr list (Optional)\n";
>>
>> browser = hist_browser__new(hists);
>> if (!browser)
>> @@ -3703,6 +3705,8 @@ int block_hists_tui_browse(struct block_hist *bh, struct evsel *evsel,
>>
>> memset(&action, 0, sizeof(action));
>>
>> + annotation_br_cntr_abbr_list(&br_cntr_text, evsel, false);
>> +
>> while (1) {
>> key = hist_browser__run(browser, "? - help", true, 0);
>>
>> @@ -3723,6 +3727,16 @@ int block_hists_tui_browse(struct block_hist *bh, struct evsel *evsel,
>> action.ms.sym = browser->selection->sym;
>> do_annotate(browser, &action);
>> continue;
>> + case 'B':
>> + if (br_cntr_text) {
>> + ui__question_window("Branch counter abbr list",
>> + br_cntr_text, "Press any key...", 0);
>> + } else {
>> + ui__question_window("Branch counter abbr list",
>> + "\n The branch counter is not available.\n",
>> + "Press any key...", 0);
>> + }
>> + continue;
>> default:
>> break;
>> }
>> @@ -3730,5 +3744,6 @@ int block_hists_tui_browse(struct block_hist *bh, struct evsel *evsel,
>>
>> out:
>> hist_browser__delete(browser);
>> + free(br_cntr_text);
>> return 0;
>> }
>> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
>> index 6baa0671598e..f20f9e40ef0d 100644
>> --- a/tools/perf/util/annotate.c
>> +++ b/tools/perf/util/annotate.c
>> @@ -40,6 +40,7 @@
>> #include "namespaces.h"
>> #include "thread.h"
>> #include "hashmap.h"
>> +#include "strbuf.h"
>> #include <regex.h>
>> #include <linux/bitops.h>
>> #include <linux/kernel.h>
>> @@ -47,6 +48,7 @@
>> #include <linux/zalloc.h>
>> #include <subcmd/parse-options.h>
>> #include <subcmd/run-command.h>
>> +#include <math.h>
>>
>> /* FIXME: For the HE_COLORSET */
>> #include "ui/browser.h"
>> @@ -1706,6 +1708,105 @@ static void ipc_coverage_string(char *bf, int size, struct annotation *notes)
>> ipc, coverage);
>> }
>>
>> +int annotation_br_cntr_abbr_list(char **str, struct evsel *evsel, bool header)
>> +{
>> + struct evsel *pos;
>> + struct strbuf sb;
>> +
>> + if (evsel->evlist->nr_br_cntr <= 0)
>> + return -ENOTSUP;
>> +
>> + strbuf_init(&sb, /*hint=*/ 0);
>> +
>> + if (header && strbuf_addf(&sb, "# Branch counter abbr list:\n"))
>> + goto err;
>> +
>> + evlist__for_each_entry(evsel->evlist, pos) {
>> + if (!(pos->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS))
>> + continue;
>> + if (header && strbuf_addf(&sb, "#"))
>> + goto err;
>> +
>> + if (strbuf_addf(&sb, " %s = %s\n", pos->name, pos->abbr_name))
>> + goto err;
>> + }
>> +
>> + if (header && strbuf_addf(&sb, "#"))
>> + goto err;
>> + if (strbuf_addf(&sb, " '-' No event occurs\n"))
>> + goto err;
>> +
>> + if (header && strbuf_addf(&sb, "#"))
>> + goto err;
>> + if (strbuf_addf(&sb, " '+' Event occurrences may be lost due to branch counter saturated\n"))
>> + goto err;
>> +
>> + *str = strbuf_detach(&sb, NULL);
>> +
>> + return 0;
>> +err:
>> + strbuf_release(&sb);
>> + return -ENOMEM;
>> +}
>> +
>> +int annotation_br_cntr_entry(char **str, int br_cntr_nr,
>> + u64 *br_cntr, int num_aggr,
>> + struct evsel *evsel)
>> +{
>> + struct evsel *pos = evsel ? evlist__first(evsel->evlist) : NULL;
>> + int i, j, avg, used;
>> + struct strbuf sb;
>> +
>> + strbuf_init(&sb, /*hint=*/ 0);
>> + for (i = 0; i < br_cntr_nr; i++) {
>> + used = 0;
>> + avg = ceil((double)(br_cntr[i] & ~ANNOTATION__BR_CNTR_SATURATED_FLAG) /
>> + (double)num_aggr);
>> +
>> + if (strbuf_addch(&sb, '|'))
>> + goto err;
>> +
>> + if (!br_cntr[i]) {
>> + if (strbuf_addch(&sb, '-'))
>> + goto err;
>> + used++;
>> + } else {
>> + evlist__for_each_entry_from(evsel->evlist, pos) {
>> + if ((pos->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS) &&
>> + (pos->br_cntr_idx == i))
>> + break;
>> + }
>> + for (j = 0; j < avg; j++, used++) {
>> + if (strbuf_addstr(&sb, pos->abbr_name))
>> + goto err;
>> + }
>> +
>> + if (br_cntr[i] & ANNOTATION__BR_CNTR_SATURATED_FLAG) {
>> + if (strbuf_addch(&sb, '+'))
>> + goto err;
>> + used++;
>> + }
>> + pos = list_next_entry(pos, core.node);
>> + }
>> +
>> + /* Assume the branch counter saturated at 3 */
>> + for (j = used; j < 4; j++) {
>> + if (strbuf_addch(&sb, ' '))
>> + goto err;
>> + }
>> + }
>> +
>> + if (strbuf_addch(&sb, br_cntr_nr ? '|' : ' '))
>> + goto err;
>> +
>> + *str = strbuf_detach(&sb, NULL);
>> +
>> + return 0;
>> +err:
>> + strbuf_release(&sb);
>> + return -ENOMEM;
>> +}
>> +
>> static void __annotation_line__write(struct annotation_line *al, struct annotation *notes,
>> bool first_line, bool current_entry, bool change_color, int width,
>> void *obj, unsigned int percent_type,
>> diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
>> index f39dd5d7b05e..2ff79a389dc0 100644
>> --- a/tools/perf/util/annotate.h
>> +++ b/tools/perf/util/annotate.h
>> @@ -548,4 +548,7 @@ struct annotated_basic_block {
>> int annotate_get_basic_blocks(struct symbol *sym, s64 src, s64 dst,
>> struct list_head *head);
>>
>> +int annotation_br_cntr_entry(char **str, int br_cntr_nr, u64 *br_cntr,
>> + int num_aggr, struct evsel *evsel);
>> +int annotation_br_cntr_abbr_list(char **str, struct evsel *evsel, bool header);
>> #endif /* __PERF_ANNOTATE_H */
>> diff --git a/tools/perf/util/block-info.c b/tools/perf/util/block-info.c
>> index 04068d48683f..649392bee7ed 100644
>> --- a/tools/perf/util/block-info.c
>> +++ b/tools/perf/util/block-info.c
>> @@ -40,16 +40,32 @@ static struct block_header_column {
>> [PERF_HPP_REPORT__BLOCK_DSO] = {
>> .name = "Shared Object",
>> .width = 20,
>> + },
>> + [PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER] = {
>> + .name = "Branch Counter",
>> + .width = 30,
>> }
>> };
>>
>> -struct block_info *block_info__new(void)
>> +static struct block_info *block_info__new(unsigned int br_cntr_nr)
>> {
>> - return zalloc(sizeof(struct block_info));
>> + struct block_info *bi = zalloc(sizeof(struct block_info));
>> +
>> + if (bi && br_cntr_nr) {
>> + bi->br_cntr = calloc(br_cntr_nr, sizeof(u64));
>> + if (!bi->br_cntr) {
>> + free(bi);
>> + return NULL;
>> + }
>> + }
>> +
>> + return bi;
>> }
>>
>> void block_info__delete(struct block_info *bi)
>> {
>> + if (bi)
>> + free(bi->br_cntr);
>> free(bi);
>> }
>>
>> @@ -86,7 +102,8 @@ int64_t block_info__cmp(struct perf_hpp_fmt *fmt __maybe_unused,
>>
>> static void init_block_info(struct block_info *bi, struct symbol *sym,
>> struct cyc_hist *ch, int offset,
>> - u64 total_cycles)
>> + u64 total_cycles, unsigned int br_cntr_nr,
>> + u64 *br_cntr, struct evsel *evsel)
>> {
>> bi->sym = sym;
>> bi->start = ch->start;
>> @@ -99,10 +116,18 @@ static void init_block_info(struct block_info *bi, struct symbol *sym,
>>
>> memcpy(bi->cycles_spark, ch->cycles_spark,
>> NUM_SPARKS * sizeof(u64));
>> +
>> + if (br_cntr && br_cntr_nr) {
>> + bi->br_cntr_nr = br_cntr_nr;
>> + memcpy(bi->br_cntr, &br_cntr[offset * br_cntr_nr],
>> + br_cntr_nr * sizeof(u64));
>> + }
>> + bi->evsel = evsel;
>> }
>>
>> int block_info__process_sym(struct hist_entry *he, struct block_hist *bh,
>> - u64 *block_cycles_aggr, u64 total_cycles)
>> + u64 *block_cycles_aggr, u64 total_cycles,
>> + unsigned int br_cntr_nr)
>> {
>> struct annotation *notes;
>> struct cyc_hist *ch;
>> @@ -125,12 +150,14 @@ int block_info__process_sym(struct hist_entry *he, struct block_hist *bh,
>> struct block_info *bi;
>> struct hist_entry *he_block;
>>
>> - bi = block_info__new();
>> + bi = block_info__new(br_cntr_nr);
>> if (!bi)
>> return -1;
>>
>> init_block_info(bi, he->ms.sym, &ch[i], i,
>> - total_cycles);
>> + total_cycles, br_cntr_nr,
>> + notes->branch->br_cntr,
>> + hists_to_evsel(he->hists));
>> cycles += bi->cycles_aggr / bi->num_aggr;
>>
>> he_block = hists__add_entry_block(&bh->block_hists,
>> @@ -327,6 +354,24 @@ static void init_block_header(struct block_fmt *block_fmt)
>> fmt->width = block_column_width;
>> }
>>
>> +static int block_branch_counter_entry(struct perf_hpp_fmt *fmt,
>> + struct perf_hpp *hpp,
>> + struct hist_entry *he)
>> +{
>> + struct block_fmt *block_fmt = container_of(fmt, struct block_fmt, fmt);
>> + struct block_info *bi = he->block_info;
>> + char *buf;
>> + int ret;
>> +
>> + if (annotation_br_cntr_entry(&buf, bi->br_cntr_nr, bi->br_cntr,
>> + bi->num_aggr, bi->evsel))
>> + return 0;
>> +
>> + ret = scnprintf(hpp->buf, hpp->size, "%*s", block_fmt->width, buf);
>> + free(buf);
>> + return ret;
>> +}
>> +
>> static void hpp_register(struct block_fmt *block_fmt, int idx,
>> struct perf_hpp_list *hpp_list)
>> {
>> @@ -357,6 +402,9 @@ static void hpp_register(struct block_fmt *block_fmt, int idx,
>> case PERF_HPP_REPORT__BLOCK_DSO:
>> fmt->entry = block_dso_entry;
>> break;
>> + case PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER:
>> + fmt->entry = block_branch_counter_entry;
>> + break;
>> default:
>> return;
>> }
>> @@ -390,7 +438,7 @@ static void init_block_hist(struct block_hist *bh, struct block_fmt *block_fmts,
>> static int process_block_report(struct hists *hists,
>> struct block_report *block_report,
>> u64 total_cycles, int *block_hpps,
>> - int nr_hpps)
>> + int nr_hpps, unsigned int br_cntr_nr)
>> {
>> struct rb_node *next = rb_first_cached(&hists->entries);
>> struct block_hist *bh = &block_report->hist;
>> @@ -405,7 +453,7 @@ static int process_block_report(struct hists *hists,
>> while (next) {
>> he = rb_entry(next, struct hist_entry, rb_node);
>> block_info__process_sym(he, bh, &block_report->cycles,
>> - total_cycles);
>> + total_cycles, br_cntr_nr);
>> next = rb_next(&he->rb_node);
>> }
>>
>> @@ -435,7 +483,7 @@ struct block_report *block_info__create_report(struct evlist *evlist,
>> struct hists *hists = evsel__hists(pos);
>>
>> process_block_report(hists, &block_reports[i], total_cycles,
>> - block_hpps, nr_hpps);
>> + block_hpps, nr_hpps, evlist->nr_br_cntr);
>> i++;
>> }
>>
>> diff --git a/tools/perf/util/block-info.h b/tools/perf/util/block-info.h
>> index 0b9e1aad4c55..b9329dc3ab59 100644
>> --- a/tools/perf/util/block-info.h
>> +++ b/tools/perf/util/block-info.h
>> @@ -18,6 +18,9 @@ struct block_info {
>> u64 total_cycles;
>> int num;
>> int num_aggr;
>> + int br_cntr_nr;
>> + u64 *br_cntr;
>> + struct evsel *evsel;
>> };
>>
>> struct block_fmt {
>> @@ -36,6 +39,7 @@ enum {
>> PERF_HPP_REPORT__BLOCK_AVG_CYCLES,
>> PERF_HPP_REPORT__BLOCK_RANGE,
>> PERF_HPP_REPORT__BLOCK_DSO,
>> + PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER,
>> PERF_HPP_REPORT__BLOCK_MAX_INDEX
>> };
>>
>> @@ -46,7 +50,6 @@ struct block_report {
>> int nr_fmts;
>> };
>>
>> -struct block_info *block_info__new(void);
>> void block_info__delete(struct block_info *bi);
>>
>> int64_t __block_info__cmp(struct hist_entry *left, struct hist_entry *right);
>> @@ -55,7 +58,8 @@ int64_t block_info__cmp(struct perf_hpp_fmt *fmt __maybe_unused,
>> struct hist_entry *left, struct hist_entry *right);
>>
>> int block_info__process_sym(struct hist_entry *he, struct block_hist *bh,
>> - u64 *block_cycles_aggr, u64 total_cycles);
>> + u64 *block_cycles_aggr, u64 total_cycles,
>> + unsigned int br_cntr_nr);
>>
>> struct block_report *block_info__create_report(struct evlist *evlist,
>> u64 total_cycles,
>> --
>> 2.38.1
>>
>
* Re: [PATCH 6/9] perf report: Display the branch counter histogram
2024-08-06 14:39 ` Liang, Kan
@ 2024-08-06 23:29 ` Namhyung Kim
2024-08-07 3:22 ` Andi Kleen
2024-08-07 11:57 ` Liang, Kan
0 siblings, 2 replies; 26+ messages in thread
From: Namhyung Kim @ 2024-08-06 23:29 UTC (permalink / raw)
To: Liang, Kan
Cc: acme, irogers, peterz, mingo, linux-kernel, adrian.hunter, ak,
eranian
On Tue, Aug 6, 2024 at 7:40 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
>
>
> On 2024-08-02 8:18 p.m., Namhyung Kim wrote:
> > On Wed, Jul 3, 2024 at 1:03 PM <kan.liang@linux.intel.com> wrote:
> >>
> >> From: Kan Liang <kan.liang@linux.intel.com>
> >>
> >> Reusing the existing --total-cycles option to display the branch
> >> counters. Add a new PERF_HPP_REPORT__BLOCK_BRANCH_COUNTER to display
> >> the logged branch counter events. They are shown right after all the
> >> cycle-related annotations.
> >> Extend the struct block_info to store and pass the branch counter
> >> related information.
> >>
> >> The annotation_br_cntr_entry() is to print the histogram of each branch
> >> counter event.
> >> The annotation_br_cntr_abbr_list() prints the branch counter's
> >> abbreviation list. Press 'B' to display the list in the TUI mode.
> >>
> >> $perf record -e "{branch-instructions:ppp,branch-misses}:S" -j any,counter
> >> $perf report --total-cycles --stdio
> >>
> >> # To display the perf.data header info, please use --header/--header-only options.
> >> #
> >> #
> >> # Total Lost Samples: 0
> >> #
> >> # Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }'
> >> # Event count (approx.): 1610046
> >> #
> >> # Branch counter abbr list:
> >> # branch-instructions:ppp = A
> >> # branch-misses = B
> >> # '-' No event occurs
> >> # '+' Event occurrences may be lost due to branch counter saturated
> >> #
> >> # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter [Program Block Range]
> >> # ............... .............. ........... .......... ...................... ..................
> >> #
> >> 57.55% 2.5M 0.00% 3 |A |- | ...
> >> 25.27% 1.1M 0.00% 2 |AA |- | ...
> >> 15.61% 667.2K 0.00% 1 |A |- | ...
> >> 0.16% 6.9K 0.81% 575 |A |- | ...
> >> 0.16% 6.8K 1.38% 977 |AA |- | ...
> >> 0.16% 6.8K 0.04% 28 |AA |B | ...
> >> 0.15% 6.6K 1.33% 946 |A |- | ...
> >> 0.11% 4.5K 0.06% 46 |AAA+|- | ...
> >> 0.10% 4.4K 0.88% 624 |A |- | ...
> >> 0.09% 3.7K 0.74% 524 |AAA+|B | ...
> >
> > I think this format assumes short width and might not work
> > well when it has more events with bigger width. Maybe
> > A=<n>, B=<n> ?
>
> The purpose of "AAA" is to print a histogram here which can give the end
> user a straightforward image of the distribution. The A=<n> may not be
> that obvious.
I understand your point. But I think we need to provide an easily
parse-able format at least for CSV output.
>
> I don't think there is a plan to increase the saturation of the counter.
> So 4 bits of width should last for a long time. Other ARCHs don't have
> such a feature either. I think I can change the code to force the 4
> bits of width now. For more than 3 events, the perf tool can convert it
> to a "+". We may update the perf tool for a more specific histogram
> later, if the saturation is changed. What do you think?
Ok, 4 bits width is probably fine. How many events can a LBR entry
support? Maybe that's limited by the number of HW counters but
theoretically 64 / 4 = 16, right?
Thanks,
Namhyung
* Re: [PATCH 6/9] perf report: Display the branch counter histogram
2024-08-06 23:29 ` Namhyung Kim
@ 2024-08-07 3:22 ` Andi Kleen
2024-08-07 11:57 ` Liang, Kan
1 sibling, 0 replies; 26+ messages in thread
From: Andi Kleen @ 2024-08-07 3:22 UTC (permalink / raw)
To: Namhyung Kim
Cc: Liang, Kan, acme, irogers, peterz, mingo, linux-kernel,
adrian.hunter, eranian
> I understand your point. But I think we need to provide an easily
> parse-able format at least for CSV output.
It's easily parseable, e.g. in python:
>>> import collections, re
>>> collections.Counter(re.findall(r'[A-Z][0-9]?', "AAAB"))
Counter({'A': 3, 'B': 1})
>
> >
> > I don't think there is a plan to increase the saturation of the counter.
> > So 4 bits of width should last for a long time. Other ARCHs don't have
> > such a feature either. I think I can change the code to force the 4
> > bits of width now. For more than 3 events, the perf tool can convert it
> > to a "+". We may update the perf tool for a more specific histogram
> > later, if the saturation is changed. What do you think?
>
> Ok, 4 bits width is probably fine. How many events can a LBR entry
> support? Maybe that's limited by the number of HW counters but
> theoretically 64 / 4 = 16, right?
The MSR doesn't have that many free bits. It's limited to 4 events.
-Andi
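To make the arithmetic above concrete: 64 / 4 = 16 is only an upper bound for a full 64-bit word, while the thread states a limit of 4 events, each saturating at 3. A rough sketch of unpacking such a field is shown below. The 2-bit width is an assumption inferred from "saturated at 3", and unpack_br_cntr() and the BR_CNTR_* names are hypothetical, not taken from the patch.

```c
#include <stdint.h>

#define BR_CNTR_BITS 2	/* assumed: a counter that saturates at 3 fits in 2 bits */
#define BR_CNTR_NUM  4	/* per the thread: limited to 4 events */
#define BR_CNTR_MASK ((1u << BR_CNTR_BITS) - 1)

/*
 * Hypothetical helper: split an already right-aligned packed field
 * into one small count per event. Only BR_CNTR_NUM * BR_CNTR_BITS = 8
 * bits of the input are used.
 */
static void unpack_br_cntr(uint64_t packed, unsigned int out[BR_CNTR_NUM])
{
	for (int i = 0; i < BR_CNTR_NUM; i++)
		out[i] = (unsigned int)(packed >> (i * BR_CNTR_BITS)) & BR_CNTR_MASK;
}
```

Under these assumptions a value of 3 in any slot means "3 or more", which is the case the histogram marks with '+'.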
* Re: [PATCH 6/9] perf report: Display the branch counter histogram
2024-08-06 23:29 ` Namhyung Kim
2024-08-07 3:22 ` Andi Kleen
@ 2024-08-07 11:57 ` Liang, Kan
1 sibling, 0 replies; 26+ messages in thread
From: Liang, Kan @ 2024-08-07 11:57 UTC (permalink / raw)
To: Namhyung Kim
Cc: acme, irogers, peterz, mingo, linux-kernel, adrian.hunter, ak,
eranian
Hi Namhyung,
On 2024-08-06 7:29 p.m., Namhyung Kim wrote:
>>>> 57.55% 2.5M 0.00% 3 |A |- | ...
>>>> 25.27% 1.1M 0.00% 2 |AA |- | ...
>>>> 15.61% 667.2K 0.00% 1 |A |- | ...
>>>> 0.16% 6.9K 0.81% 575 |A |- | ...
>>>> 0.16% 6.8K 1.38% 977 |AA |- | ...
>>>> 0.16% 6.8K 0.04% 28 |AA |B | ...
>>>> 0.15% 6.6K 1.33% 946 |A |- | ...
>>>> 0.11% 4.5K 0.06% 46 |AAA+|- | ...
>>>> 0.10% 4.4K 0.88% 624 |A |- | ...
>>>> 0.09% 3.7K 0.74% 524 |AAA+|B | ...
>>> I think this format assumes short width and might not work
>>> well when it has more events with bigger width. Maybe
>>> A=<n>, B=<n> ?
>> The purpose of "AAA" is to print a histogram here which can give the end
>> user a straightforward image of the distribution. The A=<n> may not be
>> that obvious.
> I understand your point. But I think we need to provide an easily
> parse-able format at least for CSV output.
I guess we could use a method similar to the one for perf script in patch 8.
By default, the histogram is output.
If a user wants numbers, -v should be used.
$perf report --total-cycles --stdio -v
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter >
# ............... .............. ........... .......... .............................. ..................>
#
4.61% 116.4K 0.00% 91 A=3+,B=1 >
4.28% 108.0K 0.00% 26 A=1 ,B=1 >
3.42% 86.4K 0.00% 81 A=3+,B=1 >
2.84% 71.6K 0.00% 50 A=3+,B=- >
2.65% 66.8K 0.00% 178 A=3+,B=1 [__lock_acq>
2.26% 57.1K 0.00% 44 A=2 ,B=- >
Without -v,
$perf report --total-cycles --stdio
# Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter >
# ............... .............. ........... .......... .............................. ..................>
#
4.61% 116.4K 0.00% 91 |AAA+|B+ | >
4.28% 108.0K 0.00% 26 |A |B | >
3.42% 86.4K 0.00% 81 |AAA+|B+ | >
2.84% 71.6K 0.00% 50 |AAA+|- | >
2.65% 66.8K 0.00% 178 |AAA+|B+ | [__lock_acq>
2.26% 57.1K 0.00% 44 |AA |- | >
What do you think?
Thanks,
Kan
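To illustrate why the numeric form is easier for scripts to consume, here is a small sketch that parses a cell in the `A=3+,B=1` style proposed above. The format is only a proposal in this thread, so the exact spacing and the meaning of `-` and `+` are assumptions taken from the sample output; `parse_verbose_cell` is a hypothetical name.

```python
def parse_verbose_cell(cell):
    """Parse 'A=3+,B=1' or 'A=1 ,B=-' into {abbr: (count, saturated)}.

    Assumes comma-separated ABBR=VALUE items, where VALUE is a decimal
    count, optionally followed by '+' (saturated), or '-' (no occurrence).
    """
    result = {}
    for item in cell.split(","):
        abbr, _, value = item.partition("=")
        abbr = abbr.strip()
        value = value.strip()
        if value == "-":
            result[abbr] = (0, False)
        else:
            saturated = value.endswith("+")
            result[abbr] = (int(value.rstrip("+")), saturated)
    return result
```

Unlike the histogram form, every event keeps its abbreviation even when it did not occur, so a consumer does not need to know the column order.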
* [PATCH 7/9] perf annotate: Display the branch counter histogram
2024-07-03 20:03 [PATCH 0/9] Support branch counters in block annotation kan.liang
` (5 preceding siblings ...)
2024-07-03 20:03 ` [PATCH 6/9] perf report: Display the branch counter histogram kan.liang
@ 2024-07-03 20:03 ` kan.liang
2024-08-02 21:09 ` Andi Kleen
2024-07-03 20:03 ` [PATCH 8/9] perf script: Add branch counters kan.liang
` (2 subsequent siblings)
9 siblings, 1 reply; 26+ messages in thread
From: kan.liang @ 2024-07-03 20:03 UTC (permalink / raw)
To: acme, namhyung, irogers, peterz, mingo, linux-kernel
Cc: adrian.hunter, ak, eranian, Kan Liang, Tinghao Zhang
From: Kan Liang <kan.liang@linux.intel.com>
Display the branch counter histogram in the annotation view.
Press 'B' to display the branch counter's abbreviation list as well.
Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }',
4000 Hz, Event count (approx.):
f3 /home/sdp/test/tchain_edit [Percent: local period]
Percent │ IPC Cycle Branch Counter (Average IPC: 1.39, IPC Coverage: 29.4%)
│ 0000000000401755 <f3>:
0.00 0.00 │ endbr64
│ push %rbp
│ mov %rsp,%rbp
│ movl $0x0,-0x4(%rbp)
0.00 0.00 │1.33 3 |A |- | ↓ jmp 25
11.03 11.03 │ 11: mov -0x4(%rbp),%eax
│ and $0x1,%eax
│ test %eax,%eax
17.13 17.13 │2.41 1 |A |- | ↓ je 21
│ addl $0x1,-0x4(%rbp)
21.84 21.84 │2.22 2 |AA |- | ↓ jmp 25
17.13 17.13 │ 21: addl $0x1,-0x4(%rbp)
21.84 21.84 │ 25: cmpl $0x270f,-0x4(%rbp)
11.03 11.03 │0.61 3 |A |- | ↑ jle 11
│ nop
│ pop %rbp
0.00 0.00 │0.24 20 |AA |B | ← ret
Originally-by: Tinghao Zhang <tinghao.zhang@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
tools/perf/builtin-annotate.c | 10 +++++---
tools/perf/ui/browsers/annotate.c | 18 ++++++++++++--
tools/perf/ui/browsers/hists.c | 3 ++-
tools/perf/util/annotate.c | 40 ++++++++++++++++++++++++++++---
tools/perf/util/annotate.h | 11 +++++++++
tools/perf/util/disasm.c | 1 +
6 files changed, 74 insertions(+), 9 deletions(-)
diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 0aa40588425c..57c9c863dce9 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -917,11 +917,15 @@ int cmd_annotate(int argc, const char **argv)
sort_order = "dso,symbol";
/*
- * Set SORT_MODE__BRANCH so that annotate display IPC/Cycle
- * if branch info is in perf data in TUI mode.
+ * Set SORT_MODE__BRANCH so that annotate displays IPC/Cycle and
+ * branch counters, if the corresponding branch info is available
+ * in the perf data in the TUI mode.
*/
- if ((use_browser == 1 || annotate.use_stdio2) && annotate.has_br_stack)
+ if ((use_browser == 1 || annotate.use_stdio2) && annotate.has_br_stack) {
sort__mode = SORT_MODE__BRANCH;
+ if (annotate.session->evlist->nr_br_cntr > 0)
+ annotate_opts.show_br_cntr = true;
+ }
if (setup_sorting(NULL) < 0)
usage_with_options(annotate_usage, options);
diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c
index ea986430241e..868ea84d766b 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -156,6 +156,7 @@ static void annotate_browser__draw_current_jump(struct ui_browser *browser)
struct symbol *sym = ms->sym;
struct annotation *notes = symbol__annotation(sym);
u8 pcnt_width = annotation__pcnt_width(notes);
+ u8 cntr_width = annotation__br_cntr_width();
int width;
int diff = 0;
@@ -205,13 +206,13 @@ static void annotate_browser__draw_current_jump(struct ui_browser *browser)
ui_browser__set_color(browser, HE_COLORSET_JUMP_ARROWS);
__ui_browser__line_arrow(browser,
- pcnt_width + 2 + notes->src->widths.addr + width,
+ pcnt_width + 2 + notes->src->widths.addr + width + cntr_width,
from, to);
diff = is_fused(ab, cursor);
if (diff > 0) {
ui_browser__mark_fused(browser,
- pcnt_width + 3 + notes->src->widths.addr + width,
+ pcnt_width + 3 + notes->src->widths.addr + width + cntr_width,
from - diff, diff, to > from);
}
}
@@ -714,6 +715,7 @@ static int annotate_browser__run(struct annotate_browser *browser,
struct annotation *notes = symbol__annotation(ms->sym);
const char *help = "Press 'h' for help on key bindings";
int delay_secs = hbt ? hbt->refresh : 0;
+ char *br_cntr_text = NULL;
char title[256];
int key;
@@ -730,6 +732,8 @@ static int annotate_browser__run(struct annotate_browser *browser,
nd = browser->curr_hot;
+ annotation_br_cntr_abbr_list(&br_cntr_text, evsel, false);
+
while (1) {
key = ui_browser__run(&browser->b, delay_secs);
@@ -796,6 +800,7 @@ static int annotate_browser__run(struct annotate_browser *browser,
"r Run available scripts\n"
"p Toggle percent type [local/global]\n"
"b Toggle percent base [period/hits]\n"
+ "B Branch counter abbr list (Optional)\n"
"? Search string backwards\n"
"f Toggle showing offsets to full address\n");
continue;
@@ -904,6 +909,14 @@ static int annotate_browser__run(struct annotate_browser *browser,
hists__scnprintf_title(hists, title, sizeof(title));
annotate_browser__show(&browser->b, title, help);
continue;
+ case 'B':
+ if (br_cntr_text)
+ ui_browser__help_window(&browser->b, br_cntr_text);
+ else {
+ ui_browser__help_window(&browser->b,
+ "\n The branch counter is not available.\n");
+ }
+ continue;
case 'f':
annotation__toggle_full_addr(notes, ms);
continue;
@@ -923,6 +936,7 @@ static int annotate_browser__run(struct annotate_browser *browser,
}
out:
ui_browser__hide(&browser->b);
+ free(br_cntr_text);
return key;
}
diff --git a/tools/perf/ui/browsers/hists.c b/tools/perf/ui/browsers/hists.c
index 73d766eac75b..6dc765e37788 100644
--- a/tools/perf/ui/browsers/hists.c
+++ b/tools/perf/ui/browsers/hists.c
@@ -3705,7 +3705,8 @@ int block_hists_tui_browse(struct block_hist *bh, struct evsel *evsel,
memset(&action, 0, sizeof(action));
- annotation_br_cntr_abbr_list(&br_cntr_text, evsel, false);
+ if (!annotation_br_cntr_abbr_list(&br_cntr_text, evsel, false))
+ annotate_opts.show_br_cntr = true;
while (1) {
key = hist_browser__run(browser, "? - help", true, 0);
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index f20f9e40ef0d..8a7024534469 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -500,8 +500,10 @@ static void annotation__count_and_fill(struct annotation *notes, u64 start, u64
}
}
-static int annotation__compute_ipc(struct annotation *notes, size_t size)
+static int annotation__compute_ipc(struct annotation *notes, size_t size,
+ struct evsel *evsel)
{
+ unsigned int br_cntr_nr = evsel->evlist->nr_br_cntr;
int err = 0;
s64 offset;
@@ -536,6 +538,20 @@ static int annotation__compute_ipc(struct annotation *notes, size_t size)
al->cycles->max = ch->cycles_max;
al->cycles->min = ch->cycles_min;
}
+ if (al && notes->branch->br_cntr) {
+ if (!al->br_cntr) {
+ al->br_cntr = calloc(br_cntr_nr, sizeof(u64));
+ if (!al->br_cntr) {
+ err = ENOMEM;
+ break;
+ }
+ }
+ al->num_aggr = ch->num_aggr;
+ al->br_cntr_nr = br_cntr_nr;
+ al->evsel = evsel;
+ memcpy(al->br_cntr, &notes->branch->br_cntr[offset * br_cntr_nr],
+ br_cntr_nr * sizeof(u64));
+ }
}
}
@@ -547,8 +563,10 @@ static int annotation__compute_ipc(struct annotation *notes, size_t size)
struct annotation_line *al;
al = annotated_source__get_line(notes->src, offset);
- if (al)
+ if (al) {
zfree(&al->cycles);
+ zfree(&al->br_cntr);
+ }
}
}
}
@@ -1903,6 +1921,22 @@ static void __annotation_line__write(struct annotation_line *al, struct annotati
"Cycle(min/max)");
}
+ if (annotate_opts.show_br_cntr) {
+ if (show_title) {
+ obj__printf(obj, "%*s ",
+ ANNOTATION__BR_CNTR_WIDTH,
+ "Branch Counter");
+ } else {
+ char *buf;
+
+ if (!annotation_br_cntr_entry(&buf, al->br_cntr_nr, al->br_cntr,
+ al->num_aggr, al->evsel)) {
+ obj__printf(obj, "%*s ", ANNOTATION__BR_CNTR_WIDTH, buf);
+ free(buf);
+ }
+ }
+ }
+
if (show_title && !*al->line) {
ipc_coverage_string(bf, sizeof(bf), notes);
obj__printf(obj, "%*s", ANNOTATION__AVG_IPC_WIDTH, bf);
@@ -2002,7 +2036,7 @@ int symbol__annotate2(struct map_symbol *ms, struct evsel *evsel,
annotation__set_index(notes);
annotation__mark_jump_targets(notes, sym);
- err = annotation__compute_ipc(notes, size);
+ err = annotation__compute_ipc(notes, size, evsel);
if (err)
return err;
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index 2ff79a389dc0..5cda399ae52e 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -31,6 +31,7 @@ struct annotated_data_type;
#define ANNOTATION__CYCLES_WIDTH 6
#define ANNOTATION__MINMAX_CYCLES_WIDTH 19
#define ANNOTATION__AVG_IPC_WIDTH 36
+#define ANNOTATION__BR_CNTR_WIDTH 30
#define ANNOTATION_DUMMY_LEN 256
struct annotation_options {
@@ -44,6 +45,7 @@ struct annotation_options {
show_nr_jumps,
show_minmax_cycle,
show_asm_raw,
+ show_br_cntr,
annotate_src,
full_addr;
u8 offset_level;
@@ -104,6 +106,10 @@ struct annotation_line {
char *fileloc;
char *path;
struct cycles_info *cycles;
+ int num_aggr;
+ int br_cntr_nr;
+ u64 *br_cntr;
+ struct evsel *evsel;
int jump_sources;
u32 idx;
int idx_asm;
@@ -350,6 +356,11 @@ static inline bool annotation_line__filter(struct annotation_line *al)
return annotate_opts.hide_src_code && al->offset == -1;
}
+static inline u8 annotation__br_cntr_width(void)
+{
+ return annotate_opts.show_br_cntr ? ANNOTATION__BR_CNTR_WIDTH : 0;
+}
+
void annotation__update_column_widths(struct annotation *notes);
void annotation__toggle_full_addr(struct annotation *notes, struct map_symbol *ms);
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index 72aec8f61b94..3663938ca234 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -857,6 +857,7 @@ static void annotation_line__exit(struct annotation_line *al)
zfree_srcline(&al->path);
zfree(&al->line);
zfree(&al->cycles);
+ zfree(&al->br_cntr);
}
static size_t disasm_line_size(int nr)
--
2.38.1
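The memcpy() added to annotation__compute_ipc() above treats the per-symbol counter storage as one flat array with `br_cntr_nr` entries per instruction offset. A minimal model of that indexing, assuming exactly this row-major layout (the Python names are hypothetical and only mirror the C identifiers):

```python
def line_counters(flat_br_cntr, offset, br_cntr_nr):
    """Return the br_cntr_nr counters stored for one instruction offset.

    Models copying br_cntr_nr entries starting at index
    offset * br_cntr_nr of a flat per-symbol array.
    """
    start = offset * br_cntr_nr
    return list(flat_br_cntr[start:start + br_cntr_nr])
```

With two counters per offset, the array `[1, 0, 2, 0, 2, 1]` holds three offsets, and `line_counters(..., 2, 2)` selects the last pair.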
* Re: [PATCH 7/9] perf annotate: Display the branch counter histogram
2024-07-03 20:03 ` [PATCH 7/9] perf annotate: " kan.liang
@ 2024-08-02 21:09 ` Andi Kleen
2024-08-06 14:42 ` Liang, Kan
0 siblings, 1 reply; 26+ messages in thread
From: Andi Kleen @ 2024-08-02 21:09 UTC (permalink / raw)
To: kan.liang
Cc: acme, namhyung, irogers, peterz, mingo, linux-kernel,
adrian.hunter, eranian, Tinghao Zhang
> Display the branch counter histogram in the annotation view.
>
> Press 'B' to display the branch counter's abbreviation list as well.
>
> Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }',
> 4000 Hz, Event count (approx.):
> f3 /home/sdp/test/tchain_edit [Percent: local period]
Can we output the abbreviation mappings here in the header too?
Otherwise it will be hard to use.
-Andi
* Re: [PATCH 7/9] perf annotate: Display the branch counter histogram
2024-08-02 21:09 ` Andi Kleen
@ 2024-08-06 14:42 ` Liang, Kan
2024-08-06 21:37 ` Liang, Kan
0 siblings, 1 reply; 26+ messages in thread
From: Liang, Kan @ 2024-08-06 14:42 UTC (permalink / raw)
To: Andi Kleen
Cc: acme, namhyung, irogers, peterz, mingo, linux-kernel,
adrian.hunter, eranian, Tinghao Zhang
On 2024-08-02 5:09 p.m., Andi Kleen wrote:
>> Display the branch counter histogram in the annotation view.
>>
>> Press 'B' to display the branch counter's abbreviation list as well.
>>
>> Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }',
>> 4000 Hz, Event count (approx.):
>> f3 /home/sdp/test/tchain_edit [Percent: local period]
>
> Can we output the abbreviation mappings here in the header too?
> Otherwise it will be hard to use.
If so, the 'B' will be redundant. I will remove the 'B' and move the
abbreviation mappings into the header.
Thanks,
Kan
* Re: [PATCH 7/9] perf annotate: Display the branch counter histogram
2024-08-06 14:42 ` Liang, Kan
@ 2024-08-06 21:37 ` Liang, Kan
2024-08-06 23:02 ` Andi Kleen
0 siblings, 1 reply; 26+ messages in thread
From: Liang, Kan @ 2024-08-06 21:37 UTC (permalink / raw)
To: Andi Kleen
Cc: acme, namhyung, irogers, peterz, mingo, linux-kernel,
adrian.hunter, eranian
Hi Andi,
On 2024-08-06 10:42 a.m., Liang, Kan wrote:
>
>
> On 2024-08-02 5:09 p.m., Andi Kleen wrote:
>>> Display the branch counter histogram in the annotation view.
>>>
>>> Press 'B' to display the branch counter's abbreviation list as well.
>>>
>>> Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }',
>>> 4000 Hz, Event count (approx.):
>>> f3 /home/sdp/test/tchain_edit [Percent: local period]
>>
>> Can we output the abbreviation mappings here in the header too?
>> Otherwise it will be hard to use.
>
> If so, the 'B' will be redundant. I will remove the 'B' and move the
> abbreviation mappings in the header.
>
Actually, the output here is from the TUI mode, not the --stdio mode.
The TUI mode has only a single title line,
and it fills up quickly. As you can see in the example, the value of
"Event count (approx.)" is already cut off. The abbreviation mappings
would never get a chance to be output.
For the TUI mode, usually shortcut keys are used to display aux
information. The 'B' in this patch follows the existing behavior.
For the --stdio mode, perf should print out the abbreviation mappings in
the header. I think the --stdio mode is the one used by other tools to
parse the result, right? The previous patch 6 (--stdio mode) does show
everything in the header.
Is there a use-case in the TUI mode that has difficulties utilizing the
shortcut 'B'? If yes, could you please elaborate?
Thanks,
Kan
* Re: [PATCH 7/9] perf annotate: Display the branch counter histogram
2024-08-06 21:37 ` Liang, Kan
@ 2024-08-06 23:02 ` Andi Kleen
0 siblings, 0 replies; 26+ messages in thread
From: Andi Kleen @ 2024-08-06 23:02 UTC (permalink / raw)
To: Liang, Kan
Cc: acme, namhyung, irogers, peterz, mingo, linux-kernel,
adrian.hunter, eranian
> For the --stdio mode, perf should print out the abbreviation mappings in
> the header. I think the --stdio mode is the one used by other tools to
> parse the result, right?
It's not just for tools, the humans might also not know, especially
if there are lots of events.
> The previous patch 6 (--stdio mode) does show
> everything in the header.
>
> Is there a use-case in the TUI mode that has difficulties utilizing the
> shortcut 'B'? If yes, could you please elaborate?
No, if 'B' works in TUI mode, that's fine.
-Andi
* [PATCH 8/9] perf script: Add branch counters
2024-07-03 20:03 [PATCH 0/9] Support branch counters in block annotation kan.liang
` (6 preceding siblings ...)
2024-07-03 20:03 ` [PATCH 7/9] perf annotate: " kan.liang
@ 2024-07-03 20:03 ` kan.liang
2024-07-03 20:03 ` [PATCH 9/9] perf test: Add new test cases for the branch counter feature kan.liang
2024-07-31 15:05 ` [PATCH 0/9] Support branch counters in block annotation Arnaldo Carvalho de Melo
9 siblings, 0 replies; 26+ messages in thread
From: kan.liang @ 2024-07-03 20:03 UTC (permalink / raw)
To: acme, namhyung, irogers, peterz, mingo, linux-kernel
Cc: adrian.hunter, ak, eranian, Kan Liang, Tinghao Zhang
From: Kan Liang <kan.liang@linux.intel.com>
It's useful to print the branch counter information for each jump in
the brstackinsn when it's available.
Add a new field brcntr to display the branch counter information.
By default, the abbreviation will be used to indicate the branch
counter. In the verbose mode, the real event name is shown.
$perf script -F +brstackinsn,+brcntr
# Branch counter abbr list:
# branch-instructions:ppp = A
# branch-misses = B
# '-' No event occurs
# '+' Event occurrences may be lost due to branch counter saturated
tchain_edit 332203 3366329.405674: 53030 branch-instructions:ppp: 401781 f3+0x2c (home/sdp/test/tchain_edit)
f3+31:
0000000000401774 insn: eb 04 br_cntr: AA # PRED 5 cycles [5]
000000000040177a insn: 81 7d fc 0f 27 00 00
0000000000401781 insn: 7e e3 br_cntr: A # PRED 1 cycles [6] 2.00 IPC
0000000000401766 insn: 8b 45 fc
0000000000401769 insn: 83 e0 01
000000000040176c insn: 85 c0
000000000040176e insn: 74 06 br_cntr: A # PRED 1 cycles [7] 4.00 IPC
0000000000401776 insn: 83 45 fc 01
000000000040177a insn: 81 7d fc 0f 27 00 00
0000000000401781 insn: 7e e3 br_cntr: A # PRED 7 cycles [14] 0.43 IPC
$perf script -F +brstackinsn,+brcntr -v
tchain_edit 332203 3366329.405674: 53030 branch-instructions:ppp: 401781 f3+0x2c (/home/sdp/os.linux.perf.test-suite/kernels/lbr_kernel/tchain_edit)
f3+31:
0000000000401774 insn: eb 04 br_cntr: branch-instructions:ppp 2 branch-misses 0 # PRED 5 cycles [5]
000000000040177a insn: 81 7d fc 0f 27 00 00
0000000000401781 insn: 7e e3 br_cntr: branch-instructions:ppp 1 branch-misses 0 # PRED 1 cycles [6] 2.00 IPC
0000000000401766 insn: 8b 45 fc
0000000000401769 insn: 83 e0 01
000000000040176c insn: 85 c0
000000000040176e insn: 74 06 br_cntr: branch-instructions:ppp 1 branch-misses 0 # PRED 1 cycles [7] 4.00 IPC
0000000000401776 insn: 83 45 fc 01
000000000040177a insn: 81 7d fc 0f 27 00 00
0000000000401781 insn: 7e e3 br_cntr: branch-instructions:ppp 1 branch-misses 0 # PRED 7 cycles [14] 0.43 IPC
Originally-by: Tinghao Zhang <tinghao.zhang@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
tools/perf/Documentation/perf-script.txt | 2 +-
tools/perf/builtin-script.c | 69 +++++++++++++++++++++---
2 files changed, 63 insertions(+), 8 deletions(-)
diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index ff086ef05a0c..be483c904d8d 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -134,7 +134,7 @@ OPTIONS
srcline, period, iregs, uregs, brstack, brstacksym, flags, bpf-output,
brstackinsn, brstackinsnlen, brstackdisasm, brstackoff, callindent, insn, disasm,
insnlen, synth, phys_addr, metric, misc, srccode, ipc, data_page_size,
- code_page_size, ins_lat, machine_pid, vcpu, cgroup, retire_lat,
+ code_page_size, ins_lat, machine_pid, vcpu, cgroup, retire_lat, brcntr,
Field list can be prepended with the type, trace, sw or hw,
to indicate to which event type the field list applies.
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index c16224b1fef3..4d71847196bc 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -62,6 +62,7 @@
#include "util/record.h"
#include "util/util.h"
#include "util/cgroup.h"
+#include "util/annotate.h"
#include "perf.h"
#include <linux/ctype.h>
@@ -138,6 +139,7 @@ enum perf_output_field {
PERF_OUTPUT_DSOFF = 1ULL << 41,
PERF_OUTPUT_DISASM = 1ULL << 42,
PERF_OUTPUT_BRSTACKDISASM = 1ULL << 43,
+ PERF_OUTPUT_BRCNTR = 1ULL << 44,
};
struct perf_script {
@@ -213,6 +215,7 @@ struct output_option {
{.str = "cgroup", .field = PERF_OUTPUT_CGROUP},
{.str = "retire_lat", .field = PERF_OUTPUT_RETIRE_LAT},
{.str = "brstackdisasm", .field = PERF_OUTPUT_BRSTACKDISASM},
+ {.str = "brcntr", .field = PERF_OUTPUT_BRCNTR},
};
enum {
@@ -520,6 +523,12 @@ static int evsel__check_attr(struct evsel *evsel, struct perf_session *session)
"Hint: run 'perf record -b ...'\n");
return -EINVAL;
}
+ if (PRINT_FIELD(BRCNTR) &&
+ !(evlist__combined_branch_type(session->evlist) & PERF_SAMPLE_BRANCH_COUNTERS)) {
+ pr_err("Display of branch counter requested but it's not enabled\n"
+ "Hint: run 'perf record -j any,counter ...'\n");
+ return -EINVAL;
+ }
if ((PRINT_FIELD(PID) || PRINT_FIELD(TID)) &&
evsel__check_stype(evsel, PERF_SAMPLE_TID, "TID", PERF_OUTPUT_TID|PERF_OUTPUT_PID))
return -EINVAL;
@@ -789,6 +798,19 @@ static int perf_sample__fprintf_start(struct perf_script *script,
int printed = 0;
char tstr[128];
+ /*
+ * Print the branch counter's abbreviation list,
+ * if the branch counter is available.
+ */
+ if (PRINT_FIELD(BRCNTR) && !verbose) {
+ char *buf;
+
+ if (!annotation_br_cntr_abbr_list(&buf, evsel, true)) {
+ printed += fprintf(stdout, "%s", buf);
+ free(buf);
+ }
+ }
+
if (PRINT_FIELD(MACHINE_PID) && sample->machine_pid)
printed += fprintf(fp, "VM:%5d ", sample->machine_pid);
@@ -1195,7 +1217,9 @@ static int ip__fprintf_jump(uint64_t ip, struct branch_entry *en,
struct perf_insn *x, u8 *inbuf, int len,
int insn, FILE *fp, int *total_cycles,
struct perf_event_attr *attr,
- struct thread *thread)
+ struct thread *thread,
+ struct evsel *evsel,
+ u64 br_cntr)
{
int ilen = 0;
int printed = fprintf(fp, "\t%016" PRIx64 "\t", ip);
@@ -1216,6 +1240,28 @@ static int ip__fprintf_jump(uint64_t ip, struct branch_entry *en,
addr_location__exit(&al);
}
+ if (PRINT_FIELD(BRCNTR)) {
+ unsigned int width = evsel__env(evsel)->br_cntr_width;
+ unsigned int i = 0, j, num, mask = (1L << width) - 1;
+ struct evsel *pos = evsel__leader(evsel);
+
+ printed += fprintf(fp, "br_cntr: ");
+ evlist__for_each_entry_from(evsel->evlist, pos) {
+ if (!(pos->core.attr.branch_sample_type & PERF_SAMPLE_BRANCH_COUNTERS))
+ continue;
+ if (evsel__leader(pos) != evsel__leader(evsel))
+ break;
+
+ num = (br_cntr >> (i++ * width)) & mask;
+ if (!verbose) {
+ for (j = 0; j < num; j++)
+ printed += fprintf(fp, "%s", pos->abbr_name);
+ } else
+ printed += fprintf(fp, "%s %d ", pos->name, num);
+ }
+ printed += fprintf(fp, "\t");
+ }
+
printed += fprintf(fp, "#%s%s%s%s",
en->flags.predicted ? " PRED" : "",
en->flags.mispred ? " MISPRED" : "",
@@ -1272,6 +1318,7 @@ static int ip__fprintf_sym(uint64_t addr, struct thread *thread,
}
static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
+ struct evsel *evsel,
struct thread *thread,
struct perf_event_attr *attr,
struct machine *machine, FILE *fp)
@@ -1285,6 +1332,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
unsigned off;
struct symbol *lastsym = NULL;
int total_cycles = 0;
+ u64 br_cntr = 0;
if (!(br && br->nr))
return 0;
@@ -1296,6 +1344,9 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
x.machine = machine;
x.cpu = sample->cpu;
+ if (PRINT_FIELD(BRCNTR) && sample->branch_stack_cntr)
+ br_cntr = sample->branch_stack_cntr[nr - 1];
+
printed += fprintf(fp, "%c", '\n');
/* Handle first from jump, of which we don't know the entry. */
@@ -1307,7 +1358,7 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
x.cpumode, x.cpu, &lastsym, attr, fp);
printed += ip__fprintf_jump(entries[nr - 1].from, &entries[nr - 1],
&x, buffer, len, 0, fp, &total_cycles,
- attr, thread);
+ attr, thread, evsel, br_cntr);
if (PRINT_FIELD(SRCCODE))
printed += print_srccode(thread, x.cpumode, entries[nr - 1].from);
}
@@ -1337,8 +1388,10 @@ static int perf_sample__fprintf_brstackinsn(struct perf_sample *sample,
printed += ip__fprintf_sym(ip, thread, x.cpumode, x.cpu, &lastsym, attr, fp);
if (ip == end) {
+ if (PRINT_FIELD(BRCNTR) && sample->branch_stack_cntr)
+ br_cntr = sample->branch_stack_cntr[i];
printed += ip__fprintf_jump(ip, &entries[i], &x, buffer + off, len - off, ++insn, fp,
- &total_cycles, attr, thread);
+ &total_cycles, attr, thread, evsel, br_cntr);
if (PRINT_FIELD(SRCCODE))
printed += print_srccode(thread, x.cpumode, ip);
break;
@@ -1547,6 +1600,7 @@ void script_fetch_insn(struct perf_sample *sample, struct thread *thread,
}
static int perf_sample__fprintf_insn(struct perf_sample *sample,
+ struct evsel *evsel,
struct perf_event_attr *attr,
struct thread *thread,
struct machine *machine, FILE *fp,
@@ -1567,7 +1621,7 @@ static int perf_sample__fprintf_insn(struct perf_sample *sample,
printed += sample__fprintf_insn_asm(sample, thread, machine, fp, al);
}
if (PRINT_FIELD(BRSTACKINSN) || PRINT_FIELD(BRSTACKINSNLEN) || PRINT_FIELD(BRSTACKDISASM))
- printed += perf_sample__fprintf_brstackinsn(sample, thread, attr, machine, fp);
+ printed += perf_sample__fprintf_brstackinsn(sample, evsel, thread, attr, machine, fp);
return printed;
}
@@ -1639,7 +1693,7 @@ static int perf_sample__fprintf_bts(struct perf_sample *sample,
if (print_srcline_last)
printed += map__fprintf_srcline(al->map, al->addr, "\n ", fp);
- printed += perf_sample__fprintf_insn(sample, attr, thread, machine, fp, al);
+ printed += perf_sample__fprintf_insn(sample, evsel, attr, thread, machine, fp, al);
printed += fprintf(fp, "\n");
if (PRINT_FIELD(SRCCODE)) {
int ret = map__fprintf_srccode(al->map, al->addr, stdout,
@@ -2297,7 +2351,7 @@ static void process_event(struct perf_script *script,
if (evsel__is_bpf_output(evsel) && PRINT_FIELD(BPF_OUTPUT))
perf_sample__fprintf_bpf_output(sample, fp);
- perf_sample__fprintf_insn(sample, attr, thread, machine, fp, al);
+ perf_sample__fprintf_insn(sample, evsel, attr, thread, machine, fp, al);
if (PRINT_FIELD(PHYS_ADDR))
fprintf(fp, "%16" PRIx64, sample->phys_addr);
@@ -3979,7 +4033,8 @@ int cmd_script(int argc, const char **argv)
"brstacksym,flags,data_src,weight,bpf-output,brstackinsn,"
"brstackinsnlen,brstackdisasm,brstackoff,callindent,insn,disasm,insnlen,synth,"
"phys_addr,metric,misc,srccode,ipc,tod,data_page_size,"
- "code_page_size,ins_lat,machine_pid,vcpu,cgroup,retire_lat",
+ "code_page_size,ins_lat,machine_pid,vcpu,cgroup,retire_lat,"
+ "brcntr",
parse_output_fields),
OPT_BOOLEAN('a', "all-cpus", &system_wide,
"system-wide collection from all CPUs"),
--
2.38.1
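The decoding loop added to ip__fprintf_jump() above can be modelled in a few lines. This is a sketch only: it assumes each event with counter logging enabled owns `width` consecutive bits of the packed value, in group order, and it ignores the trailing tab that the patch prints. The function names are hypothetical, and the width used in the usage note below is an illustrative value, not one stated in this thread.

```python
def decode_br_cntr(br_cntr, width, events):
    """Split a packed counter value into per-event counts.

    events is a list of (name, abbr) pairs for the group members that log
    branch counters, in group order. Mirrors
    num = (br_cntr >> (i * width)) & mask from the patch.
    """
    mask = (1 << width) - 1
    return [(name, abbr, (br_cntr >> (i * width)) & mask)
            for i, (name, abbr) in enumerate(events)]


def format_br_cntr(decoded, verbose=False):
    """Render the field body the way the patch's two print branches do."""
    if verbose:
        # Verbose mode: "<event name> <count> " for every event.
        body = "".join(f"{name} {num} " for name, _, num in decoded)
    else:
        # Default mode: the abbreviation repeated once per occurrence.
        body = "".join(abbr * num for _, abbr, num in decoded)
    return "br_cntr: " + body
```

With `events = [("branch-instructions:ppp", "A"), ("branch-misses", "B")]` and a width of 2, the packed value `0b0010` decodes to two occurrences of `A` and none of `B`, which renders as `br_cntr: AA` by default.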
* [PATCH 9/9] perf test: Add new test cases for the branch counter feature
2024-07-03 20:03 [PATCH 0/9] Support branch counters in block annotation kan.liang
` (7 preceding siblings ...)
2024-07-03 20:03 ` [PATCH 8/9] perf script: Add branch counters kan.liang
@ 2024-07-03 20:03 ` kan.liang
2024-07-31 15:05 ` [PATCH 0/9] Support branch counters in block annotation Arnaldo Carvalho de Melo
9 siblings, 0 replies; 26+ messages in thread
From: kan.liang @ 2024-07-03 20:03 UTC (permalink / raw)
To: acme, namhyung, irogers, peterz, mingo, linux-kernel
Cc: adrian.hunter, ak, eranian, Kan Liang
From: Kan Liang <kan.liang@linux.intel.com>
Enhance the test case for the branch counter feature.
Now, the test verifies
- The new filter can be successfully applied on the supported platforms.
- The counter value can be outputted via the perf report -D
- The counter value and the abbr name can be outputted via the
perf script (New)
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
tools/perf/tests/shell/record.sh | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/tools/perf/tests/shell/record.sh b/tools/perf/tests/shell/record.sh
index 3d1a7759a7b2..7964ebd9007d 100755
--- a/tools/perf/tests/shell/record.sh
+++ b/tools/perf/tests/shell/record.sh
@@ -21,6 +21,7 @@ testprog="perf test -w thloop"
cpu_pmu_dir="/sys/bus/event_source/devices/cpu*"
br_cntr_file="/caps/branch_counter_nr"
br_cntr_output="branch stack counters"
+br_cntr_script_output="br_cntr: A"
cleanup() {
rm -rf "${perfdata}"
@@ -165,7 +166,7 @@ test_workload() {
}
test_branch_counter() {
- echo "Basic branch counter test"
+ echo "Branch counter test"
# Check if the branch counter feature is supported
for dir in $cpu_pmu_dir
do
@@ -175,19 +176,25 @@ test_branch_counter() {
return
fi
done
- if ! perf record -o "${perfdata}" -j any,counter ${testprog} 2> /dev/null
+ if ! perf record -o "${perfdata}" -e "{branches:p,instructions}" -j any,counter ${testprog} 2> /dev/null
then
- echo "Basic branch counter test [Failed record]"
+ echo "Branch counter record test [Failed record]"
err=1
return
fi
if ! perf report -i "${perfdata}" -D -q | grep -q "$br_cntr_output"
then
- echo "Basic branch record test [Failed missing output]"
+ echo "Branch counter report test [Failed missing output]"
err=1
return
fi
- echo "Basic branch counter test [Success]"
+ if ! perf script -i "${perfdata}" -F +brstackinsn,+brcntr | grep -q "$br_cntr_script_output"
+ then
+ echo " Branch counter script test [Failed missing output]"
+ err=1
+ return
+ fi
+ echo "Branch counter test [Success]"
}
test_per_thread
--
2.38.1
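The grep in the test above only checks that at least one line contains "br_cntr: A". A script that wants the actual values could pull the field out of each line instead. This is a sketch under the assumption that non-verbose perf script output looks like the sample in patch 8, with the field followed by a `#` comment; `br_cntr_fields` is a hypothetical helper.

```python
import re

# Matches the abbreviation run between "br_cntr:" and the "#" comment.
BR_CNTR_RE = re.compile(r"br_cntr:\s*([A-Z0-9]*)\s*#")


def br_cntr_fields(lines):
    """Return the br_cntr abbreviation string of every line that has one."""
    fields = []
    for line in lines:
        m = BR_CNTR_RE.search(line)
        if m:
            fields.append(m.group(1))
    return fields
```

Lines without a `br_cntr:` field, such as the non-branch instructions in the sample output, are skipped.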
* Re: [PATCH 0/9] Support branch counters in block annotation
2024-07-03 20:03 [PATCH 0/9] Support branch counters in block annotation kan.liang
` (8 preceding siblings ...)
2024-07-03 20:03 ` [PATCH 9/9] perf test: Add new test cases for the branch counter feature kan.liang
@ 2024-07-31 15:05 ` Arnaldo Carvalho de Melo
2024-07-31 15:31 ` Liang, Kan
9 siblings, 1 reply; 26+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-07-31 15:05 UTC (permalink / raw)
To: kan.liang
Cc: namhyung, irogers, peterz, Andi Kleen, mingo, linux-kernel,
adrian.hunter, ak, eranian
On Wed, Jul 03, 2024 at 01:03:47PM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
>
> The branch counters logging (A.K.A LBR event logging) introduces a
> per-counter indication of precise event occurrences in LBRs. It can
> provide a means to attribute exposed retirement latency to combinations
> of events across a block of instructions. It also provides a means of
> attributing Timed LBR latencies to events.
>
> The kernel support and basic perf tool support have been merged.
> https://lore.kernel.org/lkml/20231025201626.3000228-1-kan.liang@linux.intel.com/
>
> This series is to provide advanced perf tool support via adding the
> branch counters information in block annotation. It can further
> facilitate the analysis of branch blocks.
>
> The patch 1 and 2 are to fix two existing issues of --total-cycles and
> the branch counters feature.
>
> The patch 3-9 are the advanced perf tool support.
I couldn't find any newer versions of this series nor reviews, is that
right?
I'll try and review this soon, but if someone else could take a look,
try it and provide a Reviewed-by or at least an Acked-by, that would
help!
- Arnaldo
> Here are some examples.
>
> perf annotation:
>
> $perf record -e "{branch-instructions:ppp,branch-misses}:S" -j any,counter
> $perf report --total-cycles --stdio
>
> # To display the perf.data header info, please use --header/--header-only options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }'
> # Event count (approx.): 1610046
> #
> # Branch counter abbr list:
> # branch-instructions:ppp = A
> # branch-misses = B
> # '-' No event occurs
> # '+' Event occurrences may be lost due to branch counter saturated
> #
> # Sampled Cycles% Sampled Cycles Avg Cycles% Avg Cycles Branch Counter [Program Block Range]
> # ............... .............. ........... .......... ...................... ..
> #
> 57.55% 2.5M 0.00% 3 |A |- | ...
> 25.27% 1.1M 0.00% 2 |AA |- | ...
> 15.61% 667.2K 0.00% 1 |A |- | ...
> 0.16% 6.9K 0.81% 575 |A |- | ...
> 0.16% 6.8K 1.38% 977 |AA |- | ...
> 0.16% 6.8K 0.04% 28 |AA |B | ...
> 0.15% 6.6K 1.33% 946 |A |- | ...
> 0.11% 4.5K 0.06% 46 |AAA+|- | ...
>
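The "Branch Counter" column above can be read as one field per event: the event's letter repeated once per occurrence, '-' for no occurrence, and '+' when the counter is saturated. The following is a minimal sketch of that rendering, not code from the perf tool. The saturation value of 3 is an assumption inferred from the "AAA+" row, and the helper names and the fixed field padding are hypothetical.

```python
# Illustrative sketch only; not taken from the perf sources.
# Assumption: each counter saturates at 3, inferred from the "AAA+" row.
SATURATION = 3


def abbr_field(letter, count, saturation=SATURATION):
    """Render one counter: '-' for zero, the letter repeated per occurrence,
    and a trailing '+' once the counter has reached its saturation value."""
    if count == 0:
        return "-"
    text = letter * min(count, saturation)
    if count >= saturation:
        text += "+"
    return text


def abbr_columns(counts, saturation=SATURATION):
    """Render all counters as |field|field|, assigning letters A, B, ...
    in event order. The padding width is a guess at the column layout."""
    width = saturation + 1
    fields = []
    for index, count in enumerate(counts):
        letter = chr(ord("A") + index)
        fields.append(abbr_field(letter, count, saturation).ljust(width))
    return "|" + "|".join(fields) + "|"


if __name__ == "__main__":
    # counts are [branch-instructions:ppp, branch-misses]
    for counts in ([1, 0], [2, 0], [2, 1], [3, 0]):
        print(counts, abbr_columns(counts))
```

With this reading, a row such as "|AA |B |" means two branch-instructions occurrences and one branch-miss were logged for that block.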
> (The output below is from TUI mode. Users can press 'B' to display
> the branch counter abbreviation list.)
>
> Samples: 1M of events 'anon group { branch-instructions:ppp, branch-misses }',
> 4000 Hz, Event count (approx.):
> f3 /home/sdp/test/tchain_edit [Percent: local period]
> Percent │ IPC Cycle Branch Counter (Average IPC: 1.39, IPC Coverage: 29.4%)
> │ 0000000000401755 <f3>:
> 0.00 0.00 │ endbr64
> │ push %rbp
> │ mov %rsp,%rbp
> │ movl $0x0,-0x4(%rbp)
> 0.00 0.00 │1.33 3 |A |- | ↓ jmp 25
> 11.03 11.03 │ 11: mov -0x4(%rbp),%eax
> │ and $0x1,%eax
> │ test %eax,%eax
> 17.13 17.13 │2.41 1 |A |- | ↓ je 21
> │ addl $0x1,-0x4(%rbp)
> 21.84 21.84 │2.22 2 |AA |- | ↓ jmp 25
> 17.13 17.13 │ 21: addl $0x1,-0x4(%rbp)
> 21.84 21.84 │ 25: cmpl $0x270f,-0x4(%rbp)
> 11.03 11.03 │0.61 3 |A |- | ↑ jle 11
> │ nop
> │ pop %rbp
> 0.00 0.00 │0.24 20 |AA |B | ← ret
>
> perf script:
>
> $perf script -F +brstackinsn,+brcntr
>
> # Branch counter abbr list:
> # branch-instructions:ppp = A
> # branch-misses = B
> # '-' No event occurs
> # '+' Event occurrences may be lost due to branch counter saturated
> tchain_edit 332203 3366329.405674: 53030 branch-instructions:ppp: 401781 f3+0x2c (/home/sdp/test/tchain_edit)
> f3+31:
> 0000000000401774 insn: eb 04 br_cntr: AA # PRED 5 cycles [5]
> 000000000040177a insn: 81 7d fc 0f 27 00 00
> 0000000000401781 insn: 7e e3 br_cntr: A # PRED 1 cycles [6] 2.00 IPC
> 0000000000401766 insn: 8b 45 fc
> 0000000000401769 insn: 83 e0 01
> 000000000040176c insn: 85 c0
>
> $perf script -F +brstackinsn,+brcntr -v
>
> tchain_edit 332203 3366329.405674: 53030 branch-instructions:ppp: 401781 f3+0x2c (/home/sdp/test/tchain_edit)
> f3+31:
> 0000000000401774 insn: eb 04 br_cntr: branch-instructions:ppp 2 branch-misses 0 # PRED 5 cycles [5]
> 000000000040177a insn: 81 7d fc 0f 27 00 00
> 0000000000401781 insn: 7e e3 br_cntr: branch-instructions:ppp 1 branch-misses 0 # PRED 1 cycles [6] 2.00 IPC
> 0000000000401766 insn: 8b 45 fc
> 0000000000401769 insn: 83 e0 01
> 000000000040176c insn: 85 c0
>
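The two perf script outputs above show the same br_cntr data in two forms: abbreviated ("AA") and, with -v, verbose ("branch-instructions:ppp 2 branch-misses 0"). The sketch below shows how the verbose form maps onto the abbreviated one. It is only an illustration built from the sample lines, not the perf implementation: the function names are hypothetical, saturation ('+') is ignored, and printing '-' when every count is zero is a guess based on the abbr list.

```python
# Illustrative sketch only; not taken from the perf sources.

def parse_verbose_br_cntr(text):
    """Split 'name count name count ...' into (name, count) pairs."""
    tokens = text.split()
    if len(tokens) % 2 != 0:
        raise ValueError("expected alternating event names and counts")
    return [(name, int(count)) for name, count in zip(tokens[0::2], tokens[1::2])]


def to_abbr(pairs):
    """Render pairs in abbreviated style: the i-th event gets letter A, B, ...
    repeated once per occurrence. Assumption: all-zero counts print '-'."""
    text = "".join(chr(ord("A") + i) * count for i, (_, count) in enumerate(pairs))
    return text or "-"


if __name__ == "__main__":
    for verbose in ("branch-instructions:ppp 2 branch-misses 0",
                    "branch-instructions:ppp 1 branch-misses 0"):
        pairs = parse_verbose_br_cntr(verbose)
        print(pairs, "->", to_abbr(pairs))
```

Applied to the two verbose lines in the example, this yields "AA" and "A", matching the abbreviated output for the same addresses.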
> Kan Liang (9):
> perf report: Fix --total-cycles --stdio output error
> perf report: Remove the first overflow check for branch counters
> perf evlist: Save branch counters information
> perf annotate: Save branch counters for each block
> perf evsel: Assign abbr name for the branch counter events
> perf report: Display the branch counter histogram
> perf annotate: Display the branch counter histogram
> perf script: Add branch counters
> perf test: Add new test cases for the branch counter feature
>
> tools/perf/Documentation/perf-report.txt | 1 +
> tools/perf/Documentation/perf-script.txt | 2 +-
> tools/perf/builtin-annotate.c | 13 +-
> tools/perf/builtin-diff.c | 8 +-
> tools/perf/builtin-report.c | 25 ++-
> tools/perf/builtin-script.c | 69 +++++++-
> tools/perf/builtin-top.c | 4 +-
> tools/perf/tests/shell/record.sh | 17 +-
> tools/perf/ui/browsers/annotate.c | 18 +-
> tools/perf/ui/browsers/hists.c | 18 +-
> tools/perf/util/annotate.c | 209 +++++++++++++++++++++--
> tools/perf/util/annotate.h | 24 ++-
> tools/perf/util/block-info.c | 66 ++++++-
> tools/perf/util/block-info.h | 8 +-
> tools/perf/util/branch.h | 1 +
> tools/perf/util/disasm.c | 1 +
> tools/perf/util/evlist.c | 66 +++++++
> tools/perf/util/evlist.h | 2 +
> tools/perf/util/evsel.c | 15 +-
> tools/perf/util/evsel.h | 12 ++
> tools/perf/util/hist.c | 5 +-
> tools/perf/util/hist.h | 2 +-
> tools/perf/util/machine.c | 3 +
> 23 files changed, 519 insertions(+), 70 deletions(-)
>
> --
> 2.38.1
* Re: [PATCH 0/9] Support branch counters in block annotation
2024-07-31 15:05 ` [PATCH 0/9] Support branch counters in block annotation Arnaldo Carvalho de Melo
@ 2024-07-31 15:31 ` Liang, Kan
2024-07-31 17:00 ` Namhyung Kim
0 siblings, 1 reply; 26+ messages in thread
From: Liang, Kan @ 2024-07-31 15:31 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: namhyung, irogers, peterz, Andi Kleen, mingo, linux-kernel,
adrian.hunter, ak, eranian
Hi Arnaldo,
On 2024-07-31 11:05 a.m., Arnaldo Carvalho de Melo wrote:
> I couldn't find any newer versions of this series nor reviews, is that
> right?
Right. There are no newer versions or reviews.
The series still applies cleanly on top of the latest
tmp.perf-tools-next (commit 756785ab6380 ("perf list: Give
clues if failed to open tracing events directory")), so I think it
can still be used for review.
>
> I'll try and review this soon, but if someone else could take a look,
> try it and provide a Reviewed-by or at least an Acked-by, that would
> help!
Thanks!
Kan
* Re: [PATCH 0/9] Support branch counters in block annotation
2024-07-31 15:31 ` Liang, Kan
@ 2024-07-31 17:00 ` Namhyung Kim
0 siblings, 0 replies; 26+ messages in thread
From: Namhyung Kim @ 2024-07-31 17:00 UTC (permalink / raw)
To: Liang, Kan
Cc: Arnaldo Carvalho de Melo, irogers, peterz, Andi Kleen, mingo,
linux-kernel, adrian.hunter, ak, eranian
Hi guys,
On Wed, Jul 31, 2024 at 8:31 AM Liang, Kan <kan.liang@linux.intel.com> wrote:
>
> Hi Arnaldo,
>
> On 2024-07-31 11:05 a.m., Arnaldo Carvalho de Melo wrote:
> > I couldn't find any newer versions of this series nor reviews, is that
> > right?
>
> Right. There is no newer version nor reviews.
>
> The patch series can be successfully applied on top of the latest
> tmp.perf-tools-next (on top of the commit 756785ab6380 ("perf list: Give
> clues if failed to open tracing events directory")).
>
> I think we can still use it for the review.
>
> >
> > I'll try and review this soon, but if someone else could take a look,
> > try it and provide a Reviewed-by or at least an Acked-by, that would
> > help!
Sure, I'll take a look!
Thanks,
Namhyung