* [PATCHSET 0/5] perf annotate: Add --skip-empty option (v1)
@ 2024-08-03 21:13 Namhyung Kim
2024-08-03 21:13 ` [PATCH 1/5] perf annotate: Use al->data_nr if possible Namhyung Kim
` (5 more replies)
0 siblings, 6 replies; 16+ messages in thread
From: Namhyung Kim @ 2024-08-03 21:13 UTC
To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users
Hello,
This series makes perf annotate behave the same way as perf report.
Especially in the TUI browser, we want to keep the same experience as
perf report when it comes to displaying dummy events.
$ perf mem record -a -- perf test -w noploop
$ perf evlist
cpu/mem-loads,ldlat=30/P
cpu/mem-stores/P
dummy:u
Just using perf annotate with --group will show all 3 events.
$ perf annotate --group --stdio | head
Percent | Source code & Disassembly of ...
--------------------------------------------------------------
: 0 0xe060 <_dl_relocate_object>:
0.00 0.00 0.00 : e060: pushq %rbp
0.00 0.00 0.00 : e061: movq %rsp, %rbp
0.00 0.00 0.00 : e064: pushq %r15
0.00 0.00 0.00 : e066: movq %rdi, %r15
0.00 0.00 0.00 : e069: pushq %r14
0.00 0.00 0.00 : e06b: pushq %r13
0.00 0.00 0.00 : e06d: movl %edx, %r13d
Now with --skip-empty, it'll hide the last dummy event.
$ perf annotate --group --stdio --skip-empty | head
Percent | Source code & Disassembly of ...
------------------------------------------------------
: 0 0xe060 <_dl_relocate_object>:
0.00 0.00 : e060: pushq %rbp
0.00 0.00 : e061: movq %rsp, %rbp
0.00 0.00 : e064: pushq %r15
0.00 0.00 : e066: movq %rdi, %r15
0.00 0.00 : e069: pushq %r14
0.00 0.00 : e06b: pushq %r13
0.00 0.00 : e06d: movl %edx, %r13d
The code is available in 'perf/annotate-skip-v1' branch at
git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
Thanks,
Namhyung
Namhyung Kim (5):
perf annotate: Use al->data_nr if possible
perf annotate: Set notes->src->nr_events early
perf annotate: Use annotation__pcnt_width() consistently
perf annotate: Set al->data_nr using the notes->src->nr_events
perf annotate: Add --skip-empty option
tools/perf/Documentation/perf-annotate.txt | 3 ++
tools/perf/builtin-annotate.c | 2 +
tools/perf/util/annotate.c | 47 +++++++++++++---------
tools/perf/util/annotate.h | 2 +-
tools/perf/util/disasm.c | 6 +--
5 files changed, 35 insertions(+), 25 deletions(-)
--
2.46.0.rc2.264.g509ed76dc8-goog
* [PATCH 1/5] perf annotate: Use al->data_nr if possible
2024-08-03 21:13 [PATCHSET 0/5] perf annotate: Add --skip-empty option (v1) Namhyung Kim
@ 2024-08-03 21:13 ` Namhyung Kim
2024-08-03 21:13 ` [PATCH 2/5] perf annotate: Set notes->src->nr_events early Namhyung Kim
` (4 subsequent siblings)
5 siblings, 0 replies; 16+ messages in thread
From: Namhyung Kim @ 2024-08-03 21:13 UTC
To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users
al->data_nr holds the number of entries in al->data[], so use it when
iterating over the array. notes->src->nr_events should hold the same
number, but al->data_nr is the more natural loop bound.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/util/annotate.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index a2ee4074f768..91ad948c89d5 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -1594,13 +1594,12 @@ bool ui__has_annotation(void)
static double annotation_line__max_percent(struct annotation_line *al,
- struct annotation *notes,
unsigned int percent_type)
{
double percent_max = 0.0;
int i;
- for (i = 0; i < notes->src->nr_events; i++) {
+ for (i = 0; i < al->data_nr; i++) {
double percent;
percent = annotation_data__percent(&al->data[i],
@@ -1672,7 +1671,7 @@ static void __annotation_line__write(struct annotation_line *al, struct annotati
void (*obj__write_graph)(void *obj, int graph))
{
- double percent_max = annotation_line__max_percent(al, notes, percent_type);
+ double percent_max = annotation_line__max_percent(al, percent_type);
int pcnt_width = annotation__pcnt_width(notes),
cycles_width = annotation__cycles_width(notes);
bool show_title = false;
@@ -1690,7 +1689,7 @@ static void __annotation_line__write(struct annotation_line *al, struct annotati
if (al->offset != -1 && percent_max != 0.0) {
int i;
- for (i = 0; i < notes->src->nr_events; i++) {
+ for (i = 0; i < al->data_nr; i++) {
double percent;
percent = annotation_data__percent(&al->data[i], percent_type);
--
2.46.0.rc2.264.g509ed76dc8-goog
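
For readers skimming the thread, the loop shape after this patch can be sketched in isolation. This is only an illustration: struct line_data, struct line and line_max_percent() are hypothetical stand-ins, not the perf structures, and the fixed-size array exists only to keep the sketch short. The point is that the loop bound comes from the line itself.

```c
/* Hypothetical stand-ins for annotation_data / annotation_line. */
struct line_data {
	double percent;
};

struct line {
	int data_nr;              /* number of valid entries in data[] */
	struct line_data data[4]; /* fixed size only to keep the sketch short */
};

/* Return the largest percent among the entries the line actually owns. */
static double line_max_percent(const struct line *al)
{
	double percent_max = 0.0;
	int i;

	/* Bound the loop by the line's own count, not by a value kept elsewhere. */
	for (i = 0; i < al->data_nr; i++) {
		if (al->data[i].percent > percent_max)
			percent_max = al->data[i].percent;
	}
	return percent_max;
}
```

Entries past data_nr are never read, so a line keeps working even if another counter goes out of sync with it.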
* [PATCH 2/5] perf annotate: Set notes->src->nr_events early
2024-08-03 21:13 [PATCHSET 0/5] perf annotate: Add --skip-empty option (v1) Namhyung Kim
2024-08-03 21:13 ` [PATCH 1/5] perf annotate: Use al->data_nr if possible Namhyung Kim
@ 2024-08-03 21:13 ` Namhyung Kim
2024-08-03 21:13 ` [PATCH 3/5] perf annotate: Use annotation__pcnt_width() consistently Namhyung Kim
` (3 subsequent siblings)
5 siblings, 0 replies; 16+ messages in thread
From: Namhyung Kim @ 2024-08-03 21:13 UTC
To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users
We want to use notes->src->nr_events in different places, so make sure
it is set properly in symbol__annotate() before creating the disasm lines.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/util/annotate.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 91ad948c89d5..09e6fdf344db 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -925,6 +925,11 @@ int symbol__annotate(struct map_symbol *ms, struct evsel *evsel,
return -1;
}
+ if (evsel__is_group_event(evsel))
+ notes->src->nr_events = evsel->core.nr_members;
+ else
+ notes->src->nr_events = 1;
+
if (annotate_opts.full_addr)
notes->src->start = map__objdump_2mem(ms->map, ms->sym->start);
else
@@ -1842,10 +1847,7 @@ int symbol__annotate2(struct map_symbol *ms, struct evsel *evsel,
struct symbol *sym = ms->sym;
struct annotation *notes = symbol__annotation(sym);
size_t size = symbol__size(sym);
- int nr_pcnt = 1, err;
-
- if (evsel__is_group_event(evsel))
- nr_pcnt = evsel->core.nr_members;
+ int err;
err = symbol__annotate(ms, evsel, parch);
if (err)
@@ -1861,8 +1863,6 @@ int symbol__annotate2(struct map_symbol *ms, struct evsel *evsel,
return err;
annotation__init_column_widths(notes, sym);
- notes->src->nr_events = nr_pcnt;
-
annotation__update_column_widths(notes);
sym->annotate2 = 1;
--
2.46.0.rc2.264.g509ed76dc8-goog
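
The rule that the patch above moves into symbol__annotate() is small enough to show on its own. A minimal sketch, assuming a hypothetical struct fake_evsel that carries only the two facts the rule needs (the real code asks evsel__is_group_event() and reads evsel->core.nr_members):

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few evsel fields used here. */
struct fake_evsel {
	bool is_group_event; /* what evsel__is_group_event() would report */
	int nr_members;      /* stands in for evsel->core.nr_members */
};

/* One percent column per group member, or a single column for a plain event. */
static int count_events(const struct fake_evsel *evsel)
{
	return evsel->is_group_event ? evsel->nr_members : 1;
}
```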
* [PATCH 3/5] perf annotate: Use annotation__pcnt_width() consistently
2024-08-03 21:13 [PATCHSET 0/5] perf annotate: Add --skip-empty option (v1) Namhyung Kim
2024-08-03 21:13 ` [PATCH 1/5] perf annotate: Use al->data_nr if possible Namhyung Kim
2024-08-03 21:13 ` [PATCH 2/5] perf annotate: Set notes->src->nr_events early Namhyung Kim
@ 2024-08-03 21:13 ` Namhyung Kim
2024-08-03 21:13 ` [PATCH 4/5] perf annotate: Set al->data_nr using the notes->src->nr_events Namhyung Kim
` (2 subsequent siblings)
5 siblings, 0 replies; 16+ messages in thread
From: Namhyung Kim @ 2024-08-03 21:13 UTC
To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users
annotation__pcnt_width() calculates the screen width of the overhead
(percent) area, taking event groups into account. Use this function
consistently so that the output is similar in different modes. But
there's a difference between the stdio and TUI output: stdio uses 8
columns for a percent value and the TUI uses 7.
Let's use 8 and adjust the print width in __annotation_line__write()
accordingly.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/util/annotate.c | 14 +++++---------
tools/perf/util/annotate.h | 2 +-
2 files changed, 6 insertions(+), 10 deletions(-)
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 09e6fdf344db..917897fe44a2 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -699,13 +699,13 @@ annotation_line__print(struct annotation_line *al, struct symbol *sym, u64 start
int percent_type)
{
struct disasm_line *dl = container_of(al, struct disasm_line, al);
+ struct annotation *notes = symbol__annotation(sym);
static const char *prev_line;
if (al->offset != -1) {
double max_percent = 0.0;
int i, nr_percent = 1;
const char *color;
- struct annotation *notes = symbol__annotation(sym);
for (i = 0; i < al->data_nr; i++) {
double percent;
@@ -775,14 +775,11 @@ annotation_line__print(struct annotation_line *al, struct symbol *sym, u64 start
} else if (max_lines && printed >= max_lines)
return 1;
else {
- int width = symbol_conf.show_total_period ? 12 : 8;
+ int width = annotation__pcnt_width(notes);
if (queue)
return -1;
- if (evsel__is_group_event(evsel))
- width *= evsel->core.nr_members;
-
if (!*al->line)
printf(" %*s:\n", width, " ");
else
@@ -1111,7 +1108,7 @@ int symbol__annotate_printf(struct map_symbol *ms, struct evsel *evsel)
int more = 0;
bool context = opts->context;
u64 len;
- int width = symbol_conf.show_total_period ? 12 : 8;
+ int width = annotation__pcnt_width(notes);
int graph_dotted_len;
char buf[512];
@@ -1127,7 +1124,6 @@ int symbol__annotate_printf(struct map_symbol *ms, struct evsel *evsel)
len = symbol__size(sym);
if (evsel__is_group_event(evsel)) {
- width *= evsel->core.nr_members;
evsel__group_desc(evsel, buf, sizeof(buf));
evsel_name = buf;
}
@@ -1703,10 +1699,10 @@ static void __annotation_line__write(struct annotation_line *al, struct annotati
if (symbol_conf.show_total_period) {
obj__printf(obj, "%11" PRIu64 " ", al->data[i].he.period);
} else if (symbol_conf.show_nr_samples) {
- obj__printf(obj, "%6" PRIu64 " ",
+ obj__printf(obj, "%7" PRIu64 " ",
al->data[i].he.nr_samples);
} else {
- obj__printf(obj, "%6.2f ", percent);
+ obj__printf(obj, "%7.2f ", percent);
}
}
} else {
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index 9ba772f46270..64e70d716ff1 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -339,7 +339,7 @@ static inline int annotation__cycles_width(struct annotation *notes)
static inline int annotation__pcnt_width(struct annotation *notes)
{
- return (symbol_conf.show_total_period ? 12 : 7) * notes->src->nr_events;
+ return (symbol_conf.show_total_period ? 12 : 8) * notes->src->nr_events;
}
static inline bool annotation_line__filter(struct annotation_line *al)
--
2.46.0.rc2.264.g509ed76dc8-goog
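
The width arithmetic in the patch above can be checked with a small standalone sketch. pcnt_width() and format_percent() are hypothetical helpers written for this note; they only mirror the expression in annotation__pcnt_width() and the "%7.2f " format from the diff.

```c
#include <stdbool.h>
#include <stdio.h>

/* Width of the percent area: 8 columns per event, or 12 when showing periods. */
static int pcnt_width(bool show_total_period, int nr_events)
{
	return (show_total_period ? 12 : 8) * nr_events;
}

/*
 * Format one percent cell with the patched "%7.2f " format and return the
 * number of characters produced: 7 for the value plus 1 trailing space.
 */
static int format_percent(char *buf, size_t size, double percent)
{
	return snprintf(buf, size, "%7.2f ", percent);
}
```

With three events and no period display, the percent area is 24 columns wide, and each formatted cell takes exactly 8 of them, so the cells and the header width agree.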
* [PATCH 4/5] perf annotate: Set al->data_nr using the notes->src->nr_events
2024-08-03 21:13 [PATCHSET 0/5] perf annotate: Add --skip-empty option (v1) Namhyung Kim
` (2 preceding siblings ...)
2024-08-03 21:13 ` [PATCH 3/5] perf annotate: Use annotation__pcnt_width() consistently Namhyung Kim
@ 2024-08-03 21:13 ` Namhyung Kim
2024-08-03 21:13 ` [PATCH 5/5] perf annotate: Add --skip-empty option Namhyung Kim
2024-08-05 19:26 ` [PATCHSET 0/5] perf annotate: Add --skip-empty option (v1) Arnaldo Carvalho de Melo
5 siblings, 0 replies; 16+ messages in thread
From: Namhyung Kim @ 2024-08-03 21:13 UTC
To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users
This is a preparation to support skipping empty events.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/util/disasm.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index 85fb0cfedf94..22289003e16d 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -1037,10 +1037,8 @@ static size_t disasm_line_size(int nr)
struct disasm_line *disasm_line__new(struct annotate_args *args)
{
struct disasm_line *dl = NULL;
- int nr = 1;
-
- if (evsel__is_group_event(args->evsel))
- nr = args->evsel->core.nr_members;
+ struct annotation *notes = symbol__annotation(args->ms.sym);
+ int nr = notes->src->nr_events;
dl = zalloc(disasm_line_size(nr));
if (!dl)
--
2.46.0.rc2.264.g509ed76dc8-goog
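
To show why the event count matters at allocation time, here is a rough sketch of a line object whose per-event slots are sized from that count. struct fake_data, struct fake_line and fake_line_new() are hypothetical names for this note; the real disasm_line layout and disasm_line_size() are more involved, and calloc() stands in for zalloc().

```c
#include <stdlib.h>

/* Hypothetical per-event slot. */
struct fake_data {
	double percent;
};

/* Hypothetical line object with one slot per displayed event. */
struct fake_line {
	int data_nr;
	struct fake_data data[]; /* flexible array member */
};

/* Allocate a zeroed line with room for nr_events slots. */
static struct fake_line *fake_line_new(int nr_events)
{
	struct fake_line *dl;

	dl = calloc(1, sizeof(*dl) + sizeof(dl->data[0]) * nr_events);
	if (!dl)
		return NULL;

	dl->data_nr = nr_events;
	return dl;
}
```

Because the slot count is fixed when the line is created, the count has to be final before any line is allocated, which is presumably why the earlier patch sets it before the disasm lines are created.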
* [PATCH 5/5] perf annotate: Add --skip-empty option
2024-08-03 21:13 [PATCHSET 0/5] perf annotate: Add --skip-empty option (v1) Namhyung Kim
` (3 preceding siblings ...)
2024-08-03 21:13 ` [PATCH 4/5] perf annotate: Set al->data_nr using the notes->src->nr_events Namhyung Kim
@ 2024-08-03 21:13 ` Namhyung Kim
2024-08-05 19:22 ` Arnaldo Carvalho de Melo
2024-08-05 19:26 ` [PATCH 5/5] perf annotate: Add --skip-empty option Arnaldo Carvalho de Melo
2024-08-05 19:26 ` [PATCHSET 0/5] perf annotate: Add --skip-empty option (v1) Arnaldo Carvalho de Melo
5 siblings, 2 replies; 16+ messages in thread
From: Namhyung Kim @ 2024-08-03 21:13 UTC
To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users
Like in perf report, we want to hide empty events in the perf annotate
output. This is consistent with perf report when the option is set there.
For example, the following command would use 3 events including dummy.
$ perf mem record -a -- perf test -w noploop
$ perf evlist
cpu/mem-loads,ldlat=30/P
cpu/mem-stores/P
dummy:u
Just using perf annotate with --group will show all 3 events.
$ perf annotate --group --stdio | head
Percent | Source code & Disassembly of ...
--------------------------------------------------------------
: 0 0xe060 <_dl_relocate_object>:
0.00 0.00 0.00 : e060: pushq %rbp
0.00 0.00 0.00 : e061: movq %rsp, %rbp
0.00 0.00 0.00 : e064: pushq %r15
0.00 0.00 0.00 : e066: movq %rdi, %r15
0.00 0.00 0.00 : e069: pushq %r14
0.00 0.00 0.00 : e06b: pushq %r13
0.00 0.00 0.00 : e06d: movl %edx, %r13d
Now with --skip-empty, it'll hide the last dummy event.
$ perf annotate --group --stdio --skip-empty | head
Percent | Source code & Disassembly of ...
------------------------------------------------------
: 0 0xe060 <_dl_relocate_object>:
0.00 0.00 : e060: pushq %rbp
0.00 0.00 : e061: movq %rsp, %rbp
0.00 0.00 : e064: pushq %r15
0.00 0.00 : e066: movq %rdi, %r15
0.00 0.00 : e069: pushq %r14
0.00 0.00 : e06b: pushq %r13
0.00 0.00 : e06d: movl %edx, %r13d
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/Documentation/perf-annotate.txt | 3 +++
tools/perf/builtin-annotate.c | 2 ++
tools/perf/util/annotate.c | 22 +++++++++++++++++-----
3 files changed, 22 insertions(+), 5 deletions(-)
diff --git a/tools/perf/Documentation/perf-annotate.txt b/tools/perf/Documentation/perf-annotate.txt
index b95524bea021..156c5f37b051 100644
--- a/tools/perf/Documentation/perf-annotate.txt
+++ b/tools/perf/Documentation/perf-annotate.txt
@@ -165,6 +165,9 @@ include::itrace.txt[]
--type-stat::
Show stats for the data type annotation.
+--skip-empty::
+ Do not display empty (or dummy) events.
+
SEE ALSO
--------
diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index cf60392b1c19..efcadb7620b8 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -795,6 +795,8 @@ int cmd_annotate(int argc, const char **argv)
"Show stats for the data type annotation"),
OPT_BOOLEAN(0, "insn-stat", &annotate.insn_stat,
"Show instruction stats for the data type annotation"),
+ OPT_BOOLEAN(0, "skip-empty", &symbol_conf.skip_empty,
+ "Do not display empty (or dummy) events in the output"),
OPT_END()
};
int ret;
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 917897fe44a2..eafe8d65052e 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -848,6 +848,10 @@ static void annotation__calc_percent(struct annotation *notes,
BUG_ON(i >= al->data_nr);
+ if (symbol_conf.skip_empty &&
+ evsel__hists(evsel)->stats.nr_samples == 0)
+ continue;
+
data = &al->data[i++];
calc_percent(notes, evsel, data, al->offset, end);
@@ -901,7 +905,7 @@ int symbol__annotate(struct map_symbol *ms, struct evsel *evsel,
.options = &annotate_opts,
};
struct arch *arch = NULL;
- int err;
+ int err, nr;
err = evsel__get_arch(evsel, &arch);
if (err < 0)
@@ -922,10 +926,18 @@ int symbol__annotate(struct map_symbol *ms, struct evsel *evsel,
return -1;
}
- if (evsel__is_group_event(evsel))
- notes->src->nr_events = evsel->core.nr_members;
- else
- notes->src->nr_events = 1;
+ nr = 0;
+ if (evsel__is_group_event(evsel)) {
+ struct evsel *pos;
+
+ for_each_group_evsel(pos, evsel) {
+ if (symbol_conf.skip_empty &&
+ evsel__hists(pos)->stats.nr_samples == 0)
+ continue;
+ nr++;
+ }
+ }
+ notes->src->nr_events = nr ? nr : 1;
if (annotate_opts.full_addr)
notes->src->start = map__objdump_2mem(ms->map, ms->sym->start);
--
2.46.0.rc2.264.g509ed76dc8-goog
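
The counting and the column packing in the patch above follow the same pattern, which can be sketched without any perf types. visible_events() is a hypothetical helper for this note, and samples[] stands in for the per-event hists->stats.nr_samples values of the group members.

```c
#include <stdbool.h>

/*
 * Copy the sample counts of the events that stay visible into out[] and
 * return the number of columns to display. As in the patch, the result
 * falls back to 1 when nothing was counted; out[] is left untouched in
 * that case.
 */
static int visible_events(const unsigned int *samples, int nr_members,
			  bool skip_empty, unsigned int *out)
{
	int i, nr = 0;

	for (i = 0; i < nr_members; i++) {
		/* An empty event takes no column when skip_empty is set. */
		if (skip_empty && samples[i] == 0)
			continue;
		/* The output index advances only for events that are kept. */
		out[nr++] = samples[i];
	}
	return nr ? nr : 1;
}
```

The counts used in a call are made up for illustration. For a group with counts {120, 178, 0}, skip_empty leaves two columns holding 120 and 178, while without the option all three columns remain.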
* Re: [PATCH 5/5] perf annotate: Add --skip-empty option
2024-08-03 21:13 ` [PATCH 5/5] perf annotate: Add --skip-empty option Namhyung Kim
@ 2024-08-05 19:22 ` Arnaldo Carvalho de Melo
2024-08-05 20:14 ` Namhyung Kim
2024-08-05 19:26 ` [PATCH 5/5] perf annotate: Add --skip-empty option Arnaldo Carvalho de Melo
1 sibling, 1 reply; 16+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-05 19:22 UTC
To: Namhyung Kim
Cc: Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter, Peter Zijlstra,
Ingo Molnar, LKML, linux-perf-users
On Sat, Aug 03, 2024 at 02:13:32PM -0700, Namhyung Kim wrote:
> Like in perf report, we want to hide empty events in the perf annotate
> output. This is consistent when the option is set in perf report.
>
> For example, the following command would use 3 events including dummy.
>
> $ perf mem record -a -- perf test -w noploop
>
> $ perf evlist
> cpu/mem-loads,ldlat=30/P
> cpu/mem-stores/P
> dummy:u
>
> Just using perf annotate with --group will show the all 3 events.
Seems unrelated, just before compiling with this patch:
root@x1:~# perf mem record -a -- perf test -w noploop
Memory events are enabled on a subset of CPUs: 4-11
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 10.506 MB perf.data (2775 samples) ]
root@x1:~#
root@x1:~# perf annotate --group --stdio2 sched_clock
root@x1:~# perf annotate --stdio2 sched_clock
Samples: 178 of event 'cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 565268, [percent: local period]
sched_clock() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
Percent 0xffffffff810511e0 <sched_clock>:
endbr64
5.76 incl pcpu_hot+0x8
5.47 → callq sched_clock_noinstr
88.78 decl pcpu_hot+0x8
↓ je 1e
→ jmp __x86_return_thunk
1e: → callq __SCT__preempt_schedule_notrace
→ jmp __x86_return_thunk
root@x1:~# perf annotate --group --stdio2 sched_clock
root@x1:~# perf annotate --group --stdio sched_clock
root@x1:~# perf annotate --group sched_clock
root@x1:~#
root@x1:~# perf evlist
cpu_atom/mem-loads,ldlat=30/P
cpu_atom/mem-stores/P
dummy:u
root@x1:~#
root@x1:~# perf report --header-only | grep cmdline
# cmdline : /home/acme/bin/perf mem record -a -- perf test -w noploop
root@x1:~#
I thought it would be some hybrid oddity but seems to be just --group
related, seems like it stops if the first event has no samples? Because
it works with another symbol:
root@x1:~# perf annotate --group --stdio2 do_lookup_x | head -25
Samples: 20 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 769079, [percent: local period]
do_lookup_x() /usr/lib64/ld-linux-x86-64.so.2
Percent 0x9900 <do_lookup_x>:
pushq %rbp
movq %rsp,%rbp
pushq %r15
pushq %r14
pushq %r13
pushq %r12
pushq %rbx
subq $0x88,%rsp
movq %rdi,-0x50(%rbp)
movl 8(%r9),%edi
movq 0x10(%rbp),%r12
movq 0x28(%rbp),%r10
movq %rdx,-0x70(%rbp)
movq %rcx,-0x58(%rbp)
movq %rdi,%r11
0.00 5.73 0.00 movq %r8,-0x68(%rbp)
movq (%r9),%r8
movl %esi,%eax
8.30 0.00 0.00 movl 0x30(%rbp),%r9d
movl %esi,%r15d
shrl $6, %eax
movq %r8,%r13
root@x1:~#
Just leaving a note here, no time to fully investigate this now,
- Arnaldo
* Re: [PATCH 5/5] perf annotate: Add --skip-empty option
2024-08-03 21:13 ` [PATCH 5/5] perf annotate: Add --skip-empty option Namhyung Kim
2024-08-05 19:22 ` Arnaldo Carvalho de Melo
@ 2024-08-05 19:26 ` Arnaldo Carvalho de Melo
1 sibling, 0 replies; 16+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-05 19:26 UTC
To: Namhyung Kim
Cc: Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter, Peter Zijlstra,
Ingo Molnar, LKML, linux-perf-users
On Sat, Aug 03, 2024 at 02:13:32PM -0700, Namhyung Kim wrote:
> Like in perf report, we want to hide empty events in the perf annotate
> output. This is consistent when the option is set in perf report.
>
> For example, the following command would use 3 events including dummy.
The option --skip-empty is useful, but I wonder if for "dummy" it
shouldn't be the default, i.e. a per-event "skip" or "hide" flag that we
would set for the "dummy" event in addition to this --skip-empty command
line option?
- Arnaldo
> $ perf mem record -a -- perf test -w noploop
>
> $ perf evlist
> cpu/mem-loads,ldlat=30/P
> cpu/mem-stores/P
> dummy:u
>
> Just using perf annotate with --group will show the all 3 events.
>
> $ perf annotate --group --stdio | head
> Percent | Source code & Disassembly of ...
> --------------------------------------------------------------
> : 0 0xe060 <_dl_relocate_object>:
> 0.00 0.00 0.00 : e060: pushq %rbp
> 0.00 0.00 0.00 : e061: movq %rsp, %rbp
> 0.00 0.00 0.00 : e064: pushq %r15
> 0.00 0.00 0.00 : e066: movq %rdi, %r15
> 0.00 0.00 0.00 : e069: pushq %r14
> 0.00 0.00 0.00 : e06b: pushq %r13
> 0.00 0.00 0.00 : e06d: movl %edx, %r13d
>
> Now with --skip-empty, it'll hide the last dummy event.
>
> $ perf annotate --group --stdio --skip-empty | head
> Percent | Source code & Disassembly of ...
> ------------------------------------------------------
> : 0 0xe060 <_dl_relocate_object>:
> 0.00 0.00 : e060: pushq %rbp
> 0.00 0.00 : e061: movq %rsp, %rbp
> 0.00 0.00 : e064: pushq %r15
> 0.00 0.00 : e066: movq %rdi, %r15
> 0.00 0.00 : e069: pushq %r14
> 0.00 0.00 : e06b: pushq %r13
> 0.00 0.00 : e06d: movl %edx, %r13d
>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
> tools/perf/Documentation/perf-annotate.txt | 3 +++
> tools/perf/builtin-annotate.c | 2 ++
> tools/perf/util/annotate.c | 22 +++++++++++++++++-----
> 3 files changed, 22 insertions(+), 5 deletions(-)
>
> diff --git a/tools/perf/Documentation/perf-annotate.txt b/tools/perf/Documentation/perf-annotate.txt
> index b95524bea021..156c5f37b051 100644
> --- a/tools/perf/Documentation/perf-annotate.txt
> +++ b/tools/perf/Documentation/perf-annotate.txt
> @@ -165,6 +165,9 @@ include::itrace.txt[]
> --type-stat::
> Show stats for the data type annotation.
>
> +--skip-empty::
> + Do not display empty (or dummy) events.
> +
>
> SEE ALSO
> --------
> diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
> index cf60392b1c19..efcadb7620b8 100644
> --- a/tools/perf/builtin-annotate.c
> +++ b/tools/perf/builtin-annotate.c
> @@ -795,6 +795,8 @@ int cmd_annotate(int argc, const char **argv)
> "Show stats for the data type annotation"),
> OPT_BOOLEAN(0, "insn-stat", &annotate.insn_stat,
> "Show instruction stats for the data type annotation"),
> + OPT_BOOLEAN(0, "skip-empty", &symbol_conf.skip_empty,
> + "Do not display empty (or dummy) events in the output"),
> OPT_END()
> };
> int ret;
> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> index 917897fe44a2..eafe8d65052e 100644
> --- a/tools/perf/util/annotate.c
> +++ b/tools/perf/util/annotate.c
> @@ -848,6 +848,10 @@ static void annotation__calc_percent(struct annotation *notes,
>
> BUG_ON(i >= al->data_nr);
>
> + if (symbol_conf.skip_empty &&
> + evsel__hists(evsel)->stats.nr_samples == 0)
> + continue;
> +
> data = &al->data[i++];
>
> calc_percent(notes, evsel, data, al->offset, end);
> @@ -901,7 +905,7 @@ int symbol__annotate(struct map_symbol *ms, struct evsel *evsel,
> .options = &annotate_opts,
> };
> struct arch *arch = NULL;
> - int err;
> + int err, nr;
>
> err = evsel__get_arch(evsel, &arch);
> if (err < 0)
> @@ -922,10 +926,18 @@ int symbol__annotate(struct map_symbol *ms, struct evsel *evsel,
> return -1;
> }
>
> - if (evsel__is_group_event(evsel))
> - notes->src->nr_events = evsel->core.nr_members;
> - else
> - notes->src->nr_events = 1;
> + nr = 0;
> + if (evsel__is_group_event(evsel)) {
> + struct evsel *pos;
> +
> + for_each_group_evsel(pos, evsel) {
> + if (symbol_conf.skip_empty &&
> + evsel__hists(pos)->stats.nr_samples == 0)
> + continue;
> + nr++;
> + }
> + }
> + notes->src->nr_events = nr ? nr : 1;
>
> if (annotate_opts.full_addr)
> notes->src->start = map__objdump_2mem(ms->map, ms->sym->start);
> --
> 2.46.0.rc2.264.g509ed76dc8-goog
* Re: [PATCHSET 0/5] perf annotate: Add --skip-empty option (v1)
2024-08-03 21:13 [PATCHSET 0/5] perf annotate: Add --skip-empty option (v1) Namhyung Kim
` (4 preceding siblings ...)
2024-08-03 21:13 ` [PATCH 5/5] perf annotate: Add --skip-empty option Namhyung Kim
@ 2024-08-05 19:26 ` Arnaldo Carvalho de Melo
5 siblings, 0 replies; 16+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-05 19:26 UTC
To: Namhyung Kim
Cc: Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter, Peter Zijlstra,
Ingo Molnar, LKML, linux-perf-users
On Sat, Aug 03, 2024 at 02:13:27PM -0700, Namhyung Kim wrote:
> Hello,
>
> This is to make perf annotate has the same behavior as perf report.
> Especially in the TUI browser, we want to maintain the same experience
> when it comes to display dummy events from perf report.
>
> $ perf mem record -a -- perf test -w noploop
>
> $ perf evlist
> cpu/mem-loads,ldlat=30/P
> cpu/mem-stores/P
> dummy:u
Thanks, tested and applied to tmp.perf-tools-next; it will go to
perf-tools-next later.
- Arnaldo
> Just using perf annotate with --group will show the all 3 events.
>
> $ perf annotate --group --stdio | head
> Percent | Source code & Disassembly of ...
> --------------------------------------------------------------
> : 0 0xe060 <_dl_relocate_object>:
> 0.00 0.00 0.00 : e060: pushq %rbp
> 0.00 0.00 0.00 : e061: movq %rsp, %rbp
> 0.00 0.00 0.00 : e064: pushq %r15
> 0.00 0.00 0.00 : e066: movq %rdi, %r15
> 0.00 0.00 0.00 : e069: pushq %r14
> 0.00 0.00 0.00 : e06b: pushq %r13
> 0.00 0.00 0.00 : e06d: movl %edx, %r13d
>
> Now with --skip-empty, it'll hide the last dummy event.
>
> $ perf annotate --group --stdio --skip-empty | head
> Percent | Source code & Disassembly of ...
> ------------------------------------------------------
> : 0 0xe060 <_dl_relocate_object>:
> 0.00 0.00 : e060: pushq %rbp
> 0.00 0.00 : e061: movq %rsp, %rbp
> 0.00 0.00 : e064: pushq %r15
> 0.00 0.00 : e066: movq %rdi, %r15
> 0.00 0.00 : e069: pushq %r14
> 0.00 0.00 : e06b: pushq %r13
> 0.00 0.00 : e06d: movl %edx, %r13d
>
> The code is available in 'perf/annotate-skip-v1' branch at
> git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
>
> Thanks,
> Namhyung
>
>
> Namhyung Kim (5):
> perf annotate: Use al->data_nr if possible
> perf annotate: Set notes->src->nr_events early
> perf annotate: Use annotation__pcnt_width() consistently
> perf annotate: Set al->data_nr using the notes->src->nr_events
> perf annotate: Add --skip-empty option
>
> tools/perf/Documentation/perf-annotate.txt | 3 ++
> tools/perf/builtin-annotate.c | 2 +
> tools/perf/util/annotate.c | 47 +++++++++++++---------
> tools/perf/util/annotate.h | 2 +-
> tools/perf/util/disasm.c | 6 +--
> 5 files changed, 35 insertions(+), 25 deletions(-)
>
> --
> 2.46.0.rc2.264.g509ed76dc8-goog
>
* Re: [PATCH 5/5] perf annotate: Add --skip-empty option
2024-08-05 19:22 ` Arnaldo Carvalho de Melo
@ 2024-08-05 20:14 ` Namhyung Kim
2024-08-05 20:23 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 16+ messages in thread
From: Namhyung Kim @ 2024-08-05 20:14 UTC
To: Arnaldo Carvalho de Melo
Cc: Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter, Peter Zijlstra,
Ingo Molnar, LKML, linux-perf-users
On Mon, Aug 05, 2024 at 04:22:12PM -0300, Arnaldo Carvalho de Melo wrote:
> On Sat, Aug 03, 2024 at 02:13:32PM -0700, Namhyung Kim wrote:
> > Like in perf report, we want to hide empty events in the perf annotate
> > output. This is consistent when the option is set in perf report.
> >
> > For example, the following command would use 3 events including dummy.
> >
> > $ perf mem record -a -- perf test -w noploop
> >
> > $ perf evlist
> > cpu/mem-loads,ldlat=30/P
> > cpu/mem-stores/P
> > dummy:u
> >
> > Just using perf annotate with --group will show the all 3 events.
>
> Seems unrelated, just before compiling with this patch:
>
> root@x1:~# perf mem record -a -- perf test -w noploop
> Memory events are enabled on a subset of CPUs: 4-11
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 10.506 MB perf.data (2775 samples) ]
> root@x1:~#
>
> root@x1:~# perf annotate --group --stdio2 sched_clock
> root@x1:~# perf annotate --stdio2 sched_clock
> Samples: 178 of event 'cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 565268, [percent: local period]
> sched_clock() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
> Percent 0xffffffff810511e0 <sched_clock>:
> endbr64
> 5.76 incl pcpu_hot+0x8
> 5.47 → callq sched_clock_noinstr
> 88.78 decl pcpu_hot+0x8
> ↓ je 1e
> → jmp __x86_return_thunk
> 1e: → callq __SCT__preempt_schedule_notrace
> → jmp __x86_return_thunk
> root@x1:~# perf annotate --group --stdio2 sched_clock
> root@x1:~# perf annotate --group --stdio sched_clock
> root@x1:~# perf annotate --group sched_clock
> root@x1:~#
>
> root@x1:~# perf evlist
> cpu_atom/mem-loads,ldlat=30/P
> cpu_atom/mem-stores/P
> dummy:u
> root@x1:~#
>
> root@x1:~# perf report --header-only | grep cmdline
> # cmdline : /home/acme/bin/perf mem record -a -- perf test -w noploop
> root@x1:~#
>
> I thought it would be some hybrid oddity but seems to be just --group
> related, seems like it stops if the first event has no samples? Because
> it works with another symbol:
Good catch. Yeah I found it only checked the first event. Something
like below should fix the issue.
Thanks,
Namhyung
diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index efcadb7620b8..8d3ec439b783 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -632,11 +632,15 @@ static int __cmd_annotate(struct perf_annotate *ann)
evlist__for_each_entry(session->evlist, pos) {
struct hists *hists = evsel__hists(pos);
u32 nr_samples = hists->stats.nr_samples;
+ struct evsel *evsel;
- if (nr_samples == 0)
+ if (!symbol_conf.event_group || !evsel__is_group_leader(pos))
continue;
- if (!symbol_conf.event_group || !evsel__is_group_leader(pos))
+ for_each_group_member(evsel, pos)
+ nr_samples += evsel__hists(evsel)->stats.nr_samples;
+
+ if (nr_samples == 0)
continue;
hists__find_annotations(hists, pos, ann);
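
The check in the diff above boils down to summing the sample counts over the whole group before deciding to skip it. A minimal sketch of just that decision, assuming a hypothetical group_has_samples() helper and a plain array in which samples[0] is the leader's count followed by the other members:

```c
/*
 * Return non-zero if any event in the group recorded samples.
 * samples[0] is the leader's count, followed by the other members.
 */
static int group_has_samples(const unsigned int *samples, int nr_members)
{
	unsigned int total = 0;
	int i;

	for (i = 0; i < nr_members; i++)
		total += samples[i];

	return total != 0;
}
```

This only illustrates the skip condition; it says nothing about how hists__find_annotations() then walks the entries.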
* Re: [PATCH 5/5] perf annotate: Add --skip-empty option
2024-08-05 20:14 ` Namhyung Kim
@ 2024-08-05 20:23 ` Arnaldo Carvalho de Melo
2024-08-05 20:50 ` Namhyung Kim
0 siblings, 1 reply; 16+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-05 20:23 UTC
To: Namhyung Kim
Cc: Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter, Peter Zijlstra,
Ingo Molnar, LKML, linux-perf-users
On Mon, Aug 05, 2024 at 01:14:27PM -0700, Namhyung Kim wrote:
> On Mon, Aug 05, 2024 at 04:22:12PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Sat, Aug 03, 2024 at 02:13:32PM -0700, Namhyung Kim wrote:
> > > Like in perf report, we want to hide empty events in the perf annotate
> > > output. This is consistent when the option is set in perf report.
> > >
> > > For example, the following command would use 3 events including dummy.
> > >
> > > $ perf mem record -a -- perf test -w noploop
> > >
> > > $ perf evlist
> > > cpu/mem-loads,ldlat=30/P
> > > cpu/mem-stores/P
> > > dummy:u
> > >
> > > Just using perf annotate with --group will show the all 3 events.
> >
> > Seems unrelated, just before compiling with this patch:
> >
> > root@x1:~# perf mem record -a -- perf test -w noploop
> > Memory events are enabled on a subset of CPUs: 4-11
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 10.506 MB perf.data (2775 samples) ]
> > root@x1:~#
> >
> > root@x1:~# perf annotate --group --stdio2 sched_clock
> > root@x1:~# perf annotate --stdio2 sched_clock
> > Samples: 178 of event 'cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 565268, [percent: local period]
> > sched_clock() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
> > Percent 0xffffffff810511e0 <sched_clock>:
> > endbr64
> > 5.76 incl pcpu_hot+0x8
> > 5.47 → callq sched_clock_noinstr
> > 88.78 decl pcpu_hot+0x8
> > ↓ je 1e
> > → jmp __x86_return_thunk
> > 1e: → callq __SCT__preempt_schedule_notrace
> > → jmp __x86_return_thunk
> > root@x1:~# perf annotate --group --stdio2 sched_clock
> > root@x1:~# perf annotate --group --stdio sched_clock
> > root@x1:~# perf annotate --group sched_clock
> > root@x1:~#
> >
> > root@x1:~# perf evlist
> > cpu_atom/mem-loads,ldlat=30/P
> > cpu_atom/mem-stores/P
> > dummy:u
> > root@x1:~#
> >
> > root@x1:~# perf report --header-only | grep cmdline
> > # cmdline : /home/acme/bin/perf mem record -a -- perf test -w noploop
> > root@x1:~#
> >
> > I thought it would be some hybrid oddity but seems to be just --group
> > related, seems like it stops if the first event has no samples? Because
> > it works with another symbol:
>
> Good catch. Yeah I found it only checked the first event. Something
> like below should fix the issue.
Nope, with the patch applied:
root@x1:~# perf annotate --group --stdio sched_clock
root@x1:~# perf annotate --stdio sched_clock
Percent | Source code & Disassembly of vmlinux for cpu_atom/mem-stores/P (147 samples, percent: local period)
-------------------------------------------------------------------------------------------------------------------
: 0 0xffffffff810511e0 <sched_clock>:
0.00 : ffffffff810511e0: endbr64
5.11 : ffffffff810511e4: incl %gs:0x7efe2d5d(%rip) # 33f48 <pcpu_hot+0x8>
0.13 : ffffffff810511eb: callq 0xffffffff821350d0
94.76 : ffffffff810511f0: decl %gs:0x7efe2d51(%rip) # 33f48 <pcpu_hot+0x8>
0.00 : ffffffff810511f7: je 0xffffffff810511fe
0.00 : ffffffff810511f9: jmp 0xffffffff82153320
0.00 : ffffffff810511fe: callq 0xffffffff82153990
0.00 : ffffffff81051203: jmp 0xffffffff82153320
root@x1:~# perf annotate --group --stdio sched_clock
root@x1:~# perf annotate --group --stdio2 sched_clock
root@x1:~# perf annotate --group sched_clock
root@x1:~#
> Thanks,
> Namhyung
>
>
> diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
> index efcadb7620b8..8d3ec439b783 100644
> --- a/tools/perf/builtin-annotate.c
> +++ b/tools/perf/builtin-annotate.c
> @@ -632,11 +632,15 @@ static int __cmd_annotate(struct perf_annotate *ann)
>  	evlist__for_each_entry(session->evlist, pos) {
>  		struct hists *hists = evsel__hists(pos);
>  		u32 nr_samples = hists->stats.nr_samples;
> +		struct evsel *evsel;
>
> -		if (nr_samples == 0)
> +		if (!symbol_conf.event_group || !evsel__is_group_leader(pos))
>  			continue;
>
> -		if (!symbol_conf.event_group || !evsel__is_group_leader(pos))
> +		for_each_group_member(evsel, pos)
> +			nr_samples += evsel__hists(evsel)->stats.nr_samples;
> +
> +		if (nr_samples == 0)
>  			continue;
>
>  		hists__find_annotations(hists, pos, ann);
* Re: [PATCH 5/5] perf annotate: Add --skip-empty option
2024-08-05 20:23 ` Arnaldo Carvalho de Melo
@ 2024-08-05 20:50 ` Namhyung Kim
2024-08-06 13:12 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 16+ messages in thread
From: Namhyung Kim @ 2024-08-05 20:50 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter, Peter Zijlstra,
Ingo Molnar, LKML, linux-perf-users
On Mon, Aug 05, 2024 at 05:23:51PM -0300, Arnaldo Carvalho de Melo wrote:
> On Mon, Aug 05, 2024 at 01:14:27PM -0700, Namhyung Kim wrote:
> > On Mon, Aug 05, 2024 at 04:22:12PM -0300, Arnaldo Carvalho de Melo wrote:
> > > On Sat, Aug 03, 2024 at 02:13:32PM -0700, Namhyung Kim wrote:
> > > > Like in perf report, we want to hide empty events in the perf annotate
> > > > output. This is consistent when the option is set in perf report.
> > > >
> > > > For example, the following command would use 3 events including dummy.
> > > >
> > > > $ perf mem record -a -- perf test -w noploop
> > > >
> > > > $ perf evlist
> > > > cpu/mem-loads,ldlat=30/P
> > > > cpu/mem-stores/P
> > > > dummy:u
> > > >
> > > > Just using perf annotate with --group will show the all 3 events.
> > >
> > > Seems unrelated, just before compiling with this patch:
> > >
> > > root@x1:~# perf mem record -a -- perf test -w noploop
> > > Memory events are enabled on a subset of CPUs: 4-11
> > > [ perf record: Woken up 1 times to write data ]
> > > [ perf record: Captured and wrote 10.506 MB perf.data (2775 samples) ]
> > > root@x1:~#
> > >
> > > root@x1:~# perf annotate --group --stdio2 sched_clock
> > > root@x1:~# perf annotate --stdio2 sched_clock
> > > Samples: 178 of event 'cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 565268, [percent: local period]
> > > sched_clock() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
> > > Percent 0xffffffff810511e0 <sched_clock>:
> > > endbr64
> > > 5.76 incl pcpu_hot+0x8
> > > 5.47 → callq sched_clock_noinstr
> > > 88.78 decl pcpu_hot+0x8
> > > ↓ je 1e
> > > → jmp __x86_return_thunk
> > > 1e: → callq __SCT__preempt_schedule_notrace
> > > → jmp __x86_return_thunk
> > > root@x1:~# perf annotate --group --stdio2 sched_clock
> > > root@x1:~# perf annotate --group --stdio sched_clock
> > > root@x1:~# perf annotate --group sched_clock
> > > root@x1:~#
> > >
> > > root@x1:~# perf evlist
> > > cpu_atom/mem-loads,ldlat=30/P
> > > cpu_atom/mem-stores/P
> > > dummy:u
> > > root@x1:~#
> > >
> > > root@x1:~# perf report --header-only | grep cmdline
> > > # cmdline : /home/acme/bin/perf mem record -a -- perf test -w noploop
> > > root@x1:~#
> > >
> > > I thought it would be some hybrid oddity but seems to be just --group
> > > related, seems like it stops if the first event has no samples? Because
> > > it works with another symbol:
> >
> > Good catch. Yeah I found it only checked the first event. Something
> > like below should fix the issue.
>
> Nope, with the patch applied:
>
> root@x1:~# perf annotate --group --stdio sched_clock
> root@x1:~# perf annotate --stdio sched_clock
> Percent | Source code & Disassembly of vmlinux for cpu_atom/mem-stores/P (147 samples, percent: local period)
> -------------------------------------------------------------------------------------------------------------------
> : 0 0xffffffff810511e0 <sched_clock>:
> 0.00 : ffffffff810511e0: endbr64
> 5.11 : ffffffff810511e4: incl %gs:0x7efe2d5d(%rip) # 33f48 <pcpu_hot+0x8>
> 0.13 : ffffffff810511eb: callq 0xffffffff821350d0
> 94.76 : ffffffff810511f0: decl %gs:0x7efe2d51(%rip) # 33f48 <pcpu_hot+0x8>
> 0.00 : ffffffff810511f7: je 0xffffffff810511fe
> 0.00 : ffffffff810511f9: jmp 0xffffffff82153320
> 0.00 : ffffffff810511fe: callq 0xffffffff82153990
> 0.00 : ffffffff81051203: jmp 0xffffffff82153320
> root@x1:~# perf annotate --group --stdio sched_clock
> root@x1:~# perf annotate --group --stdio2 sched_clock
> root@x1:~# perf annotate --group sched_clock
> root@x1:~#
Oh ok, it was not enough. It should call evsel__output_resort() after
hists__match() and hists__link(). Use this instead.

Thanks,
Namhyung

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index efcadb7620b8..1bfe41783a7c 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -632,13 +632,23 @@ static int __cmd_annotate(struct perf_annotate *ann)
 	evlist__for_each_entry(session->evlist, pos) {
 		struct hists *hists = evsel__hists(pos);
 		u32 nr_samples = hists->stats.nr_samples;
+		struct ui_progress prog;
+		struct evsel *evsel;
 
-		if (nr_samples == 0)
+		if (!symbol_conf.event_group || !evsel__is_group_leader(pos))
 			continue;
 
-		if (!symbol_conf.event_group || !evsel__is_group_leader(pos))
+		for_each_group_member(evsel, pos)
+			nr_samples += evsel__hists(evsel)->stats.nr_samples;
+
+		if (nr_samples == 0)
 			continue;
 
+		ui_progress__init(&prog, nr_samples,
+				  "Sorting group events for output...");
+		evsel__output_resort(pos, &prog);
+		ui_progress__finish();
+
 		hists__find_annotations(hists, pos, ann);
 	}
 
* Re: [PATCH 5/5] perf annotate: Add --skip-empty option
2024-08-05 20:50 ` Namhyung Kim
@ 2024-08-06 13:12 ` Arnaldo Carvalho de Melo
2024-08-07 6:12 ` Namhyung Kim
0 siblings, 1 reply; 16+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-06 13:12 UTC (permalink / raw)
To: Namhyung Kim
Cc: Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter, Peter Zijlstra,
Ingo Molnar, LKML, linux-perf-users
On Mon, Aug 05, 2024 at 01:50:25PM -0700, Namhyung Kim wrote:
> On Mon, Aug 05, 2024 at 05:23:51PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Mon, Aug 05, 2024 at 01:14:27PM -0700, Namhyung Kim wrote:
> > > On Mon, Aug 05, 2024 at 04:22:12PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > On Sat, Aug 03, 2024 at 02:13:32PM -0700, Namhyung Kim wrote:
> > > > > Like in perf report, we want to hide empty events in the perf annotate
> > > > > output. This is consistent when the option is set in perf report.
> > > > >
> > > > > For example, the following command would use 3 events including dummy.
> > > > >
> > > > > $ perf mem record -a -- perf test -w noploop
> > > > >
> > > > > $ perf evlist
> > > > > cpu/mem-loads,ldlat=30/P
> > > > > cpu/mem-stores/P
> > > > > dummy:u
> > > > >
> > > > > Just using perf annotate with --group will show the all 3 events.
> > > >
> > > > Seems unrelated, just before compiling with this patch:
> > > >
> > > > root@x1:~# perf mem record -a -- perf test -w noploop
> > > > Memory events are enabled on a subset of CPUs: 4-11
> > > > [ perf record: Woken up 1 times to write data ]
> > > > [ perf record: Captured and wrote 10.506 MB perf.data (2775 samples) ]
> > > > root@x1:~#
> > > >
> > > > root@x1:~# perf annotate --group --stdio2 sched_clock
> > > > root@x1:~# perf annotate --stdio2 sched_clock
> > > > Samples: 178 of event 'cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 565268, [percent: local period]
> > > > sched_clock() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
> > > > Percent 0xffffffff810511e0 <sched_clock>:
> > > > endbr64
> > > > 5.76 incl pcpu_hot+0x8
> > > > 5.47 → callq sched_clock_noinstr
> > > > 88.78 decl pcpu_hot+0x8
> > > > ↓ je 1e
> > > > → jmp __x86_return_thunk
> > > > 1e: → callq __SCT__preempt_schedule_notrace
> > > > → jmp __x86_return_thunk
> > > > root@x1:~# perf annotate --group --stdio2 sched_clock
> > > > root@x1:~# perf annotate --group --stdio sched_clock
> > > > root@x1:~# perf annotate --group sched_clock
> > > > root@x1:~#
> > > >
> > > > root@x1:~# perf evlist
> > > > cpu_atom/mem-loads,ldlat=30/P
> > > > cpu_atom/mem-stores/P
> > > > dummy:u
> > > > root@x1:~#
> > > >
> > > > root@x1:~# perf report --header-only | grep cmdline
> > > > # cmdline : /home/acme/bin/perf mem record -a -- perf test -w noploop
> > > > root@x1:~#
> > > >
> > > > I thought it would be some hybrid oddity but seems to be just --group
> > > > related, seems like it stops if the first event has no samples? Because
> > > > it works with another symbol:
> > >
> > > Good catch. Yeah I found it only checked the first event. Something
> > > like below should fix the issue.
> >
> > Nope, with the patch applied:
> >
> > root@x1:~# perf annotate --group --stdio sched_clock
> > root@x1:~# perf annotate --stdio sched_clock
> > Percent | Source code & Disassembly of vmlinux for cpu_atom/mem-stores/P (147 samples, percent: local period)
> > -------------------------------------------------------------------------------------------------------------------
> > : 0 0xffffffff810511e0 <sched_clock>:
> > 0.00 : ffffffff810511e0: endbr64
> > 5.11 : ffffffff810511e4: incl %gs:0x7efe2d5d(%rip) # 33f48 <pcpu_hot+0x8>
> > 0.13 : ffffffff810511eb: callq 0xffffffff821350d0
> > 94.76 : ffffffff810511f0: decl %gs:0x7efe2d51(%rip) # 33f48 <pcpu_hot+0x8>
> > 0.00 : ffffffff810511f7: je 0xffffffff810511fe
> > 0.00 : ffffffff810511f9: jmp 0xffffffff82153320
> > 0.00 : ffffffff810511fe: callq 0xffffffff82153990
> > 0.00 : ffffffff81051203: jmp 0xffffffff82153320
> > root@x1:~# perf annotate --group --stdio sched_clock
> > root@x1:~# perf annotate --group --stdio2 sched_clock
> > root@x1:~# perf annotate --group sched_clock
> > root@x1:~#
>
> Oh ok, it was not enough. It should call evsel__output_resort() after
> hists__match() and hists__link(). Use this instead.
Ok, this works:
Before this patch:
root@x1:~# perf annotate --stdio sched_clock
Percent | Source code & Disassembly of vmlinux for cpu_atom/mem-stores/P (147 samples, percent: local period)
-------------------------------------------------------------------------------------------------------------------
: 0 0xffffffff810511e0 <sched_clock>:
0.00 : ffffffff810511e0: endbr64
5.11 : ffffffff810511e4: incl %gs:0x7efe2d5d(%rip) # 33f48 <pcpu_hot+0x8>
0.13 : ffffffff810511eb: callq 0xffffffff821350d0
94.76 : ffffffff810511f0: decl %gs:0x7efe2d51(%rip) # 33f48 <pcpu_hot+0x8>
0.00 : ffffffff810511f7: je 0xffffffff810511fe
0.00 : ffffffff810511f9: jmp 0xffffffff82153320
0.00 : ffffffff810511fe: callq 0xffffffff82153990
0.00 : ffffffff81051203: jmp 0xffffffff82153320
root@x1:~# perf annotate --group --stdio sched_clock
root@x1:~#
After:
root@x1:~# perf annotate --group --stdio sched_clock
Percent | Source code & Disassembly of vmlinux for cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u (0 samples, percent: local period)
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
: 0 0xffffffff810511e0 <sched_clock>:
0.00 0.00 0.00 : ffffffff810511e0: endbr64
0.00 5.11 0.00 : ffffffff810511e4: incl %gs:0x7efe2d5d(%rip) # 33f48 <pcpu_hot+0x8>
0.00 0.13 0.00 : ffffffff810511eb: callq 0xffffffff821350d0
0.00 94.76 0.00 : ffffffff810511f0: decl %gs:0x7efe2d51(%rip) # 33f48 <pcpu_hot+0x8>
0.00 0.00 0.00 : ffffffff810511f7: je 0xffffffff810511fe
0.00 0.00 0.00 : ffffffff810511f9: jmp 0xffffffff82153320
0.00 0.00 0.00 : ffffffff810511fe: callq 0xffffffff82153990
0.00 0.00 0.00 : ffffffff81051203: jmp 0xffffffff82153320
root@x1:~#
One example with samples for the first two events:
root@x1:~# perf annotate --group --stdio2
Samples: 2K of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 22892183, [percent: local period]
cgroup_rstat_updated() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
Percent 0xffffffff8124e080 <cgroup_rstat_updated>:
0.00 0.24 0.00 endbr64
→ callq __fentry__
0.00 99.76 0.00 pushq %r15
movq $0x251d4,%rcx
pushq %r14
movq %rdi,%r14
pushq %r13
movslq %esi,%r13
pushq %r12
pushq %rbp
pushq %rbx
subq $0x10,%rsp
cmpq $0x2000,%r13
↓ jae 17f
31: movq 0x3d0(%r14),%rbx
movq -0x7d3fb360(, %r13, 8),%r12
cmpq $0x2000,%r13
↓ jae 19b
25.00 0.00 0.00 4d: cmpq $0,0x88(%r12, %rbx)
↓ je 6b
addq $0x10,%rsp
popq %rbx
popq %rbp
popq %r12
75.00 0.00 0.00 popq %r13
popq %r14
popq %r15
→ jmp __x86_return_thunk
<SNIP>
And then skipping "empty" events:
root@x1:~# perf annotate --group --skip-empty --stdio2 cgroup_rstat_updated | head -35
Samples: 4 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 31851, [percent: local period]
cgroup_rstat_updated() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
Percent 0xffffffff8124e080 <cgroup_rstat_updated>:
0.00 0.24 endbr64
→ callq __fentry__
0.00 99.76 pushq %r15
movq $0x251d4,%rcx
pushq %r14
movq %rdi,%r14
pushq %r13
movslq %esi,%r13
pushq %r12
pushq %rbp
pushq %rbx
subq $0x10,%rsp
cmpq $0x2000,%r13
↓ jae 17f
31: movq 0x3d0(%r14),%rbx
movq -0x7d3fb360(, %r13, 8),%r12
cmpq $0x2000,%r13
↓ jae 19b
25.00 0.00 4d: cmpq $0,0x88(%r12, %rbx)
↓ je 6b
addq $0x10,%rsp
popq %rbx
popq %rbp
popq %r12
75.00 0.00 popq %r13
popq %r14
popq %r15
→ jmp __x86_return_thunk
6b: addq %r12,%rcx
movq %rcx,%rdi
movq %rcx,(%rsp)
→ callq *ffffffff82151500
root@x1:~#
So, I haven't done further analysis but I think this is a separate
issue, right?
Thanks for the fix!
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
- Arnaldo
* Re: [PATCH 5/5] perf annotate: Add --skip-empty option
2024-08-06 13:12 ` Arnaldo Carvalho de Melo
@ 2024-08-07 6:12 ` Namhyung Kim
2024-08-07 6:15 ` [PATCH] perf annotate: Fix --group behavior when leader has no samples Namhyung Kim
0 siblings, 1 reply; 16+ messages in thread
From: Namhyung Kim @ 2024-08-07 6:12 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo
Cc: Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter, Peter Zijlstra,
Ingo Molnar, LKML, linux-perf-users
On Tue, Aug 6, 2024 at 6:12 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
>
> On Mon, Aug 05, 2024 at 01:50:25PM -0700, Namhyung Kim wrote:
> > On Mon, Aug 05, 2024 at 05:23:51PM -0300, Arnaldo Carvalho de Melo wrote:
> > > On Mon, Aug 05, 2024 at 01:14:27PM -0700, Namhyung Kim wrote:
> > > > On Mon, Aug 05, 2024 at 04:22:12PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > > On Sat, Aug 03, 2024 at 02:13:32PM -0700, Namhyung Kim wrote:
> > > > > > Like in perf report, we want to hide empty events in the perf annotate
> > > > > > output. This is consistent when the option is set in perf report.
> > > > > >
> > > > > > For example, the following command would use 3 events including dummy.
> > > > > >
> > > > > > $ perf mem record -a -- perf test -w noploop
> > > > > >
> > > > > > $ perf evlist
> > > > > > cpu/mem-loads,ldlat=30/P
> > > > > > cpu/mem-stores/P
> > > > > > dummy:u
> > > > > >
> > > > > > Just using perf annotate with --group will show the all 3 events.
> > > > >
> > > > > Seems unrelated, just before compiling with this patch:
> > > > >
> > > > > root@x1:~# perf mem record -a -- perf test -w noploop
> > > > > Memory events are enabled on a subset of CPUs: 4-11
> > > > > [ perf record: Woken up 1 times to write data ]
> > > > > [ perf record: Captured and wrote 10.506 MB perf.data (2775 samples) ]
> > > > > root@x1:~#
> > > > >
> > > > > root@x1:~# perf annotate --group --stdio2 sched_clock
> > > > > root@x1:~# perf annotate --stdio2 sched_clock
> > > > > Samples: 178 of event 'cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 565268, [percent: local period]
> > > > > sched_clock() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
> > > > > Percent 0xffffffff810511e0 <sched_clock>:
> > > > > endbr64
> > > > > 5.76 incl pcpu_hot+0x8
> > > > > 5.47 → callq sched_clock_noinstr
> > > > > 88.78 decl pcpu_hot+0x8
> > > > > ↓ je 1e
> > > > > → jmp __x86_return_thunk
> > > > > 1e: → callq __SCT__preempt_schedule_notrace
> > > > > → jmp __x86_return_thunk
> > > > > root@x1:~# perf annotate --group --stdio2 sched_clock
> > > > > root@x1:~# perf annotate --group --stdio sched_clock
> > > > > root@x1:~# perf annotate --group sched_clock
> > > > > root@x1:~#
> > > > >
> > > > > root@x1:~# perf evlist
> > > > > cpu_atom/mem-loads,ldlat=30/P
> > > > > cpu_atom/mem-stores/P
> > > > > dummy:u
> > > > > root@x1:~#
> > > > >
> > > > > root@x1:~# perf report --header-only | grep cmdline
> > > > > # cmdline : /home/acme/bin/perf mem record -a -- perf test -w noploop
> > > > > root@x1:~#
> > > > >
> > > > > I thought it would be some hybrid oddity but seems to be just --group
> > > > > related, seems like it stops if the first event has no samples? Because
> > > > > it works with another symbol:
> > > >
> > > > Good catch. Yeah I found it only checked the first event. Something
> > > > like below should fix the issue.
> > >
> > > Nope, with the patch applied:
> > >
> > > root@x1:~# perf annotate --group --stdio sched_clock
> > > root@x1:~# perf annotate --stdio sched_clock
> > > Percent | Source code & Disassembly of vmlinux for cpu_atom/mem-stores/P (147 samples, percent: local period)
> > > -------------------------------------------------------------------------------------------------------------------
> > > : 0 0xffffffff810511e0 <sched_clock>:
> > > 0.00 : ffffffff810511e0: endbr64
> > > 5.11 : ffffffff810511e4: incl %gs:0x7efe2d5d(%rip) # 33f48 <pcpu_hot+0x8>
> > > 0.13 : ffffffff810511eb: callq 0xffffffff821350d0
> > > 94.76 : ffffffff810511f0: decl %gs:0x7efe2d51(%rip) # 33f48 <pcpu_hot+0x8>
> > > 0.00 : ffffffff810511f7: je 0xffffffff810511fe
> > > 0.00 : ffffffff810511f9: jmp 0xffffffff82153320
> > > 0.00 : ffffffff810511fe: callq 0xffffffff82153990
> > > 0.00 : ffffffff81051203: jmp 0xffffffff82153320
> > > root@x1:~# perf annotate --group --stdio sched_clock
> > > root@x1:~# perf annotate --group --stdio2 sched_clock
> > > root@x1:~# perf annotate --group sched_clock
> > > root@x1:~#
> >
> > Oh ok, it was not enough. It should call evsel__output_resort() after
> > hists__match() and hists__link(). Use this instead.
>
> Ok, this works:
>
> Before this patch:
>
> root@x1:~# perf annotate --stdio sched_clock
> Percent | Source code & Disassembly of vmlinux for cpu_atom/mem-stores/P (147 samples, percent: local period)
> -------------------------------------------------------------------------------------------------------------------
> : 0 0xffffffff810511e0 <sched_clock>:
> 0.00 : ffffffff810511e0: endbr64
> 5.11 : ffffffff810511e4: incl %gs:0x7efe2d5d(%rip) # 33f48 <pcpu_hot+0x8>
> 0.13 : ffffffff810511eb: callq 0xffffffff821350d0
> 94.76 : ffffffff810511f0: decl %gs:0x7efe2d51(%rip) # 33f48 <pcpu_hot+0x8>
> 0.00 : ffffffff810511f7: je 0xffffffff810511fe
> 0.00 : ffffffff810511f9: jmp 0xffffffff82153320
> 0.00 : ffffffff810511fe: callq 0xffffffff82153990
> 0.00 : ffffffff81051203: jmp 0xffffffff82153320
> root@x1:~# perf annotate --group --stdio sched_clock
> root@x1:~#
>
> After:
>
> root@x1:~# perf annotate --group --stdio sched_clock
> Percent | Source code & Disassembly of vmlinux for cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u (0 samples, percent: local period)
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> : 0 0xffffffff810511e0 <sched_clock>:
> 0.00 0.00 0.00 : ffffffff810511e0: endbr64
> 0.00 5.11 0.00 : ffffffff810511e4: incl %gs:0x7efe2d5d(%rip) # 33f48 <pcpu_hot+0x8>
> 0.00 0.13 0.00 : ffffffff810511eb: callq 0xffffffff821350d0
> 0.00 94.76 0.00 : ffffffff810511f0: decl %gs:0x7efe2d51(%rip) # 33f48 <pcpu_hot+0x8>
> 0.00 0.00 0.00 : ffffffff810511f7: je 0xffffffff810511fe
> 0.00 0.00 0.00 : ffffffff810511f9: jmp 0xffffffff82153320
> 0.00 0.00 0.00 : ffffffff810511fe: callq 0xffffffff82153990
> 0.00 0.00 0.00 : ffffffff81051203: jmp 0xffffffff82153320
> root@x1:~#
>
> One example with samples for the first two events:
>
> root@x1:~# perf annotate --group --stdio2
> Samples: 2K of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 22892183, [percent: local period]
> cgroup_rstat_updated() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
> Percent 0xffffffff8124e080 <cgroup_rstat_updated>:
> 0.00 0.24 0.00 endbr64
> → callq __fentry__
> 0.00 99.76 0.00 pushq %r15
> movq $0x251d4,%rcx
> pushq %r14
> movq %rdi,%r14
> pushq %r13
> movslq %esi,%r13
> pushq %r12
> pushq %rbp
> pushq %rbx
> subq $0x10,%rsp
> cmpq $0x2000,%r13
> ↓ jae 17f
> 31: movq 0x3d0(%r14),%rbx
> movq -0x7d3fb360(, %r13, 8),%r12
> cmpq $0x2000,%r13
> ↓ jae 19b
> 25.00 0.00 0.00 4d: cmpq $0,0x88(%r12, %rbx)
> ↓ je 6b
> addq $0x10,%rsp
> popq %rbx
> popq %rbp
> popq %r12
> 75.00 0.00 0.00 popq %r13
> popq %r14
> popq %r15
> → jmp __x86_return_thunk
> <SNIP>
>
> And then skipping "empty" events:
>
> root@x1:~# perf annotate --group --skip-empty --stdio2 cgroup_rstat_updated | head -35
> Samples: 4 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 31851, [percent: local period]
> cgroup_rstat_updated() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
> Percent 0xffffffff8124e080 <cgroup_rstat_updated>:
> 0.00 0.24 endbr64
> → callq __fentry__
> 0.00 99.76 pushq %r15
> movq $0x251d4,%rcx
> pushq %r14
> movq %rdi,%r14
> pushq %r13
> movslq %esi,%r13
> pushq %r12
> pushq %rbp
> pushq %rbx
> subq $0x10,%rsp
> cmpq $0x2000,%r13
> ↓ jae 17f
> 31: movq 0x3d0(%r14),%rbx
> movq -0x7d3fb360(, %r13, 8),%r12
> cmpq $0x2000,%r13
> ↓ jae 19b
> 25.00 0.00 4d: cmpq $0,0x88(%r12, %rbx)
> ↓ je 6b
> addq $0x10,%rsp
> popq %rbx
> popq %rbp
> popq %r12
> 75.00 0.00 popq %r13
> popq %r14
> popq %r15
> → jmp __x86_return_thunk
> 6b: addq %r12,%rcx
> movq %rcx,%rdi
> movq %rcx,(%rsp)
> → callq *ffffffff82151500
> root@x1:~#
>
> So, I haven't done further analysis but I think this is a separate
> issue, right?
Yep, it's not related to --skip-empty.
>
> Thanks for the fix!
>
> Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Will send a fix with your tags.
Thanks,
Namhyung
* [PATCH] perf annotate: Fix --group behavior when leader has no samples
2024-08-07 6:12 ` Namhyung Kim
@ 2024-08-07 6:15 ` Namhyung Kim
2024-08-09 21:15 ` Arnaldo Carvalho de Melo
0 siblings, 1 reply; 16+ messages in thread
From: Namhyung Kim @ 2024-08-07 6:15 UTC (permalink / raw)
To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
linux-perf-users, Arnaldo Carvalho de Melo
When the --group option is used, perf annotate should display all events
of a group together. But the current logic only checks whether the first
(leader) event has samples. Let's check the member events as well.

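To make the check concrete, here is a minimal standalone sketch in C. It
is only a toy model: struct toy_event and the helper names are
hypothetical and are not perf's struct evsel or its iterators. It shows
why looking at the leader alone skips a group whose samples all belong
to a member event.

```c
/* Toy stand-in for an event; this is NOT perf's struct evsel. */
struct toy_event {
	const char *name;
	unsigned int nr_samples;
};

/* Old behavior (modeled): only the leader, group[0], is consulted. */
static int annotate_leader_only(const struct toy_event *group)
{
	return group[0].nr_samples != 0;
}

/*
 * New behavior (modeled): start from the leader's count and add the
 * count of every other member, then test the sum.
 */
static unsigned int group_nr_samples(const struct toy_event *group, int nr)
{
	unsigned int total = group[0].nr_samples;	/* leader */
	int i;

	for (i = 1; i < nr; i++)			/* other members */
		total += group[i].nr_samples;

	return total;
}

static int annotate_whole_group(const struct toy_event *group, int nr)
{
	return group_nr_samples(group, nr) != 0;
}
```

With a group of { mem-loads: 0, mem-stores: 2, dummy:u: 0 } samples, the
leader-only check says "skip" while the whole-group check says
"annotate", which matches the path_put case described in this message.
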
It also failed to put the samples linked from member evsels into the
output RB-tree, so they could not be displayed in the output.

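The need for the extra resort can be pictured with another toy sketch.
This one assumes a deliberately simplified model, where a histogram
keeps a "pending" list that linking appends to and a separate "output"
list that the printing step walks. The names toy_hists, toy_link() and
toy_output_resort() are made up for illustration and the arrays stand
in for perf's RB-trees.

```c
#include <stddef.h>

#define TOY_MAX_ENTRIES 8

/* Toy stand-in for a histogram; this is NOT perf's struct hists. */
struct toy_hists {
	const char *pending[TOY_MAX_ENTRIES];	/* entries added or linked so far */
	size_t nr_pending;
	const char *output[TOY_MAX_ENTRIES];	/* what the printing step walks */
	size_t nr_output;
};

/* Link a symbol seen only by a member event into the leader. */
static void toy_link(struct toy_hists *leader, const char *sym)
{
	if (leader->nr_pending < TOY_MAX_ENTRIES)
		leader->pending[leader->nr_pending++] = sym;
}

/* Rebuild the output list from everything that is pending. */
static void toy_output_resort(struct toy_hists *h)
{
	size_t i;

	h->nr_output = 0;
	for (i = 0; i < h->nr_pending; i++)
		h->output[h->nr_output++] = h->pending[i];
}
```

In this model, an entry linked after the last toy_output_resort() call
stays invisible to the printing step until the output list is rebuilt,
which is the situation the added evsel__output_resort() call addresses.
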
For example, consider the following event list.

$ ./perf evlist
cpu/mem-loads,ldlat=30/P
cpu/mem-stores/P
dummy:u
It has three events, but the 'path_put' function has samples only for
the mem-stores (second) event.

$ sudo ./perf annotate --stdio -f path_put
Percent | Source code & Disassembly of kcore for cpu/mem-stores/P (2 samples, percent: local period)
----------------------------------------------------------------------------------------------------------
: 0 0xffffffffae600020 <path_put>:
0.00 : ffffffffae600020: endbr64
0.00 : ffffffffae600024: nopl (%rax, %rax)
91.22 : ffffffffae600029: pushq %rbx
0.00 : ffffffffae60002a: movq %rdi, %rbx
0.00 : ffffffffae60002d: movq 8(%rdi), %rdi
8.78 : ffffffffae600031: callq 0xffffffffae614aa0
0.00 : ffffffffae600036: movq (%rbx), %rdi
0.00 : ffffffffae600039: popq %rbx
0.00 : ffffffffae60003a: jmp 0xffffffffae620670
0.00 : ffffffffae60003f: nop
Therefore, it didn't show up when the --group option was used, since the
leader ("mem-loads") event has no samples. Now it checks all events in
the group.

Before:
$ sudo ./perf annotate --stdio -f --group path_put
(no output)
After:
$ sudo ./perf annotate --stdio -f --group path_put
Percent | Source code & Disassembly of kcore for cpu/mem-loads,ldlat=30/P, cpu/mem-stores/P, dummy:u (0 samples, percent: local period)
-------------------------------------------------------------------------------------------------------------------------------------------------------------
: 0 0xffffffffae600020 <path_put>:
0.00 0.00 0.00 : ffffffffae600020: endbr64
0.00 0.00 0.00 : ffffffffae600024: nopl (%rax, %rax)
0.00 91.22 0.00 : ffffffffae600029: pushq %rbx
0.00 0.00 0.00 : ffffffffae60002a: movq %rdi, %rbx
0.00 0.00 0.00 : ffffffffae60002d: movq 8(%rdi), %rdi
0.00 8.78 0.00 : ffffffffae600031: callq 0xffffffffae614aa0
0.00 0.00 0.00 : ffffffffae600036: movq (%rbx), %rdi
0.00 0.00 0.00 : ffffffffae600039: popq %rbx
0.00 0.00 0.00 : ffffffffae60003a: jmp 0xffffffffae620670
0.00 0.00 0.00 : ffffffffae60003f: nop
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
tools/perf/builtin-annotate.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index efcadb7620b8..1bfe41783a7c 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -632,13 +632,23 @@ static int __cmd_annotate(struct perf_annotate *ann)
 	evlist__for_each_entry(session->evlist, pos) {
 		struct hists *hists = evsel__hists(pos);
 		u32 nr_samples = hists->stats.nr_samples;
+		struct ui_progress prog;
+		struct evsel *evsel;
 
-		if (nr_samples == 0)
+		if (!symbol_conf.event_group || !evsel__is_group_leader(pos))
 			continue;
 
-		if (!symbol_conf.event_group || !evsel__is_group_leader(pos))
+		for_each_group_member(evsel, pos)
+			nr_samples += evsel__hists(evsel)->stats.nr_samples;
+
+		if (nr_samples == 0)
 			continue;
 
+		ui_progress__init(&prog, nr_samples,
+				  "Sorting group events for output...");
+		evsel__output_resort(pos, &prog);
+		ui_progress__finish();
+
 		hists__find_annotations(hists, pos, ann);
 	}
 
--
2.46.0.rc2.264.g509ed76dc8-goog
* Re: [PATCH] perf annotate: Fix --group behavior when leader has no samples
2024-08-07 6:15 ` [PATCH] perf annotate: Fix --group behavior when leader has no samples Namhyung Kim
@ 2024-08-09 21:15 ` Arnaldo Carvalho de Melo
0 siblings, 0 replies; 16+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-09 21:15 UTC (permalink / raw)
To: Namhyung Kim
Cc: Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter, Peter Zijlstra,
Ingo Molnar, LKML, linux-perf-users, Arnaldo Carvalho de Melo
On Tue, Aug 06, 2024 at 11:15:55PM -0700, Namhyung Kim wrote:
> When --group option is used, it should display all events together. But
> the current logic only checks if the first (leader) event has samples or
> not. Let's check the member events as well.
>
> Also it missed to put the linked samples from member evsels to the
> output RB-tree so that it can be displayed in the output.
Thanks, re-tested and applied.
- Arnaldo
> For example, take a look at this example.
>
> $ ./perf evlist
> cpu/mem-loads,ldlat=30/P
> cpu/mem-stores/P
> dummy:u
>
> It has three events but 'path_put' function has samples only for
> mem-stores (second) event.
>
> $ sudo ./perf annotate --stdio -f path_put
> Percent | Source code & Disassembly of kcore for cpu/mem-stores/P (2 samples, percent: local period)
> ----------------------------------------------------------------------------------------------------------
> : 0 0xffffffffae600020 <path_put>:
> 0.00 : ffffffffae600020: endbr64
> 0.00 : ffffffffae600024: nopl (%rax, %rax)
> 91.22 : ffffffffae600029: pushq %rbx
> 0.00 : ffffffffae60002a: movq %rdi, %rbx
> 0.00 : ffffffffae60002d: movq 8(%rdi), %rdi
> 8.78 : ffffffffae600031: callq 0xffffffffae614aa0
> 0.00 : ffffffffae600036: movq (%rbx), %rdi
> 0.00 : ffffffffae600039: popq %rbx
> 0.00 : ffffffffae60003a: jmp 0xffffffffae620670
> 0.00 : ffffffffae60003f: nop
>
> Therefore, it didn't show up when --group option is used since the
> leader ("mem-loads") event has no samples. But now it checks both
> events.
>
> Before:
> $ sudo ./perf annotate --stdio -f --group path_put
> (no output)
>
> After:
> $ sudo ./perf annotate --stdio -f --group path_put
> Percent | Source code & Disassembly of kcore for cpu/mem-loads,ldlat=30/P, cpu/mem-stores/P, dummy:u (0 samples, percent: local period)
> -------------------------------------------------------------------------------------------------------------------------------------------------------------
> : 0 0xffffffffae600020 <path_put>:
> 0.00 0.00 0.00 : ffffffffae600020: endbr64
> 0.00 0.00 0.00 : ffffffffae600024: nopl (%rax, %rax)
> 0.00 91.22 0.00 : ffffffffae600029: pushq %rbx
> 0.00 0.00 0.00 : ffffffffae60002a: movq %rdi, %rbx
> 0.00 0.00 0.00 : ffffffffae60002d: movq 8(%rdi), %rdi
> 0.00 8.78 0.00 : ffffffffae600031: callq 0xffffffffae614aa0
> 0.00 0.00 0.00 : ffffffffae600036: movq (%rbx), %rdi
> 0.00 0.00 0.00 : ffffffffae600039: popq %rbx
> 0.00 0.00 0.00 : ffffffffae60003a: jmp 0xffffffffae620670
> 0.00 0.00 0.00 : ffffffffae60003f: nop
>
> Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
> tools/perf/builtin-annotate.c | 14 ++++++++++++--
> 1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
> index efcadb7620b8..1bfe41783a7c 100644
> --- a/tools/perf/builtin-annotate.c
> +++ b/tools/perf/builtin-annotate.c
> @@ -632,13 +632,23 @@ static int __cmd_annotate(struct perf_annotate *ann)
>  	evlist__for_each_entry(session->evlist, pos) {
>  		struct hists *hists = evsel__hists(pos);
>  		u32 nr_samples = hists->stats.nr_samples;
> +		struct ui_progress prog;
> +		struct evsel *evsel;
>
> -		if (nr_samples == 0)
> +		if (!symbol_conf.event_group || !evsel__is_group_leader(pos))
>  			continue;
>
> -		if (!symbol_conf.event_group || !evsel__is_group_leader(pos))
> +		for_each_group_member(evsel, pos)
> +			nr_samples += evsel__hists(evsel)->stats.nr_samples;
> +
> +		if (nr_samples == 0)
>  			continue;
>
> +		ui_progress__init(&prog, nr_samples,
> +				  "Sorting group events for output...");
> +		evsel__output_resort(pos, &prog);
> +		ui_progress__finish();
> +
>  		hists__find_annotations(hists, pos, ann);
>  	}
>
> --
> 2.46.0.rc2.264.g509ed76dc8-goog
>
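For readers following along, the change in the skip condition can be
modeled outside of perf. The sketch below is only an illustration: the
struct toy_evsel type and the group_*() helpers are hypothetical names
made up for this example, not perf's real struct evsel or hists API. It
assumes a group is an array whose first element is the leader.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an event; not perf's struct evsel. */
struct toy_evsel {
	const char *name;
	unsigned int nr_samples;
};

/* Old check: only the leader (index 0) decides whether the group is shown. */
static bool group_shown_before(const struct toy_evsel *group, size_t nr)
{
	assert(group != NULL || nr == 0);
	return nr > 0 && group[0].nr_samples != 0;
}

/* Sum the samples of the leader and every other member of the group. */
static unsigned int group_total_samples(const struct toy_evsel *group, size_t nr)
{
	unsigned int total = 0;

	assert(group != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++)
		total += group[i].nr_samples;
	return total;
}

/* New check: the group is shown if any event in it has samples. */
static bool group_shown_after(const struct toy_evsel *group, size_t nr)
{
	return group_total_samples(group, nr) != 0;
}
```

With the path_put numbers from the commit message (0, 2 and 0 samples
for mem-loads, mem-stores and dummy), the leader-only check skips the
group, while the summed check displays it.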
Thread overview: 16+ messages
2024-08-03 21:13 [PATCHSET 0/5] perf annotate: Add --skip-empty option (v1) Namhyung Kim
2024-08-03 21:13 ` [PATCH 1/5] perf annotate: Use al->data_nr if possible Namhyung Kim
2024-08-03 21:13 ` [PATCH 2/5] perf annotate: Set notes->src->nr_events early Namhyung Kim
2024-08-03 21:13 ` [PATCH 3/5] perf annotate: Use annotation__pcnt_width() consistently Namhyung Kim
2024-08-03 21:13 ` [PATCH 4/5] perf annotate: Set al->data_nr using the notes->src->nr_events Namhyung Kim
2024-08-03 21:13 ` [PATCH 5/5] perf annotate: Add --skip-empty option Namhyung Kim
2024-08-05 19:22 ` Arnaldo Carvalho de Melo
2024-08-05 20:14 ` Namhyung Kim
2024-08-05 20:23 ` Arnaldo Carvalho de Melo
2024-08-05 20:50 ` Namhyung Kim
2024-08-06 13:12 ` Arnaldo Carvalho de Melo
2024-08-07 6:12 ` Namhyung Kim
2024-08-07 6:15 ` [PATCH] perf annotate: Fix --group behavior when leader has no samples Namhyung Kim
2024-08-09 21:15 ` Arnaldo Carvalho de Melo
2024-08-05 19:26 ` [PATCH 5/5] perf annotate: Add --skip-empty option Arnaldo Carvalho de Melo
2024-08-05 19:26 ` [PATCHSET 0/5] perf annotate: Add --skip-empty option (v1) Arnaldo Carvalho de Melo