linux-perf-users.vger.kernel.org archive mirror
* [PATCHSET 0/5] perf annotate: Add --skip-empty option (v1)
@ 2024-08-03 21:13 Namhyung Kim
  2024-08-03 21:13 ` [PATCH 1/5] perf annotate: Use al->data_nr if possible Namhyung Kim
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: Namhyung Kim @ 2024-08-03 21:13 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users

Hello,

This series makes perf annotate behave the same as perf report.
Especially in the TUI browser, we want to keep the same experience as
perf report when it comes to displaying dummy events.

  $ perf mem record -a -- perf test -w noploop

  $ perf evlist
  cpu/mem-loads,ldlat=30/P
  cpu/mem-stores/P
  dummy:u

Just using perf annotate with --group will show all 3 events.

  $ perf annotate --group --stdio | head
   Percent                 |    Source code & Disassembly of ...
  --------------------------------------------------------------
                           : 0     0xe060 <_dl_relocate_object>:
      0.00    0.00    0.00 :    e060:       pushq   %rbp
      0.00    0.00    0.00 :    e061:       movq    %rsp, %rbp
      0.00    0.00    0.00 :    e064:       pushq   %r15
      0.00    0.00    0.00 :    e066:       movq    %rdi, %r15
      0.00    0.00    0.00 :    e069:       pushq   %r14
      0.00    0.00    0.00 :    e06b:       pushq   %r13
      0.00    0.00    0.00 :    e06d:       movl    %edx, %r13d

Now with --skip-empty, it'll hide the last dummy event.

  $ perf annotate --group --stdio --skip-empty | head
   Percent         |    Source code & Disassembly of ...
  ------------------------------------------------------
                   : 0     0xe060 <_dl_relocate_object>:
      0.00    0.00 :    e060:       pushq   %rbp
      0.00    0.00 :    e061:       movq    %rsp, %rbp
      0.00    0.00 :    e064:       pushq   %r15
      0.00    0.00 :    e066:       movq    %rdi, %r15
      0.00    0.00 :    e069:       pushq   %r14
      0.00    0.00 :    e06b:       pushq   %r13
      0.00    0.00 :    e06d:       movl    %edx, %r13d

The code is available in the 'perf/annotate-skip-v1' branch at
git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git

Thanks,
Namhyung


Namhyung Kim (5):
  perf annotate: Use al->data_nr if possible
  perf annotate: Set notes->src->nr_events early
  perf annotate: Use annotation__pcnt_width() consistently
  perf annotate: Set al->data_nr using the notes->src->nr_events
  perf annotate: Add --skip-empty option

 tools/perf/Documentation/perf-annotate.txt |  3 ++
 tools/perf/builtin-annotate.c              |  2 +
 tools/perf/util/annotate.c                 | 47 +++++++++++++---------
 tools/perf/util/annotate.h                 |  2 +-
 tools/perf/util/disasm.c                   |  6 +--
 5 files changed, 35 insertions(+), 25 deletions(-)

-- 
2.46.0.rc2.264.g509ed76dc8-goog



* [PATCH 1/5] perf annotate: Use al->data_nr if possible
  2024-08-03 21:13 [PATCHSET 0/5] perf annotate: Add --skip-empty option (v1) Namhyung Kim
@ 2024-08-03 21:13 ` Namhyung Kim
  2024-08-03 21:13 ` [PATCH 2/5] perf annotate: Set notes->src->nr_events early Namhyung Kim
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Namhyung Kim @ 2024-08-03 21:13 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users

al->data_nr holds the number of entries in al->data[], so the code
should use it when iterating over the array.  notes->src->nr_events
should hold the same number, but it is more natural to use al->data_nr.
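
To illustrate the idea outside of perf, here is a minimal standalone
sketch.  The struct and function names are hypothetical simplifications,
not the real perf definitions; the only point is that the loop bound
comes from the line itself rather than from a separate object.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for perf's structures; not the real definitions. */
struct sketch_data {
	double percent;
};

struct sketch_line {
	int data_nr;                /* number of valid entries in data[] */
	struct sketch_data data[4]; /* fixed size only to keep the sketch short */
};

/* Return the largest percent among the line's own entries. */
static double sketch_line_max_percent(const struct sketch_line *l)
{
	double max = 0.0;
	int i;

	assert(l != NULL);

	/* The bound is the line's own count, so no extra argument is needed. */
	for (i = 0; i < l->data_nr; i++) {
		if (l->data[i].percent > max)
			max = l->data[i].percent;
	}
	return max;
}
```

Because the count travels with the array, the helper needs one argument
fewer, which is the same simplification the patch makes to
annotation_line__max_percent().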

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/annotate.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index a2ee4074f768..91ad948c89d5 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -1594,13 +1594,12 @@ bool ui__has_annotation(void)
 
 
 static double annotation_line__max_percent(struct annotation_line *al,
-					   struct annotation *notes,
 					   unsigned int percent_type)
 {
 	double percent_max = 0.0;
 	int i;
 
-	for (i = 0; i < notes->src->nr_events; i++) {
+	for (i = 0; i < al->data_nr; i++) {
 		double percent;
 
 		percent = annotation_data__percent(&al->data[i],
@@ -1672,7 +1671,7 @@ static void __annotation_line__write(struct annotation_line *al, struct annotati
 				     void (*obj__write_graph)(void *obj, int graph))
 
 {
-	double percent_max = annotation_line__max_percent(al, notes, percent_type);
+	double percent_max = annotation_line__max_percent(al, percent_type);
 	int pcnt_width = annotation__pcnt_width(notes),
 	    cycles_width = annotation__cycles_width(notes);
 	bool show_title = false;
@@ -1690,7 +1689,7 @@ static void __annotation_line__write(struct annotation_line *al, struct annotati
 	if (al->offset != -1 && percent_max != 0.0) {
 		int i;
 
-		for (i = 0; i < notes->src->nr_events; i++) {
+		for (i = 0; i < al->data_nr; i++) {
 			double percent;
 
 			percent = annotation_data__percent(&al->data[i], percent_type);
-- 
2.46.0.rc2.264.g509ed76dc8-goog



* [PATCH 2/5] perf annotate: Set notes->src->nr_events early
  2024-08-03 21:13 [PATCHSET 0/5] perf annotate: Add --skip-empty option (v1) Namhyung Kim
  2024-08-03 21:13 ` [PATCH 1/5] perf annotate: Use al->data_nr if possible Namhyung Kim
@ 2024-08-03 21:13 ` Namhyung Kim
  2024-08-03 21:13 ` [PATCH 3/5] perf annotate: Use annotation__pcnt_width() consistently Namhyung Kim
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Namhyung Kim @ 2024-08-03 21:13 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users

We want to use it in different places, so make sure it is set properly
in symbol__annotate() before creating the disasm lines.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/annotate.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 91ad948c89d5..09e6fdf344db 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -925,6 +925,11 @@ int symbol__annotate(struct map_symbol *ms, struct evsel *evsel,
 			return -1;
 	}
 
+	if (evsel__is_group_event(evsel))
+		notes->src->nr_events = evsel->core.nr_members;
+	else
+		notes->src->nr_events = 1;
+
 	if (annotate_opts.full_addr)
 		notes->src->start = map__objdump_2mem(ms->map, ms->sym->start);
 	else
@@ -1842,10 +1847,7 @@ int symbol__annotate2(struct map_symbol *ms, struct evsel *evsel,
 	struct symbol *sym = ms->sym;
 	struct annotation *notes = symbol__annotation(sym);
 	size_t size = symbol__size(sym);
-	int nr_pcnt = 1, err;
-
-	if (evsel__is_group_event(evsel))
-		nr_pcnt = evsel->core.nr_members;
+	int err;
 
 	err = symbol__annotate(ms, evsel, parch);
 	if (err)
@@ -1861,8 +1863,6 @@ int symbol__annotate2(struct map_symbol *ms, struct evsel *evsel,
 		return err;
 
 	annotation__init_column_widths(notes, sym);
-	notes->src->nr_events = nr_pcnt;
-
 	annotation__update_column_widths(notes);
 	sym->annotate2 = 1;
 
-- 
2.46.0.rc2.264.g509ed76dc8-goog



* [PATCH 3/5] perf annotate: Use annotation__pcnt_width() consistently
  2024-08-03 21:13 [PATCHSET 0/5] perf annotate: Add --skip-empty option (v1) Namhyung Kim
  2024-08-03 21:13 ` [PATCH 1/5] perf annotate: Use al->data_nr if possible Namhyung Kim
  2024-08-03 21:13 ` [PATCH 2/5] perf annotate: Set notes->src->nr_events early Namhyung Kim
@ 2024-08-03 21:13 ` Namhyung Kim
  2024-08-03 21:13 ` [PATCH 4/5] perf annotate: Set al->data_nr using the notes->src->nr_events Namhyung Kim
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: Namhyung Kim @ 2024-08-03 21:13 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users

annotation__pcnt_width() calculates the screen width of the overhead
(percent) area, taking event groups into account.  Use it consistently
so that the output is similar in all modes.  There is one difference
between the stdio and TUI output, though: stdio uses a width of 8 per
percent column while the TUI uses 7.

Let's use 8 everywhere and adjust the print width in
__annotation_line__write() accordingly.
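
The arithmetic can be checked with a small standalone sketch.  The
helper names below are hypothetical; the constants 12 and 8 and the
"%7.2f " format are the ones used in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Total width of the percent area: one column per event. */
static int sketch_pcnt_width(bool show_total_period, int nr_events)
{
	assert(nr_events > 0);
	return (show_total_period ? 12 : 8) * nr_events;
}

/*
 * Format one percent cell with "%7.2f ": 7 characters for the number
 * plus one trailing space.  Returns the number of characters that
 * snprintf() produced.
 */
static int sketch_format_percent(char *buf, size_t len, double percent)
{
	return snprintf(buf, len, "%7.2f ", percent);
}
```

With three events in a group the percent area is 24 characters wide,
and each formatted cell takes 8 of them, so the cells fill the area
exactly.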

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/annotate.c | 14 +++++---------
 tools/perf/util/annotate.h |  2 +-
 2 files changed, 6 insertions(+), 10 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 09e6fdf344db..917897fe44a2 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -699,13 +699,13 @@ annotation_line__print(struct annotation_line *al, struct symbol *sym, u64 start
 		       int percent_type)
 {
 	struct disasm_line *dl = container_of(al, struct disasm_line, al);
+	struct annotation *notes = symbol__annotation(sym);
 	static const char *prev_line;
 
 	if (al->offset != -1) {
 		double max_percent = 0.0;
 		int i, nr_percent = 1;
 		const char *color;
-		struct annotation *notes = symbol__annotation(sym);
 
 		for (i = 0; i < al->data_nr; i++) {
 			double percent;
@@ -775,14 +775,11 @@ annotation_line__print(struct annotation_line *al, struct symbol *sym, u64 start
 	} else if (max_lines && printed >= max_lines)
 		return 1;
 	else {
-		int width = symbol_conf.show_total_period ? 12 : 8;
+		int width = annotation__pcnt_width(notes);
 
 		if (queue)
 			return -1;
 
-		if (evsel__is_group_event(evsel))
-			width *= evsel->core.nr_members;
-
 		if (!*al->line)
 			printf(" %*s:\n", width, " ");
 		else
@@ -1111,7 +1108,7 @@ int symbol__annotate_printf(struct map_symbol *ms, struct evsel *evsel)
 	int more = 0;
 	bool context = opts->context;
 	u64 len;
-	int width = symbol_conf.show_total_period ? 12 : 8;
+	int width = annotation__pcnt_width(notes);
 	int graph_dotted_len;
 	char buf[512];
 
@@ -1127,7 +1124,6 @@ int symbol__annotate_printf(struct map_symbol *ms, struct evsel *evsel)
 	len = symbol__size(sym);
 
 	if (evsel__is_group_event(evsel)) {
-		width *= evsel->core.nr_members;
 		evsel__group_desc(evsel, buf, sizeof(buf));
 		evsel_name = buf;
 	}
@@ -1703,10 +1699,10 @@ static void __annotation_line__write(struct annotation_line *al, struct annotati
 			if (symbol_conf.show_total_period) {
 				obj__printf(obj, "%11" PRIu64 " ", al->data[i].he.period);
 			} else if (symbol_conf.show_nr_samples) {
-				obj__printf(obj, "%6" PRIu64 " ",
+				obj__printf(obj, "%7" PRIu64 " ",
 						   al->data[i].he.nr_samples);
 			} else {
-				obj__printf(obj, "%6.2f ", percent);
+				obj__printf(obj, "%7.2f ", percent);
 			}
 		}
 	} else {
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index 9ba772f46270..64e70d716ff1 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -339,7 +339,7 @@ static inline int annotation__cycles_width(struct annotation *notes)
 
 static inline int annotation__pcnt_width(struct annotation *notes)
 {
-	return (symbol_conf.show_total_period ? 12 : 7) * notes->src->nr_events;
+	return (symbol_conf.show_total_period ? 12 : 8) * notes->src->nr_events;
 }
 
 static inline bool annotation_line__filter(struct annotation_line *al)
-- 
2.46.0.rc2.264.g509ed76dc8-goog



* [PATCH 4/5] perf annotate: Set al->data_nr using the notes->src->nr_events
  2024-08-03 21:13 [PATCHSET 0/5] perf annotate: Add --skip-empty option (v1) Namhyung Kim
                   ` (2 preceding siblings ...)
  2024-08-03 21:13 ` [PATCH 3/5] perf annotate: Use annotation__pcnt_width() consistently Namhyung Kim
@ 2024-08-03 21:13 ` Namhyung Kim
  2024-08-03 21:13 ` [PATCH 5/5] perf annotate: Add --skip-empty option Namhyung Kim
  2024-08-05 19:26 ` [PATCHSET 0/5] perf annotate: Add --skip-empty option (v1) Arnaldo Carvalho de Melo
  5 siblings, 0 replies; 16+ messages in thread
From: Namhyung Kim @ 2024-08-03 21:13 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users

This is preparation for skipping empty events.
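
As a rough standalone sketch of why this matters: each line carries one
data entry per displayed event, so the allocation size and data_nr have
to follow the event count.  The names below are hypothetical and the
layout is much simpler than the real struct disasm_line.

```c
#include <assert.h>
#include <stdlib.h>

struct sketch_cell {
	double percent;
};

struct sketch_line {
	int data_nr;               /* number of entries in data[] */
	struct sketch_cell data[]; /* flexible array member, one per event */
};

/* Allocate a zeroed line with room for nr_events data entries. */
static struct sketch_line *sketch_line_new(int nr_events)
{
	struct sketch_line *l;

	assert(nr_events > 0);

	l = calloc(1, sizeof(*l) + (size_t)nr_events * sizeof(l->data[0]));
	if (l == NULL)
		return NULL;

	l->data_nr = nr_events;
	return l;
}
```

If the event count passed in already excludes hidden events, the lines
are sized for exactly the columns that will be printed.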

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/disasm.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/disasm.c b/tools/perf/util/disasm.c
index 85fb0cfedf94..22289003e16d 100644
--- a/tools/perf/util/disasm.c
+++ b/tools/perf/util/disasm.c
@@ -1037,10 +1037,8 @@ static size_t disasm_line_size(int nr)
 struct disasm_line *disasm_line__new(struct annotate_args *args)
 {
 	struct disasm_line *dl = NULL;
-	int nr = 1;
-
-	if (evsel__is_group_event(args->evsel))
-		nr = args->evsel->core.nr_members;
+	struct annotation *notes = symbol__annotation(args->ms.sym);
+	int nr = notes->src->nr_events;
 
 	dl = zalloc(disasm_line_size(nr));
 	if (!dl)
-- 
2.46.0.rc2.264.g509ed76dc8-goog



* [PATCH 5/5] perf annotate: Add --skip-empty option
  2024-08-03 21:13 [PATCHSET 0/5] perf annotate: Add --skip-empty option (v1) Namhyung Kim
                   ` (3 preceding siblings ...)
  2024-08-03 21:13 ` [PATCH 4/5] perf annotate: Set al->data_nr using the notes->src->nr_events Namhyung Kim
@ 2024-08-03 21:13 ` Namhyung Kim
  2024-08-05 19:22   ` Arnaldo Carvalho de Melo
  2024-08-05 19:26   ` [PATCH 5/5] perf annotate: Add --skip-empty option Arnaldo Carvalho de Melo
  2024-08-05 19:26 ` [PATCHSET 0/5] perf annotate: Add --skip-empty option (v1) Arnaldo Carvalho de Melo
  5 siblings, 2 replies; 16+ messages in thread
From: Namhyung Kim @ 2024-08-03 21:13 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users

Like in perf report, we want to hide empty events in the perf annotate
output.  This is consistent with perf report when the same option is
set there.

For example, the following command would use 3 events including dummy.

  $ perf mem record -a -- perf test -w noploop

  $ perf evlist
  cpu/mem-loads,ldlat=30/P
  cpu/mem-stores/P
  dummy:u

Just using perf annotate with --group will show all 3 events.

  $ perf annotate --group --stdio | head
   Percent                 |	Source code & Disassembly of ...
  --------------------------------------------------------------
                           : 0     0xe060 <_dl_relocate_object>:
      0.00    0.00    0.00 :    e060:       pushq   %rbp
      0.00    0.00    0.00 :    e061:       movq    %rsp, %rbp
      0.00    0.00    0.00 :    e064:       pushq   %r15
      0.00    0.00    0.00 :    e066:       movq    %rdi, %r15
      0.00    0.00    0.00 :    e069:       pushq   %r14
      0.00    0.00    0.00 :    e06b:       pushq   %r13
      0.00    0.00    0.00 :    e06d:       movl    %edx, %r13d

Now with --skip-empty, it'll hide the last dummy event.

  $ perf annotate --group --stdio --skip-empty | head
   Percent         |	Source code & Disassembly of ...
  ------------------------------------------------------
                   : 0     0xe060 <_dl_relocate_object>:
      0.00    0.00 :    e060:       pushq   %rbp
      0.00    0.00 :    e061:       movq    %rsp, %rbp
      0.00    0.00 :    e064:       pushq   %r15
      0.00    0.00 :    e066:       movq    %rdi, %r15
      0.00    0.00 :    e069:       pushq   %r14
      0.00    0.00 :    e06b:       pushq   %r13
      0.00    0.00 :    e06d:       movl    %edx, %r13d
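
The counting logic can be sketched in isolation as below.  The struct,
the function names and the sample counts are made up for illustration;
the real code walks the group with for_each_group_evsel() and checks
evsel__hists(pos)->stats.nr_samples.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an event and its sample count. */
struct sketch_event {
	const char *name;
	unsigned long nr_samples;
};

/* Count the events that get a column; never return less than 1. */
static int sketch_count_events(const struct sketch_event *events, size_t n,
			       bool skip_empty)
{
	int nr = 0;
	size_t i;

	assert(events != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (skip_empty && events[i].nr_samples == 0)
			continue;
		nr++;
	}
	return nr ? nr : 1;
}

/* Pack per-event values into row[], using the same skip rule. */
static int sketch_pack_row(const struct sketch_event *events, size_t n,
			   bool skip_empty, unsigned long *row)
{
	int i = 0;
	size_t k;

	for (k = 0; k < n; k++) {
		if (skip_empty && events[k].nr_samples == 0)
			continue;
		row[i++] = events[k].nr_samples;
	}
	return i;
}
```

Both loops have to apply the same skip rule, otherwise the number of
filled entries would not match the number of columns.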

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/Documentation/perf-annotate.txt |  3 +++
 tools/perf/builtin-annotate.c              |  2 ++
 tools/perf/util/annotate.c                 | 22 +++++++++++++++++-----
 3 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/tools/perf/Documentation/perf-annotate.txt b/tools/perf/Documentation/perf-annotate.txt
index b95524bea021..156c5f37b051 100644
--- a/tools/perf/Documentation/perf-annotate.txt
+++ b/tools/perf/Documentation/perf-annotate.txt
@@ -165,6 +165,9 @@ include::itrace.txt[]
 --type-stat::
 	Show stats for the data type annotation.
 
+--skip-empty::
+	Do not display empty (or dummy) events.
+
 
 SEE ALSO
 --------
diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index cf60392b1c19..efcadb7620b8 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -795,6 +795,8 @@ int cmd_annotate(int argc, const char **argv)
 		    "Show stats for the data type annotation"),
 	OPT_BOOLEAN(0, "insn-stat", &annotate.insn_stat,
 		    "Show instruction stats for the data type annotation"),
+	OPT_BOOLEAN(0, "skip-empty", &symbol_conf.skip_empty,
+		    "Do not display empty (or dummy) events in the output"),
 	OPT_END()
 	};
 	int ret;
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 917897fe44a2..eafe8d65052e 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -848,6 +848,10 @@ static void annotation__calc_percent(struct annotation *notes,
 
 			BUG_ON(i >= al->data_nr);
 
+			if (symbol_conf.skip_empty &&
+			    evsel__hists(evsel)->stats.nr_samples == 0)
+				continue;
+
 			data = &al->data[i++];
 
 			calc_percent(notes, evsel, data, al->offset, end);
@@ -901,7 +905,7 @@ int symbol__annotate(struct map_symbol *ms, struct evsel *evsel,
 		.options	= &annotate_opts,
 	};
 	struct arch *arch = NULL;
-	int err;
+	int err, nr;
 
 	err = evsel__get_arch(evsel, &arch);
 	if (err < 0)
@@ -922,10 +926,18 @@ int symbol__annotate(struct map_symbol *ms, struct evsel *evsel,
 			return -1;
 	}
 
-	if (evsel__is_group_event(evsel))
-		notes->src->nr_events = evsel->core.nr_members;
-	else
-		notes->src->nr_events = 1;
+	nr = 0;
+	if (evsel__is_group_event(evsel)) {
+		struct evsel *pos;
+
+		for_each_group_evsel(pos, evsel) {
+			if (symbol_conf.skip_empty &&
+			    evsel__hists(pos)->stats.nr_samples == 0)
+				continue;
+			nr++;
+		}
+	}
+	notes->src->nr_events = nr ? nr : 1;
 
 	if (annotate_opts.full_addr)
 		notes->src->start = map__objdump_2mem(ms->map, ms->sym->start);
-- 
2.46.0.rc2.264.g509ed76dc8-goog



* Re: [PATCH 5/5] perf annotate: Add --skip-empty option
  2024-08-03 21:13 ` [PATCH 5/5] perf annotate: Add --skip-empty option Namhyung Kim
@ 2024-08-05 19:22   ` Arnaldo Carvalho de Melo
  2024-08-05 20:14     ` Namhyung Kim
  2024-08-05 19:26   ` [PATCH 5/5] perf annotate: Add --skip-empty option Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 16+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-05 19:22 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter, Peter Zijlstra,
	Ingo Molnar, LKML, linux-perf-users

On Sat, Aug 03, 2024 at 02:13:32PM -0700, Namhyung Kim wrote:
> Like in perf report, we want to hide empty events in the perf annotate
> output.  This is consistent when the option is set in perf report.
> 
> For example, the following command would use 3 events including dummy.
> 
>   $ perf mem record -a -- perf test -w noploop
> 
>   $ perf evlist
>   cpu/mem-loads,ldlat=30/P
>   cpu/mem-stores/P
>   dummy:u
> 
> Just using perf annotate with --group will show the all 3 events.

This seems unrelated, but I noticed it just before building with this patch:

root@x1:~# perf mem record -a -- perf test -w noploop
Memory events are enabled on a subset of CPUs: 4-11
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 10.506 MB perf.data (2775 samples) ]
root@x1:~#

root@x1:~# perf annotate --group --stdio2 sched_clock
root@x1:~# perf annotate --stdio2 sched_clock
Samples: 178  of event 'cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 565268, [percent: local period]
sched_clock() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
Percent      0xffffffff810511e0 <sched_clock>:
               endbr64        
   5.76        incl    pcpu_hot+0x8
   5.47      → callq   sched_clock_noinstr
  88.78        decl    pcpu_hot+0x8
             ↓ je      1e     
             → jmp     __x86_return_thunk
         1e: → callq   __SCT__preempt_schedule_notrace
             → jmp     __x86_return_thunk
root@x1:~# perf annotate --group --stdio2 sched_clock
root@x1:~# perf annotate --group --stdio sched_clock
root@x1:~# perf annotate --group sched_clock
root@x1:~#

root@x1:~# perf evlist
cpu_atom/mem-loads,ldlat=30/P
cpu_atom/mem-stores/P
dummy:u
root@x1:~#

root@x1:~# perf report --header-only | grep cmdline
# cmdline : /home/acme/bin/perf mem record -a -- perf test -w noploop 
root@x1:~#

I thought it would be some hybrid oddity, but it seems to be related
only to --group: it looks like it stops if the first event has no
samples?  I say that because it works with another symbol:

root@x1:~# perf annotate --group --stdio2 do_lookup_x | head -25
Samples: 20  of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 769079, [percent: local period]
do_lookup_x() /usr/lib64/ld-linux-x86-64.so.2
Percent                       0x9900 <do_lookup_x>:       
                                pushq      %rbp                 
                                movq       %rsp,%rbp            
                                pushq      %r15                 
                                pushq      %r14                 
                                pushq      %r13                 
                                pushq      %r12                 
                                pushq      %rbx                 
                                subq       $0x88,%rsp           
                                movq       %rdi,-0x50(%rbp)     
                                movl       8(%r9),%edi          
                                movq       0x10(%rbp),%r12      
                                movq       0x28(%rbp),%r10      
                                movq       %rdx,-0x70(%rbp)     
                                movq       %rcx,-0x58(%rbp)     
                                movq       %rdi,%r11            
   0.00    5.73    0.00         movq       %r8,-0x68(%rbp)      
                                movq       (%r9),%r8            
                                movl       %esi,%eax            
   8.30    0.00    0.00         movl       0x30(%rbp),%r9d      
                                movl       %esi,%r15d           
                                shrl       $6, %eax             
                                movq       %r8,%r13             
root@x1:~#

Just leaving a note here, no time to fully investigate this now,

- Arnaldo


* Re: [PATCH 5/5] perf annotate: Add --skip-empty option
  2024-08-03 21:13 ` [PATCH 5/5] perf annotate: Add --skip-empty option Namhyung Kim
  2024-08-05 19:22   ` Arnaldo Carvalho de Melo
@ 2024-08-05 19:26   ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 16+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-05 19:26 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter, Peter Zijlstra,
	Ingo Molnar, LKML, linux-perf-users

On Sat, Aug 03, 2024 at 02:13:32PM -0700, Namhyung Kim wrote:
> Like in perf report, we want to hide empty events in the perf annotate
> output.  This is consistent when the option is set in perf report.
> 
> For example, the following command would use 3 events including dummy.

The option --skip-empty is useful, but I wonder if for "dummy" it
shouldn't be the default, i.e. a per-event "skip" or "hide" flag that we
would set for the "dummy" event in addition to this --skip-empty command
line option?

- Arnaldo
 
>   $ perf mem record -a -- perf test -w noploop
> 
>   $ perf evlist
>   cpu/mem-loads,ldlat=30/P
>   cpu/mem-stores/P
>   dummy:u
> 
> Just using perf annotate with --group will show the all 3 events.
> 
>   $ perf annotate --group --stdio | head
>    Percent                 |	Source code & Disassembly of ...
>   --------------------------------------------------------------
>                            : 0     0xe060 <_dl_relocate_object>:
>       0.00    0.00    0.00 :    e060:       pushq   %rbp
>       0.00    0.00    0.00 :    e061:       movq    %rsp, %rbp
>       0.00    0.00    0.00 :    e064:       pushq   %r15
>       0.00    0.00    0.00 :    e066:       movq    %rdi, %r15
>       0.00    0.00    0.00 :    e069:       pushq   %r14
>       0.00    0.00    0.00 :    e06b:       pushq   %r13
>       0.00    0.00    0.00 :    e06d:       movl    %edx, %r13d
> 
> Now with --skip-empty, it'll hide the last dummy event.
> 
>   $ perf annotate --group --stdio --skip-empty | head
>    Percent         |	Source code & Disassembly of ...
>   ------------------------------------------------------
>                    : 0     0xe060 <_dl_relocate_object>:
>       0.00    0.00 :    e060:       pushq   %rbp
>       0.00    0.00 :    e061:       movq    %rsp, %rbp
>       0.00    0.00 :    e064:       pushq   %r15
>       0.00    0.00 :    e066:       movq    %rdi, %r15
>       0.00    0.00 :    e069:       pushq   %r14
>       0.00    0.00 :    e06b:       pushq   %r13
>       0.00    0.00 :    e06d:       movl    %edx, %r13d
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/Documentation/perf-annotate.txt |  3 +++
>  tools/perf/builtin-annotate.c              |  2 ++
>  tools/perf/util/annotate.c                 | 22 +++++++++++++++++-----
>  3 files changed, 22 insertions(+), 5 deletions(-)
> 
> diff --git a/tools/perf/Documentation/perf-annotate.txt b/tools/perf/Documentation/perf-annotate.txt
> index b95524bea021..156c5f37b051 100644
> --- a/tools/perf/Documentation/perf-annotate.txt
> +++ b/tools/perf/Documentation/perf-annotate.txt
> @@ -165,6 +165,9 @@ include::itrace.txt[]
>  --type-stat::
>  	Show stats for the data type annotation.
>  
> +--skip-empty::
> +	Do not display empty (or dummy) events.
> +
>  
>  SEE ALSO
>  --------
> diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
> index cf60392b1c19..efcadb7620b8 100644
> --- a/tools/perf/builtin-annotate.c
> +++ b/tools/perf/builtin-annotate.c
> @@ -795,6 +795,8 @@ int cmd_annotate(int argc, const char **argv)
>  		    "Show stats for the data type annotation"),
>  	OPT_BOOLEAN(0, "insn-stat", &annotate.insn_stat,
>  		    "Show instruction stats for the data type annotation"),
> +	OPT_BOOLEAN(0, "skip-empty", &symbol_conf.skip_empty,
> +		    "Do not display empty (or dummy) events in the output"),
>  	OPT_END()
>  	};
>  	int ret;
> diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
> index 917897fe44a2..eafe8d65052e 100644
> --- a/tools/perf/util/annotate.c
> +++ b/tools/perf/util/annotate.c
> @@ -848,6 +848,10 @@ static void annotation__calc_percent(struct annotation *notes,
>  
>  			BUG_ON(i >= al->data_nr);
>  
> +			if (symbol_conf.skip_empty &&
> +			    evsel__hists(evsel)->stats.nr_samples == 0)
> +				continue;
> +
>  			data = &al->data[i++];
>  
>  			calc_percent(notes, evsel, data, al->offset, end);
> @@ -901,7 +905,7 @@ int symbol__annotate(struct map_symbol *ms, struct evsel *evsel,
>  		.options	= &annotate_opts,
>  	};
>  	struct arch *arch = NULL;
> -	int err;
> +	int err, nr;
>  
>  	err = evsel__get_arch(evsel, &arch);
>  	if (err < 0)
> @@ -922,10 +926,18 @@ int symbol__annotate(struct map_symbol *ms, struct evsel *evsel,
>  			return -1;
>  	}
>  
> -	if (evsel__is_group_event(evsel))
> -		notes->src->nr_events = evsel->core.nr_members;
> -	else
> -		notes->src->nr_events = 1;
> +	nr = 0;
> +	if (evsel__is_group_event(evsel)) {
> +		struct evsel *pos;
> +
> +		for_each_group_evsel(pos, evsel) {
> +			if (symbol_conf.skip_empty &&
> +			    evsel__hists(pos)->stats.nr_samples == 0)
> +				continue;
> +			nr++;
> +		}
> +	}
> +	notes->src->nr_events = nr ? nr : 1;
>  
>  	if (annotate_opts.full_addr)
>  		notes->src->start = map__objdump_2mem(ms->map, ms->sym->start);
> -- 
> 2.46.0.rc2.264.g509ed76dc8-goog


* Re: [PATCHSET 0/5] perf annotate: Add --skip-empty option (v1)
  2024-08-03 21:13 [PATCHSET 0/5] perf annotate: Add --skip-empty option (v1) Namhyung Kim
                   ` (4 preceding siblings ...)
  2024-08-03 21:13 ` [PATCH 5/5] perf annotate: Add --skip-empty option Namhyung Kim
@ 2024-08-05 19:26 ` Arnaldo Carvalho de Melo
  5 siblings, 0 replies; 16+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-05 19:26 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter, Peter Zijlstra,
	Ingo Molnar, LKML, linux-perf-users

On Sat, Aug 03, 2024 at 02:13:27PM -0700, Namhyung Kim wrote:
> Hello,
> 
> This is to make perf annotate has the same behavior as perf report.
> Especially in the TUI browser, we want to maintain the same experience
> when it comes to display dummy events from perf report.
> 
>   $ perf mem record -a -- perf test -w noploop
> 
>   $ perf evlist
>   cpu/mem-loads,ldlat=30/P
>   cpu/mem-stores/P
>   dummy:u

Thanks, tested and applied to tmp.perf-tools-next; it will go to
perf-tools-next later.

- Arnaldo
 
> Just using perf annotate with --group will show the all 3 events.
> 
>   $ perf annotate --group --stdio | head
>    Percent                 |    Source code & Disassembly of ...
>   --------------------------------------------------------------
>                            : 0     0xe060 <_dl_relocate_object>:
>       0.00    0.00    0.00 :    e060:       pushq   %rbp
>       0.00    0.00    0.00 :    e061:       movq    %rsp, %rbp
>       0.00    0.00    0.00 :    e064:       pushq   %r15
>       0.00    0.00    0.00 :    e066:       movq    %rdi, %r15
>       0.00    0.00    0.00 :    e069:       pushq   %r14
>       0.00    0.00    0.00 :    e06b:       pushq   %r13
>       0.00    0.00    0.00 :    e06d:       movl    %edx, %r13d
> 
> Now with --skip-empty, it'll hide the last dummy event.
> 
>   $ perf annotate --group --stdio --skip-empty | head
>    Percent         |    Source code & Disassembly of ...
>   ------------------------------------------------------
>                    : 0     0xe060 <_dl_relocate_object>:
>       0.00    0.00 :    e060:       pushq   %rbp
>       0.00    0.00 :    e061:       movq    %rsp, %rbp
>       0.00    0.00 :    e064:       pushq   %r15
>       0.00    0.00 :    e066:       movq    %rdi, %r15
>       0.00    0.00 :    e069:       pushq   %r14
>       0.00    0.00 :    e06b:       pushq   %r13
>       0.00    0.00 :    e06d:       movl    %edx, %r13d
> 
> The code is available in 'perf/annotate-skip-v1' branch at
> git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
> 
> Thanks,
> Namhyung
> 
> 
> Namhyung Kim (5):
>   perf annotate: Use al->data_nr if possible
>   perf annotate: Set notes->src->nr_events early
>   perf annotate: Use annotation__pcnt_width() consistently
>   perf annotate: Set al->data_nr using the notes->src->nr_events
>   perf annotate: Add --skip-empty option
> 
>  tools/perf/Documentation/perf-annotate.txt |  3 ++
>  tools/perf/builtin-annotate.c              |  2 +
>  tools/perf/util/annotate.c                 | 47 +++++++++++++---------
>  tools/perf/util/annotate.h                 |  2 +-
>  tools/perf/util/disasm.c                   |  6 +--
>  5 files changed, 35 insertions(+), 25 deletions(-)
> 
> -- 
> 2.46.0.rc2.264.g509ed76dc8-goog
> 


* Re: [PATCH 5/5] perf annotate: Add --skip-empty option
  2024-08-05 19:22   ` Arnaldo Carvalho de Melo
@ 2024-08-05 20:14     ` Namhyung Kim
  2024-08-05 20:23       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 16+ messages in thread
From: Namhyung Kim @ 2024-08-05 20:14 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter, Peter Zijlstra,
	Ingo Molnar, LKML, linux-perf-users

On Mon, Aug 05, 2024 at 04:22:12PM -0300, Arnaldo Carvalho de Melo wrote:
> On Sat, Aug 03, 2024 at 02:13:32PM -0700, Namhyung Kim wrote:
> > Like in perf report, we want to hide empty events in the perf annotate
> > output.  This is consistent when the option is set in perf report.
> > 
> > For example, the following command would use 3 events including dummy.
> > 
> >   $ perf mem record -a -- perf test -w noploop
> > 
> >   $ perf evlist
> >   cpu/mem-loads,ldlat=30/P
> >   cpu/mem-stores/P
> >   dummy:u
> > 
> > Just using perf annotate with --group will show all 3 events.
> 
> Seems unrelated, just before compiling with this patch:
> 
> root@x1:~# perf mem record -a -- perf test -w noploop
> Memory events are enabled on a subset of CPUs: 4-11
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 10.506 MB perf.data (2775 samples) ]
> root@x1:~#
> 
> root@x1:~# perf annotate --group --stdio2 sched_clock
> root@x1:~# perf annotate --stdio2 sched_clock
> Samples: 178  of event 'cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 565268, [percent: local period]
> sched_clock() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
> Percent      0xffffffff810511e0 <sched_clock>:
>                endbr64        
>    5.76        incl    pcpu_hot+0x8
>    5.47      → callq   sched_clock_noinstr
>   88.78        decl    pcpu_hot+0x8
>              ↓ je      1e     
>              → jmp     __x86_return_thunk
>          1e: → callq   __SCT__preempt_schedule_notrace
>              → jmp     __x86_return_thunk
> root@x1:~# perf annotate --group --stdio2 sched_clock
> root@x1:~# perf annotate --group --stdio sched_clock
> root@x1:~# perf annotate --group sched_clock
> root@x1:~#
> 
> root@x1:~# perf evlist
> cpu_atom/mem-loads,ldlat=30/P
> cpu_atom/mem-stores/P
> dummy:u
> root@x1:~#
> 
> root@x1:~# perf report --header-only | grep cmdline
> # cmdline : /home/acme/bin/perf mem record -a -- perf test -w noploop 
> root@x1:~#
> 
> I thought it would be some hybrid oddity but seems to be just --group
> related, seems like it stops if the first event has no samples? Because
> it works with another symbol:

Good catch.  Yeah I found it only checked the first event.  Something
like below should fix the issue.

Thanks,
Namhyung


diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index efcadb7620b8..8d3ec439b783 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -632,11 +632,15 @@ static int __cmd_annotate(struct perf_annotate *ann)
 	evlist__for_each_entry(session->evlist, pos) {
 		struct hists *hists = evsel__hists(pos);
 		u32 nr_samples = hists->stats.nr_samples;
+		struct evsel *evsel;
 
-		if (nr_samples == 0)
+		if (!symbol_conf.event_group || !evsel__is_group_leader(pos))
 			continue;
 
-		if (!symbol_conf.event_group || !evsel__is_group_leader(pos))
+		for_each_group_member(evsel, pos)
+			nr_samples += evsel__hists(evsel)->stats.nr_samples;
+
+		if (nr_samples == 0)
 			continue;
 
 		hists__find_annotations(hists, pos, ann);


* Re: [PATCH 5/5] perf annotate: Add --skip-empty option
  2024-08-05 20:14     ` Namhyung Kim
@ 2024-08-05 20:23       ` Arnaldo Carvalho de Melo
  2024-08-05 20:50         ` Namhyung Kim
  0 siblings, 1 reply; 16+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-05 20:23 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter, Peter Zijlstra,
	Ingo Molnar, LKML, linux-perf-users

On Mon, Aug 05, 2024 at 01:14:27PM -0700, Namhyung Kim wrote:
> On Mon, Aug 05, 2024 at 04:22:12PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Sat, Aug 03, 2024 at 02:13:32PM -0700, Namhyung Kim wrote:
> > > Like in perf report, we want to hide empty events in the perf annotate
> > > output.  This is consistent when the option is set in perf report.
> > > 
> > > For example, the following command would use 3 events including dummy.
> > > 
> > >   $ perf mem record -a -- perf test -w noploop
> > > 
> > >   $ perf evlist
> > >   cpu/mem-loads,ldlat=30/P
> > >   cpu/mem-stores/P
> > >   dummy:u
> > > 
> > > Just using perf annotate with --group will show all 3 events.
> > 
> > Seems unrelated, just before compiling with this patch:
> > 
> > root@x1:~# perf mem record -a -- perf test -w noploop
> > Memory events are enabled on a subset of CPUs: 4-11
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 10.506 MB perf.data (2775 samples) ]
> > root@x1:~#
> > 
> > root@x1:~# perf annotate --group --stdio2 sched_clock
> > root@x1:~# perf annotate --stdio2 sched_clock
> > Samples: 178  of event 'cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 565268, [percent: local period]
> > sched_clock() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
> > Percent      0xffffffff810511e0 <sched_clock>:
> >                endbr64        
> >    5.76        incl    pcpu_hot+0x8
> >    5.47      → callq   sched_clock_noinstr
> >   88.78        decl    pcpu_hot+0x8
> >              ↓ je      1e     
> >              → jmp     __x86_return_thunk
> >          1e: → callq   __SCT__preempt_schedule_notrace
> >              → jmp     __x86_return_thunk
> > root@x1:~# perf annotate --group --stdio2 sched_clock
> > root@x1:~# perf annotate --group --stdio sched_clock
> > root@x1:~# perf annotate --group sched_clock
> > root@x1:~#
> > 
> > root@x1:~# perf evlist
> > cpu_atom/mem-loads,ldlat=30/P
> > cpu_atom/mem-stores/P
> > dummy:u
> > root@x1:~#
> > 
> > root@x1:~# perf report --header-only | grep cmdline
> > # cmdline : /home/acme/bin/perf mem record -a -- perf test -w noploop 
> > root@x1:~#
> > 
> > I thought it would be some hybrid oddity but seems to be just --group
> > related, seems like it stops if the first event has no samples? Because
> > it works with another symbol:
> 
> Good catch.  Yeah I found it only checked the first event.  Something
> like below should fix the issue.

Nope, with the patch applied:

root@x1:~# perf annotate --group --stdio sched_clock
root@x1:~# perf annotate --stdio sched_clock
 Percent |      Source code & Disassembly of vmlinux for cpu_atom/mem-stores/P (147 samples, percent: local period)
-------------------------------------------------------------------------------------------------------------------
         : 0                0xffffffff810511e0 <sched_clock>:
    0.00 :   ffffffff810511e0:       endbr64
    5.11 :   ffffffff810511e4:       incl    %gs:0x7efe2d5d(%rip)       # 33f48 <pcpu_hot+0x8>
    0.13 :   ffffffff810511eb:       callq   0xffffffff821350d0
   94.76 :   ffffffff810511f0:       decl    %gs:0x7efe2d51(%rip)       # 33f48 <pcpu_hot+0x8>
    0.00 :   ffffffff810511f7:       je      0xffffffff810511fe
    0.00 :   ffffffff810511f9:       jmp     0xffffffff82153320
    0.00 :   ffffffff810511fe:       callq   0xffffffff82153990
    0.00 :   ffffffff81051203:       jmp     0xffffffff82153320
root@x1:~# perf annotate --group --stdio sched_clock
root@x1:~# perf annotate --group --stdio2 sched_clock
root@x1:~# perf annotate --group sched_clock
root@x1:~#
 
> Thanks,
> Namhyung
> 
> 
> diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
> index efcadb7620b8..8d3ec439b783 100644
> --- a/tools/perf/builtin-annotate.c
> +++ b/tools/perf/builtin-annotate.c
> @@ -632,11 +632,15 @@ static int __cmd_annotate(struct perf_annotate *ann)
>  	evlist__for_each_entry(session->evlist, pos) {
>  		struct hists *hists = evsel__hists(pos);
>  		u32 nr_samples = hists->stats.nr_samples;
> +		struct evsel *evsel;
>  
> -		if (nr_samples == 0)
> +		if (!symbol_conf.event_group || !evsel__is_group_leader(pos))
>  			continue;
>  
> -		if (!symbol_conf.event_group || !evsel__is_group_leader(pos))
> +		for_each_group_member(evsel, pos)
> +			nr_samples += evsel__hists(evsel)->stats.nr_samples;
> +
> +		if (nr_samples == 0)
>  			continue;
>  
>  		hists__find_annotations(hists, pos, ann);


* Re: [PATCH 5/5] perf annotate: Add --skip-empty option
  2024-08-05 20:23       ` Arnaldo Carvalho de Melo
@ 2024-08-05 20:50         ` Namhyung Kim
  2024-08-06 13:12           ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 16+ messages in thread
From: Namhyung Kim @ 2024-08-05 20:50 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter, Peter Zijlstra,
	Ingo Molnar, LKML, linux-perf-users

On Mon, Aug 05, 2024 at 05:23:51PM -0300, Arnaldo Carvalho de Melo wrote:
> On Mon, Aug 05, 2024 at 01:14:27PM -0700, Namhyung Kim wrote:
> > On Mon, Aug 05, 2024 at 04:22:12PM -0300, Arnaldo Carvalho de Melo wrote:
> > > On Sat, Aug 03, 2024 at 02:13:32PM -0700, Namhyung Kim wrote:
> > > > Like in perf report, we want to hide empty events in the perf annotate
> > > > output.  This is consistent when the option is set in perf report.
> > > > 
> > > > For example, the following command would use 3 events including dummy.
> > > > 
> > > >   $ perf mem record -a -- perf test -w noploop
> > > > 
> > > >   $ perf evlist
> > > >   cpu/mem-loads,ldlat=30/P
> > > >   cpu/mem-stores/P
> > > >   dummy:u
> > > > 
> > > > Just using perf annotate with --group will show all 3 events.
> > > 
> > > Seems unrelated, just before compiling with this patch:
> > > 
> > > root@x1:~# perf mem record -a -- perf test -w noploop
> > > Memory events are enabled on a subset of CPUs: 4-11
> > > [ perf record: Woken up 1 times to write data ]
> > > [ perf record: Captured and wrote 10.506 MB perf.data (2775 samples) ]
> > > root@x1:~#
> > > 
> > > root@x1:~# perf annotate --group --stdio2 sched_clock
> > > root@x1:~# perf annotate --stdio2 sched_clock
> > > Samples: 178  of event 'cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 565268, [percent: local period]
> > > sched_clock() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
> > > Percent      0xffffffff810511e0 <sched_clock>:
> > >                endbr64        
> > >    5.76        incl    pcpu_hot+0x8
> > >    5.47      → callq   sched_clock_noinstr
> > >   88.78        decl    pcpu_hot+0x8
> > >              ↓ je      1e     
> > >              → jmp     __x86_return_thunk
> > >          1e: → callq   __SCT__preempt_schedule_notrace
> > >              → jmp     __x86_return_thunk
> > > root@x1:~# perf annotate --group --stdio2 sched_clock
> > > root@x1:~# perf annotate --group --stdio sched_clock
> > > root@x1:~# perf annotate --group sched_clock
> > > root@x1:~#
> > > 
> > > root@x1:~# perf evlist
> > > cpu_atom/mem-loads,ldlat=30/P
> > > cpu_atom/mem-stores/P
> > > dummy:u
> > > root@x1:~#
> > > 
> > > root@x1:~# perf report --header-only | grep cmdline
> > > # cmdline : /home/acme/bin/perf mem record -a -- perf test -w noploop 
> > > root@x1:~#
> > > 
> > > I thought it would be some hybrid oddity but seems to be just --group
> > > related, seems like it stops if the first event has no samples? Because
> > > it works with another symbol:
> > 
> > Good catch.  Yeah I found it only checked the first event.  Something
> > like below should fix the issue.
> 
> Nope, with the patch applied:
> 
> root@x1:~# perf annotate --group --stdio sched_clock
> root@x1:~# perf annotate --stdio sched_clock
>  Percent |      Source code & Disassembly of vmlinux for cpu_atom/mem-stores/P (147 samples, percent: local period)
> -------------------------------------------------------------------------------------------------------------------
>          : 0                0xffffffff810511e0 <sched_clock>:
>     0.00 :   ffffffff810511e0:       endbr64
>     5.11 :   ffffffff810511e4:       incl    %gs:0x7efe2d5d(%rip)       # 33f48 <pcpu_hot+0x8>
>     0.13 :   ffffffff810511eb:       callq   0xffffffff821350d0
>    94.76 :   ffffffff810511f0:       decl    %gs:0x7efe2d51(%rip)       # 33f48 <pcpu_hot+0x8>
>     0.00 :   ffffffff810511f7:       je      0xffffffff810511fe
>     0.00 :   ffffffff810511f9:       jmp     0xffffffff82153320
>     0.00 :   ffffffff810511fe:       callq   0xffffffff82153990
>     0.00 :   ffffffff81051203:       jmp     0xffffffff82153320
> root@x1:~# perf annotate --group --stdio sched_clock
> root@x1:~# perf annotate --group --stdio2 sched_clock
> root@x1:~# perf annotate --group sched_clock
> root@x1:~#

Oh ok, it was not enough.  It should call evsel__output_resort() after
hists__match() and hists__link().  Use this instead.

Thanks,
Namhyung


diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index efcadb7620b8..1bfe41783a7c 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -632,13 +632,23 @@ static int __cmd_annotate(struct perf_annotate *ann)
 	evlist__for_each_entry(session->evlist, pos) {
 		struct hists *hists = evsel__hists(pos);
 		u32 nr_samples = hists->stats.nr_samples;
+		struct ui_progress prog;
+		struct evsel *evsel;
 
-		if (nr_samples == 0)
+		if (!symbol_conf.event_group || !evsel__is_group_leader(pos))
 			continue;
 
-		if (!symbol_conf.event_group || !evsel__is_group_leader(pos))
+		for_each_group_member(evsel, pos)
+			nr_samples += evsel__hists(evsel)->stats.nr_samples;
+
+		if (nr_samples == 0)
 			continue;
 
+		ui_progress__init(&prog, nr_samples,
+				  "Sorting group events for output...");
+		evsel__output_resort(pos, &prog);
+		ui_progress__finish();
+
 		hists__find_annotations(hists, pos, ann);
 	}
 


* Re: [PATCH 5/5] perf annotate: Add --skip-empty option
  2024-08-05 20:50         ` Namhyung Kim
@ 2024-08-06 13:12           ` Arnaldo Carvalho de Melo
  2024-08-07  6:12             ` Namhyung Kim
  0 siblings, 1 reply; 16+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-06 13:12 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter, Peter Zijlstra,
	Ingo Molnar, LKML, linux-perf-users

On Mon, Aug 05, 2024 at 01:50:25PM -0700, Namhyung Kim wrote:
> On Mon, Aug 05, 2024 at 05:23:51PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Mon, Aug 05, 2024 at 01:14:27PM -0700, Namhyung Kim wrote:
> > > On Mon, Aug 05, 2024 at 04:22:12PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > On Sat, Aug 03, 2024 at 02:13:32PM -0700, Namhyung Kim wrote:
> > > > > Like in perf report, we want to hide empty events in the perf annotate
> > > > > output.  This is consistent when the option is set in perf report.
> > > > > 
> > > > > For example, the following command would use 3 events including dummy.
> > > > > 
> > > > >   $ perf mem record -a -- perf test -w noploop
> > > > > 
> > > > >   $ perf evlist
> > > > >   cpu/mem-loads,ldlat=30/P
> > > > >   cpu/mem-stores/P
> > > > >   dummy:u
> > > > > 
> > > > > Just using perf annotate with --group will show all 3 events.
> > > > 
> > > > Seems unrelated, just before compiling with this patch:
> > > > 
> > > > root@x1:~# perf mem record -a -- perf test -w noploop
> > > > Memory events are enabled on a subset of CPUs: 4-11
> > > > [ perf record: Woken up 1 times to write data ]
> > > > [ perf record: Captured and wrote 10.506 MB perf.data (2775 samples) ]
> > > > root@x1:~#
> > > > 
> > > > root@x1:~# perf annotate --group --stdio2 sched_clock
> > > > root@x1:~# perf annotate --stdio2 sched_clock
> > > > Samples: 178  of event 'cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 565268, [percent: local period]
> > > > sched_clock() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
> > > > Percent      0xffffffff810511e0 <sched_clock>:
> > > >                endbr64        
> > > >    5.76        incl    pcpu_hot+0x8
> > > >    5.47      → callq   sched_clock_noinstr
> > > >   88.78        decl    pcpu_hot+0x8
> > > >              ↓ je      1e     
> > > >              → jmp     __x86_return_thunk
> > > >          1e: → callq   __SCT__preempt_schedule_notrace
> > > >              → jmp     __x86_return_thunk
> > > > root@x1:~# perf annotate --group --stdio2 sched_clock
> > > > root@x1:~# perf annotate --group --stdio sched_clock
> > > > root@x1:~# perf annotate --group sched_clock
> > > > root@x1:~#
> > > > 
> > > > root@x1:~# perf evlist
> > > > cpu_atom/mem-loads,ldlat=30/P
> > > > cpu_atom/mem-stores/P
> > > > dummy:u
> > > > root@x1:~#
> > > > 
> > > > root@x1:~# perf report --header-only | grep cmdline
> > > > # cmdline : /home/acme/bin/perf mem record -a -- perf test -w noploop 
> > > > root@x1:~#
> > > > 
> > > > I thought it would be some hybrid oddity but seems to be just --group
> > > > related, seems like it stops if the first event has no samples? Because
> > > > it works with another symbol:
> > > 
> > > Good catch.  Yeah I found it only checked the first event.  Something
> > > like below should fix the issue.
> > 
> > Nope, with the patch applied:
> > 
> > root@x1:~# perf annotate --group --stdio sched_clock
> > root@x1:~# perf annotate --stdio sched_clock
> >  Percent |      Source code & Disassembly of vmlinux for cpu_atom/mem-stores/P (147 samples, percent: local period)
> > -------------------------------------------------------------------------------------------------------------------
> >          : 0                0xffffffff810511e0 <sched_clock>:
> >     0.00 :   ffffffff810511e0:       endbr64
> >     5.11 :   ffffffff810511e4:       incl    %gs:0x7efe2d5d(%rip)       # 33f48 <pcpu_hot+0x8>
> >     0.13 :   ffffffff810511eb:       callq   0xffffffff821350d0
> >    94.76 :   ffffffff810511f0:       decl    %gs:0x7efe2d51(%rip)       # 33f48 <pcpu_hot+0x8>
> >     0.00 :   ffffffff810511f7:       je      0xffffffff810511fe
> >     0.00 :   ffffffff810511f9:       jmp     0xffffffff82153320
> >     0.00 :   ffffffff810511fe:       callq   0xffffffff82153990
> >     0.00 :   ffffffff81051203:       jmp     0xffffffff82153320
> > root@x1:~# perf annotate --group --stdio sched_clock
> > root@x1:~# perf annotate --group --stdio2 sched_clock
> > root@x1:~# perf annotate --group sched_clock
> > root@x1:~#
> 
> Oh ok, it was not enough.  It should call evsel__output_resort() after
> hists__match() and hists__link().  Use this instead.

Ok, this works:

Before this patch:

root@x1:~# perf annotate --stdio sched_clock
 Percent |      Source code & Disassembly of vmlinux for cpu_atom/mem-stores/P (147 samples, percent: local period)
-------------------------------------------------------------------------------------------------------------------
         : 0                0xffffffff810511e0 <sched_clock>:
    0.00 :   ffffffff810511e0:       endbr64
    5.11 :   ffffffff810511e4:       incl    %gs:0x7efe2d5d(%rip)       # 33f48 <pcpu_hot+0x8>
    0.13 :   ffffffff810511eb:       callq   0xffffffff821350d0
   94.76 :   ffffffff810511f0:       decl    %gs:0x7efe2d51(%rip)       # 33f48 <pcpu_hot+0x8>
    0.00 :   ffffffff810511f7:       je      0xffffffff810511fe
    0.00 :   ffffffff810511f9:       jmp     0xffffffff82153320
    0.00 :   ffffffff810511fe:       callq   0xffffffff82153990
    0.00 :   ffffffff81051203:       jmp     0xffffffff82153320
root@x1:~# perf annotate --group --stdio sched_clock
root@x1:~#

After:

root@x1:~# perf annotate --group --stdio sched_clock
 Percent                 |      Source code & Disassembly of vmlinux for cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u (0 samples, percent: local period)
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                         : 0                0xffffffff810511e0 <sched_clock>:
    0.00    0.00    0.00 :   ffffffff810511e0:       endbr64
    0.00    5.11    0.00 :   ffffffff810511e4:       incl    %gs:0x7efe2d5d(%rip)       # 33f48 <pcpu_hot+0x8>
    0.00    0.13    0.00 :   ffffffff810511eb:       callq   0xffffffff821350d0
    0.00   94.76    0.00 :   ffffffff810511f0:       decl    %gs:0x7efe2d51(%rip)       # 33f48 <pcpu_hot+0x8>
    0.00    0.00    0.00 :   ffffffff810511f7:       je      0xffffffff810511fe
    0.00    0.00    0.00 :   ffffffff810511f9:       jmp     0xffffffff82153320
    0.00    0.00    0.00 :   ffffffff810511fe:       callq   0xffffffff82153990
    0.00    0.00    0.00 :   ffffffff81051203:       jmp     0xffffffff82153320
root@x1:~#

One example with samples for the first two events:

root@x1:~# perf annotate --group --stdio2
Samples: 2K of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 22892183, [percent: local period]
cgroup_rstat_updated() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
Percent                       0xffffffff8124e080 <cgroup_rstat_updated>:
   0.00    0.24    0.00         endbr64                         
                              → callq   __fentry__              
   0.00   99.76    0.00         pushq   %r15                    
                                movq    $0x251d4,%rcx           
                                pushq   %r14                    
                                movq    %rdi,%r14               
                                pushq   %r13                    
                                movslq  %esi,%r13               
                                pushq   %r12                    
                                pushq   %rbp                    
                                pushq   %rbx                    
                                subq    $0x10,%rsp              
                                cmpq    $0x2000,%r13            
                              ↓ jae     17f                     
                          31:   movq    0x3d0(%r14),%rbx        
                                movq    -0x7d3fb360(, %r13, 8),%r12
                                cmpq    $0x2000,%r13            
                              ↓ jae     19b                     
  25.00    0.00    0.00   4d:   cmpq    $0,0x88(%r12, %rbx)     
                              ↓ je      6b                      
                                addq    $0x10,%rsp              
                                popq    %rbx                    
                                popq    %rbp                    
                                popq    %r12                    
  75.00    0.00    0.00         popq    %r13                    
                                popq    %r14                    
                                popq    %r15                    
                              → jmp     __x86_return_thunk      
<SNIP>

And then skipping "empty" events:

root@x1:~# perf annotate --group --skip-empty --stdio2 cgroup_rstat_updated | head -35
Samples: 4  of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 31851, [percent: local period]
cgroup_rstat_updated() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
Percent               0xffffffff8124e080 <cgroup_rstat_updated>:
   0.00    0.24         endbr64                 
                      → callq   __fentry__      
   0.00   99.76         pushq   %r15            
                        movq    $0x251d4,%rcx   
                        pushq   %r14            
                        movq    %rdi,%r14       
                        pushq   %r13            
                        movslq  %esi,%r13       
                        pushq   %r12            
                        pushq   %rbp            
                        pushq   %rbx            
                        subq    $0x10,%rsp      
                        cmpq    $0x2000,%r13    
                      ↓ jae     17f             
                  31:   movq    0x3d0(%r14),%rbx
                        movq    -0x7d3fb360(, %r13, 8),%r12
                        cmpq    $0x2000,%r13    
                      ↓ jae     19b             
  25.00    0.00   4d:   cmpq    $0,0x88(%r12, %rbx)
                      ↓ je      6b              
                        addq    $0x10,%rsp      
                        popq    %rbx            
                        popq    %rbp            
                        popq    %r12            
  75.00    0.00         popq    %r13            
                        popq    %r14            
                        popq    %r15            
                      → jmp     __x86_return_thunk
                  6b:   addq    %r12,%rcx       
                        movq    %rcx,%rdi       
                        movq    %rcx,(%rsp)     
                      → callq   *ffffffff82151500
root@x1:~#

So, I haven't done further analysis but I think this is a separate
issue, right?

Thanks for the fix!

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>

- Arnaldo


* Re: [PATCH 5/5] perf annotate: Add --skip-empty option
  2024-08-06 13:12           ` Arnaldo Carvalho de Melo
@ 2024-08-07  6:12             ` Namhyung Kim
  2024-08-07  6:15               ` [PATCH] perf annotate: Fix --group behavior when leader has no samples Namhyung Kim
  0 siblings, 1 reply; 16+ messages in thread
From: Namhyung Kim @ 2024-08-07  6:12 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter, Peter Zijlstra,
	Ingo Molnar, LKML, linux-perf-users

On Tue, Aug 6, 2024 at 6:12 AM Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
>
> On Mon, Aug 05, 2024 at 01:50:25PM -0700, Namhyung Kim wrote:
> > On Mon, Aug 05, 2024 at 05:23:51PM -0300, Arnaldo Carvalho de Melo wrote:
> > > On Mon, Aug 05, 2024 at 01:14:27PM -0700, Namhyung Kim wrote:
> > > > On Mon, Aug 05, 2024 at 04:22:12PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > > On Sat, Aug 03, 2024 at 02:13:32PM -0700, Namhyung Kim wrote:
> > > > > > Like in perf report, we want to hide empty events in the perf annotate
> > > > > > output.  This is consistent when the option is set in perf report.
> > > > > >
> > > > > > For example, the following command would use 3 events including dummy.
> > > > > >
> > > > > >   $ perf mem record -a -- perf test -w noploop
> > > > > >
> > > > > >   $ perf evlist
> > > > > >   cpu/mem-loads,ldlat=30/P
> > > > > >   cpu/mem-stores/P
> > > > > >   dummy:u
> > > > > >
> > > > > > > Just using perf annotate with --group will show all 3 events.
> > > > >
> > > > > Seems unrelated, just before compiling with this patch:
> > > > >
> > > > > root@x1:~# perf mem record -a -- perf test -w noploop
> > > > > Memory events are enabled on a subset of CPUs: 4-11
> > > > > [ perf record: Woken up 1 times to write data ]
> > > > > [ perf record: Captured and wrote 10.506 MB perf.data (2775 samples) ]
> > > > > root@x1:~#
> > > > >
> > > > > root@x1:~# perf annotate --group --stdio2 sched_clock
> > > > > root@x1:~# perf annotate --stdio2 sched_clock
> > > > > Samples: 178  of event 'cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 565268, [percent: local period]
> > > > > sched_clock() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
> > > > > Percent      0xffffffff810511e0 <sched_clock>:
> > > > >                endbr64
> > > > >    5.76        incl    pcpu_hot+0x8
> > > > >    5.47      → callq   sched_clock_noinstr
> > > > >   88.78        decl    pcpu_hot+0x8
> > > > >              ↓ je      1e
> > > > >              → jmp     __x86_return_thunk
> > > > >          1e: → callq   __SCT__preempt_schedule_notrace
> > > > >              → jmp     __x86_return_thunk
> > > > > root@x1:~# perf annotate --group --stdio2 sched_clock
> > > > > root@x1:~# perf annotate --group --stdio sched_clock
> > > > > root@x1:~# perf annotate --group sched_clock
> > > > > root@x1:~#
> > > > >
> > > > > root@x1:~# perf evlist
> > > > > cpu_atom/mem-loads,ldlat=30/P
> > > > > cpu_atom/mem-stores/P
> > > > > dummy:u
> > > > > root@x1:~#
> > > > >
> > > > > root@x1:~# perf report --header-only | grep cmdline
> > > > > # cmdline : /home/acme/bin/perf mem record -a -- perf test -w noploop
> > > > > root@x1:~#
> > > > >
> > > > > I thought it would be some hybrid oddity but seems to be just --group
> > > > > related, seems like it stops if the first event has no samples? Because
> > > > > it works with another symbol:
> > > >
> > > > Good catch.  Yeah I found it only checked the first event.  Something
> > > > like below should fix the issue.
> > >
> > > Nope, with the patch applied:
> > >
> > > root@x1:~# perf annotate --group --stdio sched_clock
> > > root@x1:~# perf annotate --stdio sched_clock
> > >  Percent |      Source code & Disassembly of vmlinux for cpu_atom/mem-stores/P (147 samples, percent: local period)
> > > -------------------------------------------------------------------------------------------------------------------
> > >          : 0                0xffffffff810511e0 <sched_clock>:
> > >     0.00 :   ffffffff810511e0:       endbr64
> > >     5.11 :   ffffffff810511e4:       incl    %gs:0x7efe2d5d(%rip)       # 33f48 <pcpu_hot+0x8>
> > >     0.13 :   ffffffff810511eb:       callq   0xffffffff821350d0
> > >    94.76 :   ffffffff810511f0:       decl    %gs:0x7efe2d51(%rip)       # 33f48 <pcpu_hot+0x8>
> > >     0.00 :   ffffffff810511f7:       je      0xffffffff810511fe
> > >     0.00 :   ffffffff810511f9:       jmp     0xffffffff82153320
> > >     0.00 :   ffffffff810511fe:       callq   0xffffffff82153990
> > >     0.00 :   ffffffff81051203:       jmp     0xffffffff82153320
> > > root@x1:~# perf annotate --group --stdio sched_clock
> > > root@x1:~# perf annotate --group --stdio2 sched_clock
> > > root@x1:~# perf annotate --group sched_clock
> > > root@x1:~#
> >
> > Oh ok, it was not enough.  It should call evsel__output_resort() after
> > hists__match() and hists__link().  Use this instead.
>
> Ok, this works:
>
> Before this patch:
>
> root@x1:~# perf annotate --stdio sched_clock
>  Percent |      Source code & Disassembly of vmlinux for cpu_atom/mem-stores/P (147 samples, percent: local period)
> -------------------------------------------------------------------------------------------------------------------
>          : 0                0xffffffff810511e0 <sched_clock>:
>     0.00 :   ffffffff810511e0:       endbr64
>     5.11 :   ffffffff810511e4:       incl    %gs:0x7efe2d5d(%rip)       # 33f48 <pcpu_hot+0x8>
>     0.13 :   ffffffff810511eb:       callq   0xffffffff821350d0
>    94.76 :   ffffffff810511f0:       decl    %gs:0x7efe2d51(%rip)       # 33f48 <pcpu_hot+0x8>
>     0.00 :   ffffffff810511f7:       je      0xffffffff810511fe
>     0.00 :   ffffffff810511f9:       jmp     0xffffffff82153320
>     0.00 :   ffffffff810511fe:       callq   0xffffffff82153990
>     0.00 :   ffffffff81051203:       jmp     0xffffffff82153320
> root@x1:~# perf annotate --group --stdio sched_clock
> root@x1:~#
>
> After:
>
> root@x1:~# perf annotate --group --stdio sched_clock
>  Percent                 |      Source code & Disassembly of vmlinux for cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u (0 samples, percent: local period)
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>                          : 0                0xffffffff810511e0 <sched_clock>:
>     0.00    0.00    0.00 :   ffffffff810511e0:       endbr64
>     0.00    5.11    0.00 :   ffffffff810511e4:       incl    %gs:0x7efe2d5d(%rip)       # 33f48 <pcpu_hot+0x8>
>     0.00    0.13    0.00 :   ffffffff810511eb:       callq   0xffffffff821350d0
>     0.00   94.76    0.00 :   ffffffff810511f0:       decl    %gs:0x7efe2d51(%rip)       # 33f48 <pcpu_hot+0x8>
>     0.00    0.00    0.00 :   ffffffff810511f7:       je      0xffffffff810511fe
>     0.00    0.00    0.00 :   ffffffff810511f9:       jmp     0xffffffff82153320
>     0.00    0.00    0.00 :   ffffffff810511fe:       callq   0xffffffff82153990
>     0.00    0.00    0.00 :   ffffffff81051203:       jmp     0xffffffff82153320
> root@x1:~#
>
> One example with samples for the first two events:
>
> root@x1:~# perf annotate --group --stdio2
> Samples: 2K of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 22892183, [percent: local period]
> cgroup_rstat_updated() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
> Percent                       0xffffffff8124e080 <cgroup_rstat_updated>:
>    0.00    0.24    0.00         endbr64
>                               → callq   __fentry__
>    0.00   99.76    0.00         pushq   %r15
>                                 movq    $0x251d4,%rcx
>                                 pushq   %r14
>                                 movq    %rdi,%r14
>                                 pushq   %r13
>                                 movslq  %esi,%r13
>                                 pushq   %r12
>                                 pushq   %rbp
>                                 pushq   %rbx
>                                 subq    $0x10,%rsp
>                                 cmpq    $0x2000,%r13
>                               ↓ jae     17f
>                           31:   movq    0x3d0(%r14),%rbx
>                                 movq    -0x7d3fb360(, %r13, 8),%r12
>                                 cmpq    $0x2000,%r13
>                               ↓ jae     19b
>   25.00    0.00    0.00   4d:   cmpq    $0,0x88(%r12, %rbx)
>                               ↓ je      6b
>                                 addq    $0x10,%rsp
>                                 popq    %rbx
>                                 popq    %rbp
>                                 popq    %r12
>   75.00    0.00    0.00         popq    %r13
>                                 popq    %r14
>                                 popq    %r15
>                               → jmp     __x86_return_thunk
> <SNIP>
>
> And then skipping "empty" events:
>
> root@x1:~# perf annotate --group --skip-empty --stdio2 cgroup_rstat_updated | head -35
> Samples: 4  of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 31851, [percent: local period]
> cgroup_rstat_updated() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
> Percent               0xffffffff8124e080 <cgroup_rstat_updated>:
>    0.00    0.24         endbr64
>                       → callq   __fentry__
>    0.00   99.76         pushq   %r15
>                         movq    $0x251d4,%rcx
>                         pushq   %r14
>                         movq    %rdi,%r14
>                         pushq   %r13
>                         movslq  %esi,%r13
>                         pushq   %r12
>                         pushq   %rbp
>                         pushq   %rbx
>                         subq    $0x10,%rsp
>                         cmpq    $0x2000,%r13
>                       ↓ jae     17f
>                   31:   movq    0x3d0(%r14),%rbx
>                         movq    -0x7d3fb360(, %r13, 8),%r12
>                         cmpq    $0x2000,%r13
>                       ↓ jae     19b
>   25.00    0.00   4d:   cmpq    $0,0x88(%r12, %rbx)
>                       ↓ je      6b
>                         addq    $0x10,%rsp
>                         popq    %rbx
>                         popq    %rbp
>                         popq    %r12
>   75.00    0.00         popq    %r13
>                         popq    %r14
>                         popq    %r15
>                       → jmp     __x86_return_thunk
>                   6b:   addq    %r12,%rcx
>                         movq    %rcx,%rdi
>                         movq    %rcx,(%rsp)
>                       → callq   *ffffffff82151500
> root@x1:~#
>
> So, I haven't done further analysis but I think this is a separate
> issue, right?

Yep, it's not related to --skip-empty.

>
> Thanks for the fix!
>
> Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Will send a fix with your tags.

Thanks,
Namhyung


* [PATCH] perf annotate: Fix --group behavior when leader has no samples
  2024-08-07  6:12             ` Namhyung Kim
@ 2024-08-07  6:15               ` Namhyung Kim
  2024-08-09 21:15                 ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 16+ messages in thread
From: Namhyung Kim @ 2024-08-07  6:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Ian Rogers, Kan Liang
  Cc: Jiri Olsa, Adrian Hunter, Peter Zijlstra, Ingo Molnar, LKML,
	linux-perf-users, Arnaldo Carvalho de Melo

When the --group option is used, it should display all events together.  But
the current logic only checks whether the first (leader) event has samples.
Let's check the member events as well.

It also failed to put the linked samples from member evsels into the
output RB-tree, so they could not be displayed in the output.
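
The change in the sample check can be sketched in isolation.  This is a
minimal model, not the perf code: struct toy_event and the three functions
are hypothetical names, and the group is a plain array whose first element
plays the role of the leader.  The counts are the ones from the path_put
example in this message (0, 2, 0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an evsel: only the per-event sample count. */
struct toy_event {
	const char *name;
	uint32_t nr_samples;
};

/* Old behavior: only the leader (index 0) decides whether to annotate. */
bool should_annotate_old(const struct toy_event *group, size_t nr)
{
	assert(group != NULL && nr > 0);
	return group[0].nr_samples != 0;
}

/* Sum the samples of the leader and every other member of the group. */
uint32_t group_nr_samples(const struct toy_event *group, size_t nr)
{
	uint32_t total = 0;

	for (size_t i = 0; i < nr; i++)
		total += group[i].nr_samples;
	return total;
}

/* New behavior: annotate if any event in the group has samples. */
bool should_annotate_new(const struct toy_event *group, size_t nr)
{
	return group_nr_samples(group, nr) != 0;
}
```

With the counts above, the old check skips the group because the leader is
empty, while the new check keeps it because the group total is 2.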

For example, consider the following event list.

  $ ./perf evlist
  cpu/mem-loads,ldlat=30/P
  cpu/mem-stores/P
  dummy:u

It has three events, but the 'path_put' function has samples only for the
mem-stores (second) event.

  $ sudo ./perf annotate --stdio -f path_put
   Percent |      Source code & Disassembly of kcore for cpu/mem-stores/P (2 samples, percent: local period)
  ----------------------------------------------------------------------------------------------------------
           : 0                0xffffffffae600020 <path_put>:
      0.00 :   ffffffffae600020:       endbr64
      0.00 :   ffffffffae600024:       nopl    (%rax, %rax)
     91.22 :   ffffffffae600029:       pushq   %rbx
      0.00 :   ffffffffae60002a:       movq    %rdi, %rbx
      0.00 :   ffffffffae60002d:       movq    8(%rdi), %rdi
      8.78 :   ffffffffae600031:       callq   0xffffffffae614aa0
      0.00 :   ffffffffae600036:       movq    (%rbx), %rdi
      0.00 :   ffffffffae600039:       popq    %rbx
      0.00 :   ffffffffae60003a:       jmp     0xffffffffae620670
      0.00 :   ffffffffae60003f:       nop

Therefore, it didn't show up when the --group option was used, since the
leader ("mem-loads") event has no samples.  But now it checks the member
events as well.

Before:
  $ sudo ./perf annotate --stdio -f --group path_put
  (no output)

After:
  $ sudo ./perf annotate --stdio -f --group path_put
   Percent                 |      Source code & Disassembly of kcore for cpu/mem-loads,ldlat=30/P, cpu/mem-stores/P, dummy:u (0 samples, percent: local period)
  -------------------------------------------------------------------------------------------------------------------------------------------------------------
                           : 0                0xffffffffae600020 <path_put>:
      0.00    0.00    0.00 :   ffffffffae600020:       endbr64
      0.00    0.00    0.00 :   ffffffffae600024:       nopl    (%rax, %rax)
      0.00   91.22    0.00 :   ffffffffae600029:       pushq   %rbx
      0.00    0.00    0.00 :   ffffffffae60002a:       movq    %rdi, %rbx
      0.00    0.00    0.00 :   ffffffffae60002d:       movq    8(%rdi), %rdi
      0.00    8.78    0.00 :   ffffffffae600031:       callq   0xffffffffae614aa0
      0.00    0.00    0.00 :   ffffffffae600036:       movq    (%rbx), %rdi
      0.00    0.00    0.00 :   ffffffffae600039:       popq    %rbx
      0.00    0.00    0.00 :   ffffffffae60003a:       jmp     0xffffffffae620670
      0.00    0.00    0.00 :   ffffffffae60003f:       nop
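
As a side note on the "percent: local period" label in the header above: as
far as I understand it, each column is the period attributed to a line
divided by the total period of that event within the symbol, which is why
2 samples need not split 50/50.  A minimal sketch under that assumption
(local_period_percent is a hypothetical helper, not a perf function):

```c
#include <stdint.h>

/*
 * Percentage of one line's period relative to the symbol's total period
 * for a single event.  Returns 0.0 when the event has no period in the
 * symbol, which matches the all-zero columns of an empty event.
 */
double local_period_percent(uint64_t line_period, uint64_t symbol_period)
{
	if (symbol_period == 0)
		return 0.0;
	return 100.0 * (double)line_period / (double)symbol_period;
}
```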

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-annotate.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index efcadb7620b8..1bfe41783a7c 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -632,13 +632,23 @@ static int __cmd_annotate(struct perf_annotate *ann)
 	evlist__for_each_entry(session->evlist, pos) {
 		struct hists *hists = evsel__hists(pos);
 		u32 nr_samples = hists->stats.nr_samples;
+		struct ui_progress prog;
+		struct evsel *evsel;
 
-		if (nr_samples == 0)
+		if (!symbol_conf.event_group || !evsel__is_group_leader(pos))
 			continue;
 
-		if (!symbol_conf.event_group || !evsel__is_group_leader(pos))
+		for_each_group_member(evsel, pos)
+			nr_samples += evsel__hists(evsel)->stats.nr_samples;
+
+		if (nr_samples == 0)
 			continue;
 
+		ui_progress__init(&prog, nr_samples,
+				  "Sorting group events for output...");
+		evsel__output_resort(pos, &prog);
+		ui_progress__finish();
+
 		hists__find_annotations(hists, pos, ann);
 	}
 
-- 
2.46.0.rc2.264.g509ed76dc8-goog



* Re: [PATCH] perf annotate: Fix --group behavior when leader has no samples
  2024-08-07  6:15               ` [PATCH] perf annotate: Fix --group behavior when leader has no samples Namhyung Kim
@ 2024-08-09 21:15                 ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 16+ messages in thread
From: Arnaldo Carvalho de Melo @ 2024-08-09 21:15 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Ian Rogers, Kan Liang, Jiri Olsa, Adrian Hunter, Peter Zijlstra,
	Ingo Molnar, LKML, linux-perf-users, Arnaldo Carvalho de Melo

On Tue, Aug 06, 2024 at 11:15:55PM -0700, Namhyung Kim wrote:
> When the --group option is used, it should display all events together.  But
> the current logic only checks whether the first (leader) event has samples.
> Let's check the member events as well.
> 
> It also failed to put the linked samples from member evsels into the
> output RB-tree, so they could not be displayed in the output.

Thanks, re-tested and applied.

- Arnaldo



end of thread, other threads:[~2024-08-09 21:15 UTC | newest]

Thread overview: 16+ messages
2024-08-03 21:13 [PATCHSET 0/5] perf annotate: Add --skip-empty option (v1) Namhyung Kim
2024-08-03 21:13 ` [PATCH 1/5] perf annotate: Use al->data_nr if possible Namhyung Kim
2024-08-03 21:13 ` [PATCH 2/5] perf annotate: Set notes->src->nr_events early Namhyung Kim
2024-08-03 21:13 ` [PATCH 3/5] perf annotate: Use annotation__pcnt_width() consistently Namhyung Kim
2024-08-03 21:13 ` [PATCH 4/5] perf annotate: Set al->data_nr using the notes->src->nr_events Namhyung Kim
2024-08-03 21:13 ` [PATCH 5/5] perf annotate: Add --skip-empty option Namhyung Kim
2024-08-05 19:22   ` Arnaldo Carvalho de Melo
2024-08-05 20:14     ` Namhyung Kim
2024-08-05 20:23       ` Arnaldo Carvalho de Melo
2024-08-05 20:50         ` Namhyung Kim
2024-08-06 13:12           ` Arnaldo Carvalho de Melo
2024-08-07  6:12             ` Namhyung Kim
2024-08-07  6:15               ` [PATCH] perf annotate: Fix --group behavior when leader has no samples Namhyung Kim
2024-08-09 21:15                 ` Arnaldo Carvalho de Melo
2024-08-05 19:26   ` [PATCH 5/5] perf annotate: Add --skip-empty option Arnaldo Carvalho de Melo
2024-08-05 19:26 ` [PATCHSET 0/5] perf annotate: Add --skip-empty option (v1) Arnaldo Carvalho de Melo
