From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>,
Kan Liang <kan.liang@linux.intel.com>,
Jiri Olsa <jolsa@kernel.org>,
Adrian Hunter <adrian.hunter@intel.com>,
Peter Zijlstra <peterz@infradead.org>,
Ingo Molnar <mingo@kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
linux-perf-users@vger.kernel.org
Subject: Re: [PATCH 5/5] perf annotate: Add --skip-empty option
Date: Tue, 6 Aug 2024 10:12:30 -0300 [thread overview]
Message-ID: <ZrIhPuFee8R9ZvVi@x1> (raw)
In-Reply-To: <ZrE7EWyFJ3hThopM@google.com>
On Mon, Aug 05, 2024 at 01:50:25PM -0700, Namhyung Kim wrote:
> On Mon, Aug 05, 2024 at 05:23:51PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Mon, Aug 05, 2024 at 01:14:27PM -0700, Namhyung Kim wrote:
> > > On Mon, Aug 05, 2024 at 04:22:12PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > On Sat, Aug 03, 2024 at 02:13:32PM -0700, Namhyung Kim wrote:
> > > > > Like in perf report, we want to hide empty events in the perf annotate
> > > > > output. This is consistent when the option is set in perf report.
> > > > >
> > > > > For example, the following command would use 3 events including dummy.
> > > > >
> > > > > $ perf mem record -a -- perf test -w noploop
> > > > >
> > > > > $ perf evlist
> > > > > cpu/mem-loads,ldlat=30/P
> > > > > cpu/mem-stores/P
> > > > > dummy:u
> > > > >
> > > > > Just using perf annotate with --group will show the all 3 events.
> > > >
> > > > Seems unrelated, just before compiling with this patch:
> > > >
> > > > root@x1:~# perf mem record -a -- perf test -w noploop
> > > > Memory events are enabled on a subset of CPUs: 4-11
> > > > [ perf record: Woken up 1 times to write data ]
> > > > [ perf record: Captured and wrote 10.506 MB perf.data (2775 samples) ]
> > > > root@x1:~#
> > > >
> > > > root@x1:~# perf annotate --group --stdio2 sched_clock
> > > > root@x1:~# perf annotate --stdio2 sched_clock
> > > > Samples: 178 of event 'cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 565268, [percent: local period]
> > > > sched_clock() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
> > > > Percent 0xffffffff810511e0 <sched_clock>:
> > > > endbr64
> > > > 5.76 incl pcpu_hot+0x8
> > > > 5.47 → callq sched_clock_noinstr
> > > > 88.78 decl pcpu_hot+0x8
> > > > ↓ je 1e
> > > > → jmp __x86_return_thunk
> > > > 1e: → callq __SCT__preempt_schedule_notrace
> > > > → jmp __x86_return_thunk
> > > > root@x1:~# perf annotate --group --stdio2 sched_clock
> > > > root@x1:~# perf annotate --group --stdio sched_clock
> > > > root@x1:~# perf annotate --group sched_clock
> > > > root@x1:~#
> > > >
> > > > root@x1:~# perf evlist
> > > > cpu_atom/mem-loads,ldlat=30/P
> > > > cpu_atom/mem-stores/P
> > > > dummy:u
> > > > root@x1:~#
> > > >
> > > > root@x1:~# perf report --header-only | grep cmdline
> > > > # cmdline : /home/acme/bin/perf mem record -a -- perf test -w noploop
> > > > root@x1:~#
> > > >
> > > > I thought it would be some hybrid oddity but seems to be just --group
> > > > related, seems like it stops if the first event has no samples? Because
> > > > it works with another symbol:
> > >
> > > Good catch. Yeah I found it only checked the first event. Something
> > > like below should fix the issue.
> >
> > Nope, with the patch applied:
> >
> > root@x1:~# perf annotate --group --stdio sched_clock
> > root@x1:~# perf annotate --stdio sched_clock
> > Percent | Source code & Disassembly of vmlinux for cpu_atom/mem-stores/P (147 samples, percent: local period)
> > -------------------------------------------------------------------------------------------------------------------
> > : 0 0xffffffff810511e0 <sched_clock>:
> > 0.00 : ffffffff810511e0: endbr64
> > 5.11 : ffffffff810511e4: incl %gs:0x7efe2d5d(%rip) # 33f48 <pcpu_hot+0x8>
> > 0.13 : ffffffff810511eb: callq 0xffffffff821350d0
> > 94.76 : ffffffff810511f0: decl %gs:0x7efe2d51(%rip) # 33f48 <pcpu_hot+0x8>
> > 0.00 : ffffffff810511f7: je 0xffffffff810511fe
> > 0.00 : ffffffff810511f9: jmp 0xffffffff82153320
> > 0.00 : ffffffff810511fe: callq 0xffffffff82153990
> > 0.00 : ffffffff81051203: jmp 0xffffffff82153320
> > root@x1:~# perf annotate --group --stdio sched_clock
> > root@x1:~# perf annotate --group --stdio2 sched_clock
> > root@x1:~# perf annotate --group sched_clock
> > root@x1:~#
>
> Oh ok, it was not enough. It should call evsel__output_resort() after
> hists__match() and hists__link(). Use this instead.
Ok, this works:
Before this patch:
root@x1:~# perf annotate --stdio sched_clock
Percent | Source code & Disassembly of vmlinux for cpu_atom/mem-stores/P (147 samples, percent: local period)
-------------------------------------------------------------------------------------------------------------------
: 0 0xffffffff810511e0 <sched_clock>:
0.00 : ffffffff810511e0: endbr64
5.11 : ffffffff810511e4: incl %gs:0x7efe2d5d(%rip) # 33f48 <pcpu_hot+0x8>
0.13 : ffffffff810511eb: callq 0xffffffff821350d0
94.76 : ffffffff810511f0: decl %gs:0x7efe2d51(%rip) # 33f48 <pcpu_hot+0x8>
0.00 : ffffffff810511f7: je 0xffffffff810511fe
0.00 : ffffffff810511f9: jmp 0xffffffff82153320
0.00 : ffffffff810511fe: callq 0xffffffff82153990
0.00 : ffffffff81051203: jmp 0xffffffff82153320
root@x1:~# perf annotate --group --stdio sched_clock
root@x1:~#
After:
root@x1:~# perf annotate --group --stdio sched_clock
Percent | Source code & Disassembly of vmlinux for cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u (0 samples, percent: local period)
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
: 0 0xffffffff810511e0 <sched_clock>:
0.00 0.00 0.00 : ffffffff810511e0: endbr64
0.00 5.11 0.00 : ffffffff810511e4: incl %gs:0x7efe2d5d(%rip) # 33f48 <pcpu_hot+0x8>
0.00 0.13 0.00 : ffffffff810511eb: callq 0xffffffff821350d0
0.00 94.76 0.00 : ffffffff810511f0: decl %gs:0x7efe2d51(%rip) # 33f48 <pcpu_hot+0x8>
0.00 0.00 0.00 : ffffffff810511f7: je 0xffffffff810511fe
0.00 0.00 0.00 : ffffffff810511f9: jmp 0xffffffff82153320
0.00 0.00 0.00 : ffffffff810511fe: callq 0xffffffff82153990
0.00 0.00 0.00 : ffffffff81051203: jmp 0xffffffff82153320
root@x1:~#
One example with samples for the first two events:
root@x1:~# perf annotate --group --stdio2
Samples: 2K of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 22892183, [percent: local period]
cgroup_rstat_updated() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
Percent 0xffffffff8124e080 <cgroup_rstat_updated>:
0.00 0.24 0.00 endbr64
→ callq __fentry__
0.00 99.76 0.00 pushq %r15
movq $0x251d4,%rcx
pushq %r14
movq %rdi,%r14
pushq %r13
movslq %esi,%r13
pushq %r12
pushq %rbp
pushq %rbx
subq $0x10,%rsp
cmpq $0x2000,%r13
↓ jae 17f
31: movq 0x3d0(%r14),%rbx
movq -0x7d3fb360(, %r13, 8),%r12
cmpq $0x2000,%r13
↓ jae 19b
25.00 0.00 0.00 4d: cmpq $0,0x88(%r12, %rbx)
↓ je 6b
addq $0x10,%rsp
popq %rbx
popq %rbp
popq %r12
75.00 0.00 0.00 popq %r13
popq %r14
popq %r15
→ jmp __x86_return_thunk
<SNIP>
And then skipping "empty" events:
root@x1:~# perf annotate --group --skip-empty --stdio2 cgroup_rstat_updated | head -35
Samples: 4 of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 31851, [percent: local period]
cgroup_rstat_updated() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
Percent 0xffffffff8124e080 <cgroup_rstat_updated>:
0.00 0.24 endbr64
→ callq __fentry__
0.00 99.76 pushq %r15
movq $0x251d4,%rcx
pushq %r14
movq %rdi,%r14
pushq %r13
movslq %esi,%r13
pushq %r12
pushq %rbp
pushq %rbx
subq $0x10,%rsp
cmpq $0x2000,%r13
↓ jae 17f
31: movq 0x3d0(%r14),%rbx
movq -0x7d3fb360(, %r13, 8),%r12
cmpq $0x2000,%r13
↓ jae 19b
25.00 0.00 4d: cmpq $0,0x88(%r12, %rbx)
↓ je 6b
addq $0x10,%rsp
popq %rbx
popq %rbp
popq %r12
75.00 0.00 popq %r13
popq %r14
popq %r15
→ jmp __x86_return_thunk
6b: addq %r12,%rcx
movq %rcx,%rdi
movq %rcx,(%rsp)
→ callq *ffffffff82151500
root@x1:~#
So, I haven't done further analysis but I think this is a separate
issue, right?
Thanks for the fix!
Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
- Arnaldo
next prev parent reply other threads:[~2024-08-06 13:12 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-03 21:13 [PATCHSET 0/5] perf annotate: Add --skip-empty option (v1) Namhyung Kim
2024-08-03 21:13 ` [PATCH 1/5] perf annotate: Use al->data_nr if possible Namhyung Kim
2024-08-03 21:13 ` [PATCH 2/5] perf annotate: Set notes->src->nr_events early Namhyung Kim
2024-08-03 21:13 ` [PATCH 3/5] perf annotate: Use annotation__pcnt_width() consistently Namhyung Kim
2024-08-03 21:13 ` [PATCH 4/5] perf annotate: Set al->data_nr using the notes->src->nr_events Namhyung Kim
2024-08-03 21:13 ` [PATCH 5/5] perf annotate: Add --skip-empty option Namhyung Kim
2024-08-05 19:22 ` Arnaldo Carvalho de Melo
2024-08-05 20:14 ` Namhyung Kim
2024-08-05 20:23 ` Arnaldo Carvalho de Melo
2024-08-05 20:50 ` Namhyung Kim
2024-08-06 13:12 ` Arnaldo Carvalho de Melo [this message]
2024-08-07 6:12 ` Namhyung Kim
2024-08-07 6:15 ` [PATCH] perf annotate: Fix --group behavior when leader has no samples Namhyung Kim
2024-08-09 21:15 ` Arnaldo Carvalho de Melo
2024-08-05 19:26 ` [PATCH 5/5] perf annotate: Add --skip-empty option Arnaldo Carvalho de Melo
2024-08-05 19:26 ` [PATCHSET 0/5] perf annotate: Add --skip-empty option (v1) Arnaldo Carvalho de Melo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=ZrIhPuFee8R9ZvVi@x1 \
--to=acme@kernel.org \
--cc=adrian.hunter@intel.com \
--cc=irogers@google.com \
--cc=jolsa@kernel.org \
--cc=kan.liang@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-perf-users@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=namhyung@kernel.org \
--cc=peterz@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.