linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>,
	Kan Liang <kan.liang@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-perf-users@vger.kernel.org
Subject: Re: [PATCH 5/5] perf annotate: Add --skip-empty option
Date: Tue, 6 Aug 2024 10:12:30 -0300	[thread overview]
Message-ID: <ZrIhPuFee8R9ZvVi@x1> (raw)
In-Reply-To: <ZrE7EWyFJ3hThopM@google.com>

On Mon, Aug 05, 2024 at 01:50:25PM -0700, Namhyung Kim wrote:
> On Mon, Aug 05, 2024 at 05:23:51PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Mon, Aug 05, 2024 at 01:14:27PM -0700, Namhyung Kim wrote:
> > > On Mon, Aug 05, 2024 at 04:22:12PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > On Sat, Aug 03, 2024 at 02:13:32PM -0700, Namhyung Kim wrote:
> > > > > Like in perf report, we want to hide empty events in the perf annotate
> > > > > output.  This is consistent when the option is set in perf report.
> > > > > 
> > > > > For example, the following command would use 3 events including dummy.
> > > > > 
> > > > >   $ perf mem record -a -- perf test -w noploop
> > > > > 
> > > > >   $ perf evlist
> > > > >   cpu/mem-loads,ldlat=30/P
> > > > >   cpu/mem-stores/P
> > > > >   dummy:u
> > > > > 
> > > > > Just using perf annotate with --group will show the all 3 events.
> > > > 
> > > > Seems unrelated, just before compiling with this patch:
> > > > 
> > > > root@x1:~# perf mem record -a -- perf test -w noploop
> > > > Memory events are enabled on a subset of CPUs: 4-11
> > > > [ perf record: Woken up 1 times to write data ]
> > > > [ perf record: Captured and wrote 10.506 MB perf.data (2775 samples) ]
> > > > root@x1:~#
> > > > 
> > > > root@x1:~# perf annotate --group --stdio2 sched_clock
> > > > root@x1:~# perf annotate --stdio2 sched_clock
> > > > Samples: 178  of event 'cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 565268, [percent: local period]
> > > > sched_clock() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
> > > > Percent      0xffffffff810511e0 <sched_clock>:
> > > >                endbr64        
> > > >    5.76        incl    pcpu_hot+0x8
> > > >    5.47      → callq   sched_clock_noinstr
> > > >   88.78        decl    pcpu_hot+0x8
> > > >              ↓ je      1e     
> > > >              → jmp     __x86_return_thunk
> > > >          1e: → callq   __SCT__preempt_schedule_notrace
> > > >              → jmp     __x86_return_thunk
> > > > root@x1:~# perf annotate --group --stdio2 sched_clock
> > > > root@x1:~# perf annotate --group --stdio sched_clock
> > > > root@x1:~# perf annotate --group sched_clock
> > > > root@x1:~#
> > > > 
> > > > root@x1:~# perf evlist
> > > > cpu_atom/mem-loads,ldlat=30/P
> > > > cpu_atom/mem-stores/P
> > > > dummy:u
> > > > root@x1:~#
> > > > 
> > > > root@x1:~# perf report --header-only | grep cmdline
> > > > # cmdline : /home/acme/bin/perf mem record -a -- perf test -w noploop 
> > > > root@x1:~#
> > > > 
> > > > I thought it would be some hybrid oddity but seems to be just --group
> > > > related, seems like it stops if the first event has no samples? Because
> > > > it works with another symbol:
> > > 
> > > Good catch.  Yeah I found it only checked the first event.  Something
> > > like below should fix the issue.
> > 
> > Nope, with the patch applied:
> > 
> > root@x1:~# perf annotate --group --stdio sched_clock
> > root@x1:~# perf annotate --stdio sched_clock
> >  Percent |      Source code & Disassembly of vmlinux for cpu_atom/mem-stores/P (147 samples, percent: local period)
> > -------------------------------------------------------------------------------------------------------------------
> >          : 0                0xffffffff810511e0 <sched_clock>:
> >     0.00 :   ffffffff810511e0:       endbr64
> >     5.11 :   ffffffff810511e4:       incl    %gs:0x7efe2d5d(%rip)       # 33f48 <pcpu_hot+0x8>
> >     0.13 :   ffffffff810511eb:       callq   0xffffffff821350d0
> >    94.76 :   ffffffff810511f0:       decl    %gs:0x7efe2d51(%rip)       # 33f48 <pcpu_hot+0x8>
> >     0.00 :   ffffffff810511f7:       je      0xffffffff810511fe
> >     0.00 :   ffffffff810511f9:       jmp     0xffffffff82153320
> >     0.00 :   ffffffff810511fe:       callq   0xffffffff82153990
> >     0.00 :   ffffffff81051203:       jmp     0xffffffff82153320
> > root@x1:~# perf annotate --group --stdio sched_clock
> > root@x1:~# perf annotate --group --stdio2 sched_clock
> > root@x1:~# perf annotate --group sched_clock
> > root@x1:~#
> 
> Oh ok, it was not enough.  It should call evsel__output_resort() after
> hists__match() and hists__link().  Use this instead.

Ok, this works:

Before this patch:

root@x1:~# perf annotate --stdio sched_clock
 Percent |      Source code & Disassembly of vmlinux for cpu_atom/mem-stores/P (147 samples, percent: local period)
-------------------------------------------------------------------------------------------------------------------
         : 0                0xffffffff810511e0 <sched_clock>:
    0.00 :   ffffffff810511e0:       endbr64
    5.11 :   ffffffff810511e4:       incl    %gs:0x7efe2d5d(%rip)       # 33f48 <pcpu_hot+0x8>
    0.13 :   ffffffff810511eb:       callq   0xffffffff821350d0
   94.76 :   ffffffff810511f0:       decl    %gs:0x7efe2d51(%rip)       # 33f48 <pcpu_hot+0x8>
    0.00 :   ffffffff810511f7:       je      0xffffffff810511fe
    0.00 :   ffffffff810511f9:       jmp     0xffffffff82153320
    0.00 :   ffffffff810511fe:       callq   0xffffffff82153990
    0.00 :   ffffffff81051203:       jmp     0xffffffff82153320
root@x1:~# perf annotate --group --stdio sched_clock
root@x1:~#

After:

root@x1:~# perf annotate --group --stdio sched_clock
 Percent                 |      Source code & Disassembly of vmlinux for cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u (0 samples, percent: local period)
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                         : 0                0xffffffff810511e0 <sched_clock>:
    0.00    0.00    0.00 :   ffffffff810511e0:       endbr64
    0.00    5.11    0.00 :   ffffffff810511e4:       incl    %gs:0x7efe2d5d(%rip)       # 33f48 <pcpu_hot+0x8>
    0.00    0.13    0.00 :   ffffffff810511eb:       callq   0xffffffff821350d0
    0.00   94.76    0.00 :   ffffffff810511f0:       decl    %gs:0x7efe2d51(%rip)       # 33f48 <pcpu_hot+0x8>
    0.00    0.00    0.00 :   ffffffff810511f7:       je      0xffffffff810511fe
    0.00    0.00    0.00 :   ffffffff810511f9:       jmp     0xffffffff82153320
    0.00    0.00    0.00 :   ffffffff810511fe:       callq   0xffffffff82153990
    0.00    0.00    0.00 :   ffffffff81051203:       jmp     0xffffffff82153320
root@x1:~#

One example with samples for the first two events:

root@x1:~# perf annotate --group --stdio2
Samples: 2K of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 22892183, [percent: local period]
cgroup_rstat_updated() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
Percent                       0xffffffff8124e080 <cgroup_rstat_updated>:
   0.00    0.24    0.00         endbr64                         
                              → callq   __fentry__              
   0.00   99.76    0.00         pushq   %r15                    
                                movq    $0x251d4,%rcx           
                                pushq   %r14                    
                                movq    %rdi,%r14               
                                pushq   %r13                    
                                movslq  %esi,%r13               
                                pushq   %r12                    
                                pushq   %rbp                    
                                pushq   %rbx                    
                                subq    $0x10,%rsp              
                                cmpq    $0x2000,%r13            
                              ↓ jae     17f                     
                          31:   movq    0x3d0(%r14),%rbx        
                                movq    -0x7d3fb360(, %r13, 8),%r12
                                cmpq    $0x2000,%r13            
                              ↓ jae     19b                     
  25.00    0.00    0.00   4d:   cmpq    $0,0x88(%r12, %rbx)     
                              ↓ je      6b                      
                                addq    $0x10,%rsp              
                                popq    %rbx                    
                                popq    %rbp                    
                                popq    %r12                    
  75.00    0.00    0.00         popq    %r13                    
                                popq    %r14                    
                                popq    %r15                    
                              → jmp     __x86_return_thunk      
<SNIP>

And then skipping "empty" events:

root@x1:~# perf annotate --group --skip-empty --stdio2 cgroup_rstat_updated | head -35
Samples: 4  of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 31851, [percent: local period]
cgroup_rstat_updated() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
Percent               0xffffffff8124e080 <cgroup_rstat_updated>:
   0.00    0.24         endbr64                 
                      → callq   __fentry__      
   0.00   99.76         pushq   %r15            
                        movq    $0x251d4,%rcx   
                        pushq   %r14            
                        movq    %rdi,%r14       
                        pushq   %r13            
                        movslq  %esi,%r13       
                        pushq   %r12            
                        pushq   %rbp            
                        pushq   %rbx            
                        subq    $0x10,%rsp      
                        cmpq    $0x2000,%r13    
                      ↓ jae     17f             
                  31:   movq    0x3d0(%r14),%rbx
                        movq    -0x7d3fb360(, %r13, 8),%r12
                        cmpq    $0x2000,%r13    
                      ↓ jae     19b             
  25.00    0.00   4d:   cmpq    $0,0x88(%r12, %rbx)
                      ↓ je      6b              
                        addq    $0x10,%rsp      
                        popq    %rbx            
                        popq    %rbp            
                        popq    %r12            
  75.00    0.00         popq    %r13            
                        popq    %r14            
                        popq    %r15            
                      → jmp     __x86_return_thunk
                  6b:   addq    %r12,%rcx       
                        movq    %rcx,%rdi       
                        movq    %rcx,(%rsp)     
                      → callq   *ffffffff82151500
root@x1:~#

So, I haven't done further analysis but I think this is a separate
issue, right?

Thanks for the fix!

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>

- Arnaldo

  reply	other threads:[~2024-08-06 13:12 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-03 21:13 [PATCHSET 0/5] perf annotate: Add --skip-empty option (v1) Namhyung Kim
2024-08-03 21:13 ` [PATCH 1/5] perf annotate: Use al->data_nr if possible Namhyung Kim
2024-08-03 21:13 ` [PATCH 2/5] perf annotate: Set notes->src->nr_events early Namhyung Kim
2024-08-03 21:13 ` [PATCH 3/5] perf annotate: Use annotation__pcnt_width() consistently Namhyung Kim
2024-08-03 21:13 ` [PATCH 4/5] perf annotate: Set al->data_nr using the notes->src->nr_events Namhyung Kim
2024-08-03 21:13 ` [PATCH 5/5] perf annotate: Add --skip-empty option Namhyung Kim
2024-08-05 19:22   ` Arnaldo Carvalho de Melo
2024-08-05 20:14     ` Namhyung Kim
2024-08-05 20:23       ` Arnaldo Carvalho de Melo
2024-08-05 20:50         ` Namhyung Kim
2024-08-06 13:12           ` Arnaldo Carvalho de Melo [this message]
2024-08-07  6:12             ` Namhyung Kim
2024-08-07  6:15               ` [PATCH] perf annotate: Fix --group behavior when leader has no samples Namhyung Kim
2024-08-09 21:15                 ` Arnaldo Carvalho de Melo
2024-08-05 19:26   ` [PATCH 5/5] perf annotate: Add --skip-empty option Arnaldo Carvalho de Melo
2024-08-05 19:26 ` [PATCHSET 0/5] perf annotate: Add --skip-empty option (v1) Arnaldo Carvalho de Melo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZrIhPuFee8R9ZvVi@x1 \
    --to=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=irogers@google.com \
    --cc=jolsa@kernel.org \
    --cc=kan.liang@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).