All of lore.kernel.org
 help / color / mirror / Atom feed
From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Namhyung Kim <namhyung@kernel.org>
Cc: Ian Rogers <irogers@google.com>,
	Kan Liang <kan.liang@linux.intel.com>,
	Jiri Olsa <jolsa@kernel.org>,
	Adrian Hunter <adrian.hunter@intel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-perf-users@vger.kernel.org
Subject: Re: [PATCH 5/5] perf annotate: Add --skip-empty option
Date: Tue, 6 Aug 2024 10:12:30 -0300	[thread overview]
Message-ID: <ZrIhPuFee8R9ZvVi@x1> (raw)
In-Reply-To: <ZrE7EWyFJ3hThopM@google.com>

On Mon, Aug 05, 2024 at 01:50:25PM -0700, Namhyung Kim wrote:
> On Mon, Aug 05, 2024 at 05:23:51PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Mon, Aug 05, 2024 at 01:14:27PM -0700, Namhyung Kim wrote:
> > > On Mon, Aug 05, 2024 at 04:22:12PM -0300, Arnaldo Carvalho de Melo wrote:
> > > > On Sat, Aug 03, 2024 at 02:13:32PM -0700, Namhyung Kim wrote:
> > > > > Like in perf report, we want to hide empty events in the perf annotate
> > > > > output.  This is consistent when the option is set in perf report.
> > > > > 
> > > > > For example, the following command would use 3 events including dummy.
> > > > > 
> > > > >   $ perf mem record -a -- perf test -w noploop
> > > > > 
> > > > >   $ perf evlist
> > > > >   cpu/mem-loads,ldlat=30/P
> > > > >   cpu/mem-stores/P
> > > > >   dummy:u
> > > > > 
> > > > > Just using perf annotate with --group will show the all 3 events.
> > > > 
> > > > Seems unrelated, just before compiling with this patch:
> > > > 
> > > > root@x1:~# perf mem record -a -- perf test -w noploop
> > > > Memory events are enabled on a subset of CPUs: 4-11
> > > > [ perf record: Woken up 1 times to write data ]
> > > > [ perf record: Captured and wrote 10.506 MB perf.data (2775 samples) ]
> > > > root@x1:~#
> > > > 
> > > > root@x1:~# perf annotate --group --stdio2 sched_clock
> > > > root@x1:~# perf annotate --stdio2 sched_clock
> > > > Samples: 178  of event 'cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 565268, [percent: local period]
> > > > sched_clock() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
> > > > Percent      0xffffffff810511e0 <sched_clock>:
> > > >                endbr64        
> > > >    5.76        incl    pcpu_hot+0x8
> > > >    5.47      → callq   sched_clock_noinstr
> > > >   88.78        decl    pcpu_hot+0x8
> > > >              ↓ je      1e     
> > > >              → jmp     __x86_return_thunk
> > > >          1e: → callq   __SCT__preempt_schedule_notrace
> > > >              → jmp     __x86_return_thunk
> > > > root@x1:~# perf annotate --group --stdio2 sched_clock
> > > > root@x1:~# perf annotate --group --stdio sched_clock
> > > > root@x1:~# perf annotate --group sched_clock
> > > > root@x1:~#
> > > > 
> > > > root@x1:~# perf evlist
> > > > cpu_atom/mem-loads,ldlat=30/P
> > > > cpu_atom/mem-stores/P
> > > > dummy:u
> > > > root@x1:~#
> > > > 
> > > > root@x1:~# perf report --header-only | grep cmdline
> > > > # cmdline : /home/acme/bin/perf mem record -a -- perf test -w noploop 
> > > > root@x1:~#
> > > > 
> > > > I thought it would be some hybrid oddity but seems to be just --group
> > > > related, seems like it stops if the first event has no samples? Because
> > > > it works with another symbol:
> > > 
> > > Good catch.  Yeah I found it only checked the first event.  Something
> > > like below should fix the issue.
> > 
> > Nope, with the patch applied:
> > 
> > root@x1:~# perf annotate --group --stdio sched_clock
> > root@x1:~# perf annotate --stdio sched_clock
> >  Percent |      Source code & Disassembly of vmlinux for cpu_atom/mem-stores/P (147 samples, percent: local period)
> > -------------------------------------------------------------------------------------------------------------------
> >          : 0                0xffffffff810511e0 <sched_clock>:
> >     0.00 :   ffffffff810511e0:       endbr64
> >     5.11 :   ffffffff810511e4:       incl    %gs:0x7efe2d5d(%rip)       # 33f48 <pcpu_hot+0x8>
> >     0.13 :   ffffffff810511eb:       callq   0xffffffff821350d0
> >    94.76 :   ffffffff810511f0:       decl    %gs:0x7efe2d51(%rip)       # 33f48 <pcpu_hot+0x8>
> >     0.00 :   ffffffff810511f7:       je      0xffffffff810511fe
> >     0.00 :   ffffffff810511f9:       jmp     0xffffffff82153320
> >     0.00 :   ffffffff810511fe:       callq   0xffffffff82153990
> >     0.00 :   ffffffff81051203:       jmp     0xffffffff82153320
> > root@x1:~# perf annotate --group --stdio sched_clock
> > root@x1:~# perf annotate --group --stdio2 sched_clock
> > root@x1:~# perf annotate --group sched_clock
> > root@x1:~#
> 
> Oh ok, it was not enough.  It should call evsel__output_resort() after
> hists__match() and hists__link().  Use this instead.

Ok, this works:

Before this patch:

root@x1:~# perf annotate --stdio sched_clock
 Percent |      Source code & Disassembly of vmlinux for cpu_atom/mem-stores/P (147 samples, percent: local period)
-------------------------------------------------------------------------------------------------------------------
         : 0                0xffffffff810511e0 <sched_clock>:
    0.00 :   ffffffff810511e0:       endbr64
    5.11 :   ffffffff810511e4:       incl    %gs:0x7efe2d5d(%rip)       # 33f48 <pcpu_hot+0x8>
    0.13 :   ffffffff810511eb:       callq   0xffffffff821350d0
   94.76 :   ffffffff810511f0:       decl    %gs:0x7efe2d51(%rip)       # 33f48 <pcpu_hot+0x8>
    0.00 :   ffffffff810511f7:       je      0xffffffff810511fe
    0.00 :   ffffffff810511f9:       jmp     0xffffffff82153320
    0.00 :   ffffffff810511fe:       callq   0xffffffff82153990
    0.00 :   ffffffff81051203:       jmp     0xffffffff82153320
root@x1:~# perf annotate --group --stdio sched_clock
root@x1:~#

After:

root@x1:~# perf annotate --group --stdio sched_clock
 Percent                 |      Source code & Disassembly of vmlinux for cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u (0 samples, percent: local period)
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                         : 0                0xffffffff810511e0 <sched_clock>:
    0.00    0.00    0.00 :   ffffffff810511e0:       endbr64
    0.00    5.11    0.00 :   ffffffff810511e4:       incl    %gs:0x7efe2d5d(%rip)       # 33f48 <pcpu_hot+0x8>
    0.00    0.13    0.00 :   ffffffff810511eb:       callq   0xffffffff821350d0
    0.00   94.76    0.00 :   ffffffff810511f0:       decl    %gs:0x7efe2d51(%rip)       # 33f48 <pcpu_hot+0x8>
    0.00    0.00    0.00 :   ffffffff810511f7:       je      0xffffffff810511fe
    0.00    0.00    0.00 :   ffffffff810511f9:       jmp     0xffffffff82153320
    0.00    0.00    0.00 :   ffffffff810511fe:       callq   0xffffffff82153990
    0.00    0.00    0.00 :   ffffffff81051203:       jmp     0xffffffff82153320
root@x1:~#

One example with samples for the first two events:

root@x1:~# perf annotate --group --stdio2
Samples: 2K of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P, dummy:u', 4000 Hz, Event count (approx.): 22892183, [percent: local period]
cgroup_rstat_updated() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
Percent                       0xffffffff8124e080 <cgroup_rstat_updated>:
   0.00    0.24    0.00         endbr64                         
                              → callq   __fentry__              
   0.00   99.76    0.00         pushq   %r15                    
                                movq    $0x251d4,%rcx           
                                pushq   %r14                    
                                movq    %rdi,%r14               
                                pushq   %r13                    
                                movslq  %esi,%r13               
                                pushq   %r12                    
                                pushq   %rbp                    
                                pushq   %rbx                    
                                subq    $0x10,%rsp              
                                cmpq    $0x2000,%r13            
                              ↓ jae     17f                     
                          31:   movq    0x3d0(%r14),%rbx        
                                movq    -0x7d3fb360(, %r13, 8),%r12
                                cmpq    $0x2000,%r13            
                              ↓ jae     19b                     
  25.00    0.00    0.00   4d:   cmpq    $0,0x88(%r12, %rbx)     
                              ↓ je      6b                      
                                addq    $0x10,%rsp              
                                popq    %rbx                    
                                popq    %rbp                    
                                popq    %r12                    
  75.00    0.00    0.00         popq    %r13                    
                                popq    %r14                    
                                popq    %r15                    
                              → jmp     __x86_return_thunk      
<SNIP>

And then skipping "empty" events:

root@x1:~# perf annotate --group --skip-empty --stdio2 cgroup_rstat_updated | head -35
Samples: 4  of events 'cpu_atom/mem-loads,ldlat=30/P, cpu_atom/mem-stores/P', 4000 Hz, Event count (approx.): 31851, [percent: local period]
cgroup_rstat_updated() /usr/lib/debug/lib/modules/6.8.11-200.fc39.x86_64/vmlinux
Percent               0xffffffff8124e080 <cgroup_rstat_updated>:
   0.00    0.24         endbr64                 
                      → callq   __fentry__      
   0.00   99.76         pushq   %r15            
                        movq    $0x251d4,%rcx   
                        pushq   %r14            
                        movq    %rdi,%r14       
                        pushq   %r13            
                        movslq  %esi,%r13       
                        pushq   %r12            
                        pushq   %rbp            
                        pushq   %rbx            
                        subq    $0x10,%rsp      
                        cmpq    $0x2000,%r13    
                      ↓ jae     17f             
                  31:   movq    0x3d0(%r14),%rbx
                        movq    -0x7d3fb360(, %r13, 8),%r12
                        cmpq    $0x2000,%r13    
                      ↓ jae     19b             
  25.00    0.00   4d:   cmpq    $0,0x88(%r12, %rbx)
                      ↓ je      6b              
                        addq    $0x10,%rsp      
                        popq    %rbx            
                        popq    %rbp            
                        popq    %r12            
  75.00    0.00         popq    %r13            
                        popq    %r14            
                        popq    %r15            
                      → jmp     __x86_return_thunk
                  6b:   addq    %r12,%rcx       
                        movq    %rcx,%rdi       
                        movq    %rcx,(%rsp)     
                      → callq   *ffffffff82151500
root@x1:~#

So, I haven't done further analysis but I think this is a separate
issue, right?

Thanks for the fix!

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>

- Arnaldo

  reply	other threads:[~2024-08-06 13:12 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-03 21:13 [PATCHSET 0/5] perf annotate: Add --skip-empty option (v1) Namhyung Kim
2024-08-03 21:13 ` [PATCH 1/5] perf annotate: Use al->data_nr if possible Namhyung Kim
2024-08-03 21:13 ` [PATCH 2/5] perf annotate: Set notes->src->nr_events early Namhyung Kim
2024-08-03 21:13 ` [PATCH 3/5] perf annotate: Use annotation__pcnt_width() consistently Namhyung Kim
2024-08-03 21:13 ` [PATCH 4/5] perf annotate: Set al->data_nr using the notes->src->nr_events Namhyung Kim
2024-08-03 21:13 ` [PATCH 5/5] perf annotate: Add --skip-empty option Namhyung Kim
2024-08-05 19:22   ` Arnaldo Carvalho de Melo
2024-08-05 20:14     ` Namhyung Kim
2024-08-05 20:23       ` Arnaldo Carvalho de Melo
2024-08-05 20:50         ` Namhyung Kim
2024-08-06 13:12           ` Arnaldo Carvalho de Melo [this message]
2024-08-07  6:12             ` Namhyung Kim
2024-08-07  6:15               ` [PATCH] perf annotate: Fix --group behavior when leader has no samples Namhyung Kim
2024-08-09 21:15                 ` Arnaldo Carvalho de Melo
2024-08-05 19:26   ` [PATCH 5/5] perf annotate: Add --skip-empty option Arnaldo Carvalho de Melo
2024-08-05 19:26 ` [PATCHSET 0/5] perf annotate: Add --skip-empty option (v1) Arnaldo Carvalho de Melo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZrIhPuFee8R9ZvVi@x1 \
    --to=acme@kernel.org \
    --cc=adrian.hunter@intel.com \
    --cc=irogers@google.com \
    --cc=jolsa@kernel.org \
    --cc=kan.liang@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.