Linux Perf Users
From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>,
	Namhyung Kim <namhyung@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Joe Mario <jmario@redhat.com>,
	Stephane Eranian <eranian@google.com>,
	Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>,
	Kan Liang <kan.liang@linux.intel.com>,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	Santosh Shukla <santosh.shukla@amd.com>,
	Ananth Narayan <ananth.narayan@amd.com>,
	Sandipan Das <sandipan.das@amd.com>
Subject: Re: [PATCH v4 0/4] perf/amd/ibs: Add Zen5 support (tools changes)
Date: Tue, 29 Apr 2025 23:00:30 -0300
Message-ID: <aBGEPnB5B4NTaOg9@x1>
In-Reply-To: <20250429035938.1301-1-ravi.bangoria@amd.com>

On Tue, Apr 29, 2025 at 03:59:34AM +0000, Ravi Bangoria wrote:
> IBS on Zen5:
> - Introduced Load Latency filtering capability.
> - Shows DTLB and page size information differently from prior generations.
> 
> Kernel changes for these enhancements are already upstream. So, resending
> tools changes separately.
> 
> Patches are prepared on perf-tools-next/perf-tools-next (85447f68a1e3).
> 
> v3: https://lore.kernel.org/r/20250205060547.1337-1-ravi.bangoria@amd.com
> v3->v4:
> - Remove kernel changes.
> - Improve IBS sample period unit test

Preliminary tests with what is in tmp.perf-tools-next:

root@number:~# perf mem record find / > /dev/null
[ perf record: Woken up 5 times to write data ]
[ perf record: Captured and wrote 1.992 MB perf.data (31824 samples) ]
root@number:~# perf mem report -s mem --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 31K of event 'ibs_op//'
# Total weight : 66561
# Sort order   : mem
#
# Overhead       Samples  Memory access                          
# ........  ............  .......................................
#
    36.51%           456  L2 hit                                 
    30.26%         20141  N/A                                    
    16.75%         11149  L1 hit                                 
    10.08%            18  RAM hit                                
     6.39%            52  L3 hit                                 
     0.01%             8  LFB/MAB hit                            
#
# (Tip: To collect Processor Trace with samples use perf record -e '{intel_pt//,cycles}' ; perf script --call-trace or --insn-trace --xed -F +ipc (remove --xed if no xed))
#
root@number:~#
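A quick sanity check of the table above. It assumes that "Overhead" is each bucket's share of the "Total weight" (66561), not its share of the sample count. The rows are consistent with that: L2 hit has only 456 samples but 36.51%. The sketch recovers the approximate summed weight per bucket and the average per sample. Only the numbers come from the output above. ROWS and bucket_weight() are hypothetical names used for illustration, and because the percentages are rounded, the results are approximate.

```python
# Rows copied from the `perf mem report -s mem --stdio` output above:
# (memory access level, overhead percent, sample count)
TOTAL_WEIGHT = 66561
ROWS = [
    ("L2 hit", 36.51, 456),
    ("N/A", 30.26, 20141),
    ("L1 hit", 16.75, 11149),
    ("RAM hit", 10.08, 18),
    ("L3 hit", 6.39, 52),
    ("LFB/MAB hit", 0.01, 8),
]


def bucket_weight(overhead_pct, total=TOTAL_WEIGHT):
    """Approximate summed weight of a bucket.

    Assumption: Overhead = bucket weight / total weight.
    """
    return overhead_pct / 100.0 * total


for level, pct, samples in ROWS:
    weight = bucket_weight(pct)
    # Average weight per sample in this bucket (approximate, from rounded %).
    print(f"{level:12s} weight~{weight:8.0f}  avg/sample~{weight / samples:6.1f}")
```

With these numbers, N/A and L1 hit come out at roughly 1 per sample, L2 hit at about 53, L3 hit at about 82 and RAM hit at about 373. This is why a handful of RAM and L3 samples can outweigh thousands of L1 hits in the Overhead column.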

root@number:~# perf evlist -v
ibs_op//: type: 11 (ibs_op), size: 136, config: 0, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, mmap_data: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
root@number:~# 

root@number:~# perf report --header-only | head -25
# ========
# captured on    : Tue Apr 29 22:54:04 2025
# header version : 1
# data offset    : 512
# data size      : 668520
# feat offset    : 669032
# hostname : number
# os release : 6.15.0-rc4+
# perf version : 6.15.rc2.g3e8278077117
# arch : x86_64
# nrcpus online : 32
# nrcpus avail : 32
# cpudesc : AMD Ryzen 9 9950X3D 16-Core Processor
# cpuid : AuthenticAMD,26,68,0
# total memory : 31928240 kB
# cmdline : /home/acme/bin/perf mem record find / 
# event : name = ibs_op//, , id = { 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335 }, type = 11 (ibs_op), size = 136, config = 0, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ADDR|PERIOD|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, disabled = 1, inherit = 1, mmap = 1, comm = 1, freq = 1, enable_on_exec = 1, task = 1, mmap_data = 1, sample_id_all = 1, mmap2 = 1, comm_exec = 1, ksymbol = 1, bpf_event = 1
# CPU_TOPOLOGY info available, use -I to display
# NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: cpu = 4, amd_df = 12, amd_iommu_0 = 15, amd_l3 = 13, amd_umc_0 = 14, breakpoint = 5, hwmon_amdgpu = 4294901761, hwmon_k10temp = 4294901762, hwmon_nvme = 4294901760, hwmon_r8169_0_e00_00 = 4294901763, ibs_fetch = 10, ibs_op = 11, kprobe = 8, msr = 16, power = 17, power_core = 18, software = 1, tool = 4294967294, tracepoint = 2, uprobe = 9
# CACHE info available, use -I to display
# time of first sample : 244.312475
# time of last sample : 246.801803
# sample duration :   2489.328 ms
# MEM_TOPOLOGY info available, use -I to display
root@number:~#

root@number:~# perf report | head -20
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 9K of event 'ibs_op//'
# Event count (approx.): 12948758501
#
# Overhead  Command  Shared Object              Symbol                                  
# ........  .......  .........................  ........................................
#
     6.11%  find     [kernel.kallsyms]          [k] btrfs_bin_search
     4.91%  find     [kernel.kallsyms]          [k] filldir64
     4.77%  find     find                       [.] consider_visiting
     3.95%  find     [kernel.kallsyms]          [k] memcpy
     2.76%  find     [kernel.kallsyms]          [k] entry_SYSCALL_64
     2.59%  find     libc.so.6                  [.] __printf_buffer
     2.52%  find     [kernel.kallsyms]          [k] btrfs_getattr
     2.09%  find     [kernel.kallsyms]          [k] pid_delete_dentry
     1.88%  find     libc.so.6                  [.] msort_with_tmp.part.0
root@number:~#

root@number:~# perf annotate -v --stdio2 btrfs_bin_search
build id event received for [vdso]: 6dc5707510cc7434be3d6cb4dc6bae12881efda3 [20]
build id event received for /usr/bin/find: 3804e1e1214a39a975e093a79ec04961743ef5c5 [20]
build id event received for /usr/lib64/libc.so.6: 2b3c02fe7e4d3811767175b6f323692a10a4e116 [20]
build id event received for [kernel.kallsyms]: d391f0e79126801bc8a8f907e763de7979941712 [20]
Looking at the vmlinux_path (8 entries long)
Using /lib/modules/6.15.0-rc4+/build/vmlinux for symbols
read_gnu_debugdata: using .gnu_debugdata of /usr/bin/find
symbol__disassemble: filename=/lib/modules/6.15.0-rc4+/build/vmlinux, sym=btrfs_bin_search, start=0xffffffffac97e890, end=0xffffffffac97ead9
annotating [0x2e87fbf0] /lib/modules/6.15.0-rc4+/build/vmlinux : [0x2fa7f070]               btrfs_bin_search
Disassembled with llvm
Samples: 585  of event 'ibs_op//', 4000 Hz, Event count (approx.): 790819874, [percent: local period]
btrfs_bin_search() /lib/modules/6.15.0-rc4+/build/vmlinux
Percent       0xffffffff8197e890 <btrfs_bin_search>:
   0.17         endbr64         
              → callq   __fentry__
   0.16         pushq   %r15    
   0.18         movq    %rdx,%r15
                pushq   %r14    
                pushq   %r13    
   0.18         pushq   %r12    
   0.34         pushq   %rbp    
                movl    %esi,%ebp
                pushq   %rbx    
   0.35         movq    %rdi,%rbx
                subq    $0x48,%rsp
                movq    (%rdi),%r9
                movq    %rcx,(%rsp)
   0.34         movq    %r9,%rdx
                andl    $0xfff,%edx
                movq    __stack_chk_guard,%r14
   0.18         movq    %r14,0x40(%rsp)
   0.33         movl    %esi,%r14d
   0.17         movq    0x70(%rdi),%rsi
                movq    %rsi,%rax
                subq    vmemmap_base,%rax
                sarq    $0x6, %rax
   0.17         shlq    $0xc, %rax
   0.15         addq    page_offset_base,%rax
   0.17         addq    %rdx,%rax
                movl    0x60(%rax),%r13d
   0.17         cmpl    %ebp,%r13d
              → jb      btrfs_bin_search.cold
                cmpb    $0x1,0x64(%rax)
                sbbl    %r12d,%r12d
                andl    $-0x8,%r12d
   0.18         addl    $0x21,%r12d
                cmpl    %r13d,%r14d
              ↓ jae     20f     
   1.04   84:   leal    (%r14,%r13),%ebp
   0.81         movb    $0x0,0x3f(%rsp)
   1.20         movslq  0xc(%rbx),%r10
   0.66         movl    $0xfff,%r11d
   1.06         movq    $0x0,0x2f(%rsp)
   0.85         shrl    %ebp    
   1.35         movq    $0x0,0x37(%rsp)
   1.93         movl    %ebp,%eax
   1.04         movq    (%rsi),%rdx
   2.20         imull   %r12d,%eax
  10.77         cltq            
  10.26         addq    $0x65,%rax
   3.11         addq    %rax,%r9
   0.68         andl    $0x40,%edx
              ↓ je      e3      
                movq    0x40(%rsi),%rsi
                movl    $0x1000,%r11d
                movzbl  %sil,%ecx
                shlq    %cl, %r11
                subq    $0x1,%r11
root@number:~# 


I'll do more tests tomorrow and try some of the workloads that Joe uses.



Thanks a lot!

- Arnaldo
