All of lore.kernel.org
 help / color / mirror / Atom feed
From: Arnaldo Carvalho de Melo <acme@kernel.org>
To: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>,
	Namhyung Kim <namhyung@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Joe Mario <jmario@redhat.com>,
	Stephane Eranian <eranian@google.com>,
	Jiri Olsa <jolsa@kernel.org>, Ian Rogers <irogers@google.com>,
	Kan Liang <kan.liang@linux.intel.com>,
	linux-kernel@vger.kernel.org, linux-perf-users@vger.kernel.org,
	Santosh Shukla <santosh.shukla@amd.com>,
	Ananth Narayan <ananth.narayan@amd.com>,
	Sandipan Das <sandipan.das@amd.com>
Subject: Re: [PATCH v4 0/4] perf/amd/ibs: Add Zen5 support (tools changes)
Date: Tue, 29 Apr 2025 23:00:30 -0300	[thread overview]
Message-ID: <aBGEPnB5B4NTaOg9@x1> (raw)
In-Reply-To: <20250429035938.1301-1-ravi.bangoria@amd.com>

On Tue, Apr 29, 2025 at 03:59:34AM +0000, Ravi Bangoria wrote:
> IBS on Zen5:
> - Introduced Load Latency filtering capability.
> - Shows DTLB and page size information differently from prior generations.
> 
> Kernel changes for these enhancements are already upstream. So, resending
> tools changes separately.
> 
> Patches are prepared on perf-tools-next/perf-tools-next (85447f68a1e3).
> 
> v3: https://lore.kernel.org/r/20250205060547.1337-1-ravi.bangoria@amd.com
> v3->v4:
> - Remove kernel changes.
> - Improve IBS sample period unit test

Preliminary tests with what is in tmp.perf-tools-next:

root@number:~# perf mem record find / > /dev/null
[ perf record: Woken up 5 times to write data ]
[ perf record: Captured and wrote 1.992 MB perf.data (31824 samples) ]
root@number:~# perf mem report -s mem --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 31K of event 'ibs_op//'
# Total weight : 66561
# Sort order   : mem
#
# Overhead       Samples  Memory access                          
# ........  ............  .......................................
#
    36.51%           456  L2 hit                                 
    30.26%         20141  N/A                                    
    16.75%         11149  L1 hit                                 
    10.08%            18  RAM hit                                
     6.39%            52  L3 hit                                 
     0.01%             8  LFB/MAB hit                            
#
# (Tip: To collect Processor Trace with samples use perf record -e '{intel_pt//,cycles}' ; perf script --call-trace or --insn-trace --xed -F +ipc (remove --xed if no xed))
#
root@number:~#

root@number:~# perf evlist -v
ibs_op//: type: 11 (ibs_op), size: 136, config: 0, { sample_period, sample_freq }: 4000, sample_type: IP|TID|TIME|ADDR|PERIOD|DATA_SRC|WEIGHT_STRUCT, read_format: ID|LOST, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq: 1, enable_on_exec: 1, task: 1, mmap_data: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1, ksymbol: 1, bpf_event: 1
root@number:~# 

root@number:~# perf report --header-only | head -25
# ========
# captured on    : Tue Apr 29 22:54:04 2025
# header version : 1
# data offset    : 512
# data size      : 668520
# feat offset    : 669032
# hostname : number
# os release : 6.15.0-rc4+
# perf version : 6.15.rc2.g3e8278077117
# arch : x86_64
# nrcpus online : 32
# nrcpus avail : 32
# cpudesc : AMD Ryzen 9 9950X3D 16-Core Processor
# cpuid : AuthenticAMD,26,68,0
# total memory : 31928240 kB
# cmdline : /home/acme/bin/perf mem record find / 
# event : name = ibs_op//, , id = { 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335 }, type = 11 (ibs_op), size = 136, config = 0, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|ADDR|PERIOD|DATA_SRC|WEIGHT_STRUCT, read_format = ID|LOST, disabled = 1, inherit = 1, mmap = 1, comm = 1, freq = 1, enable_on_exec = 1, task = 1, mmap_data = 1, sample_id_all = 1, mmap2 = 1, comm_exec = 1, ksymbol = 1, bpf_event = 1
# CPU_TOPOLOGY info available, use -I to display
# NUMA_TOPOLOGY info available, use -I to display
# pmu mappings: cpu = 4, amd_df = 12, amd_iommu_0 = 15, amd_l3 = 13, amd_umc_0 = 14, breakpoint = 5, hwmon_amdgpu = 4294901761, hwmon_k10temp = 4294901762, hwmon_nvme = 4294901760, hwmon_r8169_0_e00_00 = 4294901763, ibs_fetch = 10, ibs_op = 11, kprobe = 8, msr = 16, power = 17, power_core = 18, software = 1, tool = 4294967294, tracepoint = 2, uprobe = 9
# CACHE info available, use -I to display
# time of first sample : 244.312475
# time of last sample : 246.801803
# sample duration :   2489.328 ms
# MEM_TOPOLOGY info available, use -I to display
root@number:~#

root@number:~# perf report | head
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 9K of event 'ibs_op//'
# Event count (approx.): 12948758501
#
# Overhead  Command  Shared Object              Symbol                                  
# ........  .......  .........................  ........................................
root@number:~# perf report | head -20
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 9K of event 'ibs_op//'
# Event count (approx.): 12948758501
#
# Overhead  Command  Shared Object              Symbol                                  
# ........  .......  .........................  ........................................
#
     6.11%  find     [kernel.kallsyms]          [k] btrfs_bin_search
     4.91%  find     [kernel.kallsyms]          [k] filldir64
     4.77%  find     find                       [.] consider_visiting
     3.95%  find     [kernel.kallsyms]          [k] memcpy
     2.76%  find     [kernel.kallsyms]          [k] entry_SYSCALL_64
     2.59%  find     libc.so.6                  [.] __printf_buffer
     2.52%  find     [kernel.kallsyms]          [k] btrfs_getattr
     2.09%  find     [kernel.kallsyms]          [k] pid_delete_dentry
     1.88%  find     libc.so.6                  [.] msort_with_tmp.part.0
root@number:~#

root@number:~# perf annotate -v --stdio2 btrfs_bin_search
build id event received for [vdso]: 6dc5707510cc7434be3d6cb4dc6bae12881efda3 [20]
build id event received for /usr/bin/find: 3804e1e1214a39a975e093a79ec04961743ef5c5 [20]
build id event received for /usr/lib64/libc.so.6: 2b3c02fe7e4d3811767175b6f323692a10a4e116 [20]
build id event received for [kernel.kallsyms]: d391f0e79126801bc8a8f907e763de7979941712 [20]
Looking at the vmlinux_path (8 entries long)
Using /lib/modules/6.15.0-rc4+/build/vmlinux for symbols
read_gnu_debugdata: using .gnu_debugdata of /usr/bin/find
symbol__disassemble: filename=/lib/modules/6.15.0-rc4+/build/vmlinux, sym=btrfs_bin_search, start=0xffffffffac97e890, end=0xffffffffac97ead9
annotating [0x2e87fbf0] /lib/modules/6.15.0-rc4+/build/vmlinux : [0x2fa7f070]               btrfs_bin_search
Disassembled with llvm
Samples: 585  of event 'ibs_op//', 4000 Hz, Event count (approx.): 790819874, [percent: local period]
btrfs_bin_search() /lib/modules/6.15.0-rc4+/build/vmlinux
Percent       0xffffffff8197e890 <btrfs_bin_search>:
   0.17         endbr64         
              → callq   __fentry__
   0.16         pushq   %r15    
   0.18         movq    %rdx,%r15
                pushq   %r14    
                pushq   %r13    
   0.18         pushq   %r12    
   0.34         pushq   %rbp    
                movl    %esi,%ebp
                pushq   %rbx    
   0.35         movq    %rdi,%rbx
                subq    $0x48,%rsp
                movq    (%rdi),%r9
                movq    %rcx,(%rsp)
   0.34         movq    %r9,%rdx
                andl    $0xfff,%edx
                movq    __stack_chk_guard,%r14
   0.18         movq    %r14,0x40(%rsp)
   0.33         movl    %esi,%r14d
   0.17         movq    0x70(%rdi),%rsi
                movq    %rsi,%rax
                subq    vmemmap_base,%rax
                sarq    $0x6, %rax
   0.17         shlq    $0xc, %rax
   0.15         addq    page_offset_base,%rax
   0.17         addq    %rdx,%rax
                movl    0x60(%rax),%r13d
   0.17         cmpl    %ebp,%r13d
              → jb      btrfs_bin_search.cold
                cmpb    $0x1,0x64(%rax)
                sbbl    %r12d,%r12d
                andl    $-0x8,%r12d
   0.18         addl    $0x21,%r12d
                cmpl    %r13d,%r14d
              ↓ jae     20f     
   1.04   84:   leal    (%r14,%r13),%ebp
   0.81         movb    $0x0,0x3f(%rsp)
   1.20         movslq  0xc(%rbx),%r10
   0.66         movl    $0xfff,%r11d
   1.06         movq    $0x0,0x2f(%rsp)
   0.85         shrl    %ebp    
   1.35         movq    $0x0,0x37(%rsp)
   1.93         movl    %ebp,%eax
   1.04         movq    (%rsi),%rdx
   2.20         imull   %r12d,%eax
  10.77         cltq            
  10.26         addq    $0x65,%rax
   3.11         addq    %rax,%r9
   0.68         andl    $0x40,%edx
              ↓ je      e3      
                movq    0x40(%rsi),%rsi
                movl    $0x1000,%r11d
                movzbl  %sil,%ecx
                shlq    %cl, %r11
                subq    $0x1,%r11
root@number:~# 


I'll do more tests tomorrow and try some of the workloads that Joe uses.



Thanks a lot!

- Arnaldo

  parent reply	other threads:[~2025-04-30  2:00 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-29  3:59 [PATCH v4 0/4] perf/amd/ibs: Add Zen5 support (tools changes) Ravi Bangoria
2025-04-29  3:59 ` [PATCH v4 1/4] perf amd ibs: Add Load Latency bits in raw dump Ravi Bangoria
2025-04-30 16:58   ` Namhyung Kim
2025-04-30 17:45     ` Ravi Bangoria
2025-04-29  3:59 ` [PATCH v4 2/4] perf amd ibs: Incorporate Zen5 DTLB and PageSize information Ravi Bangoria
2025-04-29  3:59 ` [PATCH v4 3/4] perf mem/c2c amd: Add ldlat support Ravi Bangoria
2025-04-29  3:59 ` [PATCH v4 4/4] perf test amd ibs: Add sample period unit test Ravi Bangoria
2025-04-29 20:55   ` Arnaldo Carvalho de Melo
2025-04-30  1:13     ` Arnaldo Carvalho de Melo
2025-04-30  1:22       ` Arnaldo Carvalho de Melo
2025-04-30  9:02         ` Ravi Bangoria
2025-04-30 13:06           ` Arnaldo Carvalho de Melo
2025-04-30 13:31             ` Arnaldo Carvalho de Melo
2025-04-30 16:07               ` Ravi Bangoria
2025-04-30 23:39                 ` Arnaldo Carvalho de Melo
2025-04-30  6:36       ` Ravi Bangoria
2025-04-30  6:33     ` Ravi Bangoria
2025-04-30  2:00 ` Arnaldo Carvalho de Melo [this message]
2025-05-13  8:32   ` [PATCH v4 0/4] perf/amd/ibs: Add Zen5 support (tools changes) Ravi Bangoria

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aBGEPnB5B4NTaOg9@x1 \
    --to=acme@kernel.org \
    --cc=ananth.narayan@amd.com \
    --cc=eranian@google.com \
    --cc=irogers@google.com \
    --cc=jmario@redhat.com \
    --cc=jolsa@kernel.org \
    --cc=kan.liang@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=namhyung@kernel.org \
    --cc=peterz@infradead.org \
    --cc=ravi.bangoria@amd.com \
    --cc=sandipan.das@amd.com \
    --cc=santosh.shukla@amd.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.