* [linux-next] perf report runtime extremely long @ 2025-10-29 8:03 Thomas Richter 2025-10-29 15:13 ` Ian Rogers 2025-10-29 15:56 ` Thomas Richter 0 siblings, 2 replies; 14+ messages in thread From: Thomas Richter @ 2025-10-29 8:03 UTC (permalink / raw) To: linux-perf-use.; +Cc: Jan Polensky, Alexander Gordeev Hi all, has something changed recently in perf report in linux-next repo? I run perf report -i <file> -D | grep something on s390 with 2 different kernels: master 6.18.0rc3: # uname -a Linux a35lp67.lnxne.boe 6.18.0-rc3m-perf+ #23 SMP Tue Oct 28 15:53:44 CET 2025 s390x GNU/Linux # ll /tmp/perf2-eSso2q-2 -rw------- 1 root root 8538184 Oct 29 08:44 /tmp/perf2-eSso2q-2 # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null real 0m22.262s user 0m22.068s sys 0m0.200s # This runtime is ok for the perf.data file size. linux-next: # uname -a Linux s83lp47.lnxne.boe 6.18.0-20251028.rc3.git141.f7d2388eeec2.63.fc42.s390x+next #1 SMP Tue Oct 28 20:06:13 CET 2025 s390x GNU/Linux #ll /tmp/perf2-eSso2q-2 -rw-------. 1 root root 8538184 Oct 29 08:47 /tmp/perf2-eSso2q-2 # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null real 2m53.601s user 2m52.938s sys 0m0.575s # Same perf.data file and run time increased from 22 seconds to nearly 300 seconds, roughly 14 times longer for the same input! Has anybody similar observations? Thanks a lot -- Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany -- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Wolfgang Wendt Geschäftsführung: David Faller Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [linux-next] perf report runtime extremely long 2025-10-29 8:03 [linux-next] perf report runtime extremely long Thomas Richter @ 2025-10-29 15:13 ` Ian Rogers 2025-10-29 15:52 ` Thomas Richter 2025-10-29 15:56 ` Thomas Richter 1 sibling, 1 reply; 14+ messages in thread From: Ian Rogers @ 2025-10-29 15:13 UTC (permalink / raw) To: Thomas Richter; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev On Wed, Oct 29, 2025 at 1:04 AM Thomas Richter <tmricht@linux.ibm.com> wrote: > > Hi all, > > has something changed recently in perf report in linux-next repo? > > I run perf report -i <file> -D | grep something > on s390 with 2 different kernels: > > master 6.18.0rc3: > # uname -a > Linux a35lp67.lnxne.boe 6.18.0-rc3m-perf+ #23 SMP Tue Oct 28 15:53:44 CET 2025 s390x GNU/Linux > # ll /tmp/perf2-eSso2q-2 > -rw------- 1 root root 8538184 Oct 29 08:44 /tmp/perf2-eSso2q-2 > # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null > > real 0m22.262s > user 0m22.068s > sys 0m0.200s > # > > This runtime is ok for the perf.data file size. > > linux-next: > # uname -a > Linux s83lp47.lnxne.boe 6.18.0-20251028.rc3.git141.f7d2388eeec2.63.fc42.s390x+next #1 SMP Tue Oct 28 20:06:13 CET 2025 s390x GNU/Linux > #ll /tmp/perf2-eSso2q-2 > -rw-------. 1 root root 8538184 Oct 29 08:47 /tmp/perf2-eSso2q-2 > # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null > > real 2m53.601s > user 2m52.938s > sys 0m0.575s > # > > Same perf.data file and run time increased from 22 seconds to nearly 300 seconds, roughly 14 times longer > for the same input! > > Has anybody similar observations? So there were changes relating to addr2line that could impact this: https://lore.kernel.org/lkml/20251005212212.2892175-1-irogers@google.com/ There was also the switch to always using build IDs: https://lore.kernel.org/lkml/20250724163302.596743-1-irogers@google.com/ Any chance you could do a `perf record`/`perf report` to see where the time is going? Thanks! Ian ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [linux-next] perf report runtime extremely long 2025-10-29 15:13 ` Ian Rogers @ 2025-10-29 15:52 ` Thomas Richter 2025-10-29 16:25 ` Ian Rogers 0 siblings, 1 reply; 14+ messages in thread From: Thomas Richter @ 2025-10-29 15:52 UTC (permalink / raw) To: Ian Rogers; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev On 10/29/25 16:13, Ian Rogers wrote: > On Wed, Oct 29, 2025 at 1:04 AM Thomas Richter <tmricht@linux.ibm.com> wrote: >> >> Hi all, >> >> has something changed recently in perf report in linux-next repo? >> >> I run perf report -i <file> -D | grep something >> on s390 with 2 different kernels: >> >> master 6.18.0rc3: >> # uname -a >> Linux a35lp67.lnxne.boe 6.18.0-rc3m-perf+ #23 SMP Tue Oct 28 15:53:44 CET 2025 s390x GNU/Linux >> # ll /tmp/perf2-eSso2q-2 >> -rw------- 1 root root 8538184 Oct 29 08:44 /tmp/perf2-eSso2q-2 >> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null >> >> real 0m22.262s >> user 0m22.068s >> sys 0m0.200s >> # >> >> This runtime is ok for the perf.data file size. >> >> linux-next: >> # uname -a >> Linux s83lp47.lnxne.boe 6.18.0-20251028.rc3.git141.f7d2388eeec2.63.fc42.s390x+next #1 SMP Tue Oct 28 20:06:13 CET 2025 s390x GNU/Linux >> #ll /tmp/perf2-eSso2q-2 >> -rw-------. 1 root root 8538184 Oct 29 08:47 /tmp/perf2-eSso2q-2 >> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null >> >> real 2m53.601s >> user 2m52.938s >> sys 0m0.575s >> # >> >> Same perf.data file and run time increased from 22 seconds to nearly 300 seconds, roughly 14 times longer >> for the same input! >> >> Has anybody similar observations? > > So there were changes relating to addr2line that could impact this: > https://lore.kernel.org/lkml/20251005212212.2892175-1-irogers@google.com/ > > There was also the switch to always using build IDs: > https://lore.kernel.org/lkml/20250724163302.596743-1-irogers@google.com/ > > Any chance you could do a `perf record`/`perf report` to see where the > time is going? > > Thanks! > Ian Ok here is a first try, can do more tests tomorrow morning. The file perf1.data has this size: root in 🌐 s83lp47 in ~ ❯ ll perf1.data -rw-------. 1 root root 99692290 Oct 29 16:17 perf1.data root in 🌐 s83lp47 in ~ ❯ time perf record -- perf report -i perf1.data -D > /dev/null ^C[ perf record: Woken up 6 times to write data ] [ perf record: Captured and wrote 2.742 MB perf.data (70481 samples) ] Terminated real 13m9.894s user 13m6.583s sys 0m2.449s root in 🌐 s83lp47 in ~ took 13m9s ❯ perf report --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 70K of event 'cpum_cf/cycles/P' # Event count (approx.): 88101250000 # # Overhead Command Shared Object Symbol # ........ ....... ................. .................................................................................. # 26.67% perf libc.so.6 [.] __printf_buffer 12.43% perf libc.so.6 [.] __GI___strnlen 12.30% perf libc.so.6 [.] __strstr_vx 11.92% perf perf [.] perf_pmu__for_each_event 7.46% perf libc.so.6 [.] __printf_buffer_write 5.48% perf libc.so.6 [.] memcpy 5.13% perf libc.so.6 [.] __GI___strchrnul 3.25% perf libc.so.6 [.] __vfscanf_internal 2.61% perf libc.so.6 [.] strchr 2.39% perf libc.so.6 [.] __vsnprintf_internal 2.19% perf libc.so.6 [.] __printf_buffer_done 2.03% perf perf [.] perf_pmu__is_tool 1.14% perf libc.so.6 [.] __snprintf_chk@@GLIBC_2.4 1.05% perf perf [.] get_counter_name_callback 0.73% perf libc.so.6 [.] __GI_____strtoull_l_internal 0.60% perf libc.so.6 [.] memcmp 0.28% perf libc.so.6 [.] _IO_str_init_static_internal 0.19% perf libc.so.6 [.] strlen Hope this helps Kind regards -- Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany -- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Wolfgang Wendt Geschäftsführung: David Faller Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [linux-next] perf report runtime extremely long 2025-10-29 15:52 ` Thomas Richter @ 2025-10-29 16:25 ` Ian Rogers 0 siblings, 0 replies; 14+ messages in thread From: Ian Rogers @ 2025-10-29 16:25 UTC (permalink / raw) To: Thomas Richter; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev On Wed, Oct 29, 2025 at 8:52 AM Thomas Richter <tmricht@linux.ibm.com> wrote: > > On 10/29/25 16:13, Ian Rogers wrote: > > On Wed, Oct 29, 2025 at 1:04 AM Thomas Richter <tmricht@linux.ibm.com> wrote: > >> > >> Hi all, > >> > >> has something changed recently in perf report in linux-next repo? > >> > >> I run perf report -i <file> -D | grep something > >> on s390 with 2 different kernels: > >> > >> master 6.18.0rc3: > >> # uname -a > >> Linux a35lp67.lnxne.boe 6.18.0-rc3m-perf+ #23 SMP Tue Oct 28 15:53:44 CET 2025 s390x GNU/Linux > >> # ll /tmp/perf2-eSso2q-2 > >> -rw------- 1 root root 8538184 Oct 29 08:44 /tmp/perf2-eSso2q-2 > >> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null > >> > >> real 0m22.262s > >> user 0m22.068s > >> sys 0m0.200s > >> # > >> > >> This runtime is ok for the perf.data file size. > >> > >> linux-next: > >> # uname -a > >> Linux s83lp47.lnxne.boe 6.18.0-20251028.rc3.git141.f7d2388eeec2.63.fc42.s390x+next #1 SMP Tue Oct 28 20:06:13 CET 2025 s390x GNU/Linux > >> #ll /tmp/perf2-eSso2q-2 > >> -rw-------. 1 root root 8538184 Oct 29 08:47 /tmp/perf2-eSso2q-2 > >> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null > >> > >> real 2m53.601s > >> user 2m52.938s > >> sys 0m0.575s > >> # > >> > >> Same perf.data file and run time increased from 22 seconds to nearly 300 seconds, roughly 14 times longer > >> for the same input! > >> > >> Has anybody similar observations? > > > > So there were changes relating to addr2line that could impact this: > > https://lore.kernel.org/lkml/20251005212212.2892175-1-irogers@google.com/ > > > > There was also the switch to always using build IDs: > > https://lore.kernel.org/lkml/20250724163302.596743-1-irogers@google.com/ > > > > Any chance you could do a `perf record`/`perf report` to see where the > > time is going? > > > > Thanks! > > Ian > > Ok here is a first try, can do more tests tomorrow morning. > The file perf1.data has this size: > > root in 🌐 s83lp47 in ~ > ❯ ll perf1.data > -rw-------. 1 root root 99692290 Oct 29 16:17 perf1.data > > root in 🌐 s83lp47 in ~ > ❯ > > time perf record -- perf report -i perf1.data -D > /dev/null > ^C[ perf record: Woken up 6 times to write data ] > [ perf record: Captured and wrote 2.742 MB perf.data (70481 samples) ] > Terminated > > real 13m9.894s > user 13m6.583s > sys 0m2.449s > > root in 🌐 s83lp47 in ~ took 13m9s > ❯ perf report --stdio > # To display the perf.data header info, please use --header/--header-only options. > # > # > # Total Lost Samples: 0 > # > # Samples: 70K of event 'cpum_cf/cycles/P' > # Event count (approx.): 88101250000 > # > # Overhead Command Shared Object Symbol > # ........ ....... ................. .................................................................................. > # > 26.67% perf libc.so.6 [.] __printf_buffer > 12.43% perf libc.so.6 [.] __GI___strnlen > 12.30% perf libc.so.6 [.] __strstr_vx > 11.92% perf perf [.] perf_pmu__for_each_event > 7.46% perf libc.so.6 [.] __printf_buffer_write > 5.48% perf libc.so.6 [.] memcpy > 5.13% perf libc.so.6 [.] __GI___strchrnul > 3.25% perf libc.so.6 [.] __vfscanf_internal > 2.61% perf libc.so.6 [.] strchr > 2.39% perf libc.so.6 [.] __vsnprintf_internal > 2.19% perf libc.so.6 [.] __printf_buffer_done > 2.03% perf perf [.] perf_pmu__is_tool > 1.14% perf libc.so.6 [.] __snprintf_chk@@GLIBC_2.4 > 1.05% perf perf [.] get_counter_name_callback > 0.73% perf libc.so.6 [.] __GI_____strtoull_l_internal > 0.60% perf libc.so.6 [.] memcmp > 0.28% perf libc.so.6 [.] _IO_str_init_static_internal > 0.19% perf libc.so.6 [.] strlen Looks like the time is in s390-sample-raw.c but I don't see recent changes there: https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/log/tools/perf/util/s390-sample-raw.c?h=perf-tools-next I wonder for the scanf if the following would help: ``` diff --git a/tools/perf/util/s390-sample-raw.c b/tools/perf/util/s390-sample-raw.c index a2d6a1f93cce..21edd359a43a 100644 --- a/tools/perf/util/s390-sample-raw.c +++ b/tools/perf/util/s390-sample-raw.c @@ -140,8 +140,9 @@ struct get_counter_name_data { static int get_counter_name_callback(void *vdata, struct pmu_event_info *info) { struct get_counter_name_data *data = vdata; - int rc, event_nr; + int event_nr; const char *event_str; + char *end_ptr; if (info->str == NULL) return 0; @@ -150,8 +151,8 @@ static int get_counter_name_callback(void *vdata, struct pmu_event_info *info) if (!event_str) return 0; - rc = sscanf(event_str, "event=%x", &event_nr); - if (rc == 1 && event_nr == data->wanted) { + event_nr = strtol(event_str + 6, &end_ptr, /*base=*/16); + if ((end_ptr != event_str + 6) && event_nr == data->wanted) { data->result = strdup(info->name); return 1; /* Terminate the search. */ } ``` ah, new email, let me follow up further there. > Hope this helps > > Kind regards > -- > Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany > -- > IBM Deutschland Research & Development GmbH > > Vorsitzender des Aufsichtsrats: Wolfgang Wendt > > Geschäftsführung: David Faller > > Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [linux-next] perf report runtime extremely long 2025-10-29 8:03 [linux-next] perf report runtime extremely long Thomas Richter 2025-10-29 15:13 ` Ian Rogers @ 2025-10-29 15:56 ` Thomas Richter 2025-10-29 16:47 ` Ian Rogers 1 sibling, 1 reply; 14+ messages in thread From: Thomas Richter @ 2025-10-29 15:56 UTC (permalink / raw) To: linux-perf-use.; +Cc: Jan Polensky, Alexander Gordeev On 10/29/25 09:03, Thomas Richter wrote: > Hi all, > > has something changed recently in perf report in linux-next repo? > > I run perf report -i <file> -D | grep something > on s390 with 2 different kernels: > > master 6.18.0rc3: > # uname -a > Linux a35lp67.lnxne.boe 6.18.0-rc3m-perf+ #23 SMP Tue Oct 28 15:53:44 CET 2025 s390x GNU/Linux > # ll /tmp/perf2-eSso2q-2 > -rw------- 1 root root 8538184 Oct 29 08:44 /tmp/perf2-eSso2q-2 > # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null > > real 0m22.262s > user 0m22.068s > sys 0m0.200s > # > > This runtime is ok for the perf.data file size. > > linux-next: > # uname -a > Linux s83lp47.lnxne.boe 6.18.0-20251028.rc3.git141.f7d2388eeec2.63.fc42.s390x+next #1 SMP Tue Oct 28 20:06:13 CET 2025 s390x GNU/Linux > #ll /tmp/perf2-eSso2q-2 > -rw-------. 1 root root 8538184 Oct 29 08:47 /tmp/perf2-eSso2q-2 > # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null > > real 2m53.601s > user 2m52.938s > sys 0m0.575s > # > > Same perf.data file and run time increased from 22 seconds to nearly 300 seconds, roughly 14 times longer > for the same input! > > Has anybody similar observations? > > Thanks a lot > Ian, sorry sent button pushed too early... here is a second try without option -D. This is much better: root in 🌐 s83lp47 in ~ ❯ ll perf1.data -rw-------. 1 root root 99692290 Oct 29 16:17 perf1.data root in 🌐 s83lp47 in ~ ❯ time perf record -- perf report -i perf1.data > /dev/null [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.027 MB perf.data (246 samples) ] real 0m1.097s user 0m0.082s sys 0m0.167s root in 🌐 s83lp47 in ~ ❯ perf report --stdio # To display the perf.data header info, please use --header/--header-only options. # # # Total Lost Samples: 0 # # Samples: 246 of event 'cpum_cf/cycles/P' # Event count (approx.): 307500000 # # Overhead Command Shared Object Symbol # ........ ....... ................... .................................................. # 9.35% perf perf [.] reader__read_event 4.47% perf perf [.] perf_env__arch 2.85% perf ld64.so.1 [.] _dl_relocate_object_no_relro 2.44% perf ld64.so.1 [.] _dl_lookup_symbol_x 2.44% perf ld64.so.1 [.] do_lookup_x 2.44% perf perf [.] down_read 2.03% perf perf [.] hists__findnew_entry 2.03% perf perf [.] perf_hpp__is_dynamic_entry 1.63% perf [kernel.kallsyms] [k] __s390_indirect_jump_r14 1.63% perf [kernel.kallsyms] [k] format_decode 1.63% perf libc.so.6 [.] cfree@GLIBC_2.2 1.63% perf perf [.] evsel__parse_sample 1.63% perf perf [.] map__put 1.63% perf perf [.] sort__comm_cmp 1.63% perf perf [.] thread__put 1.22% perf [kernel.kallsyms] [k] folio_add_file_rmap_ptes 1.22% perf [kernel.kallsyms] [k] get_page_from_freelist 1.22% perf ld64.so.1 [.] _dl_name_match_p 1.22% perf perf [.] dump_printf 1.22% perf perf [.] evlist__event2evsel 1.22% perf perf [.] hashmap_find 1.22% perf perf [.] perf_session__deliver_event -- Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany -- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Wolfgang Wendt Geschäftsführung: David Faller Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [linux-next] perf report runtime extremely long 2025-10-29 15:56 ` Thomas Richter @ 2025-10-29 16:47 ` Ian Rogers 2025-10-30 6:04 ` Namhyung Kim 2025-10-30 8:59 ` Thomas Richter 0 siblings, 2 replies; 14+ messages in thread From: Ian Rogers @ 2025-10-29 16:47 UTC (permalink / raw) To: Thomas Richter; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev On Wed, Oct 29, 2025 at 9:12 AM Thomas Richter <tmricht@linux.ibm.com> wrote: > > On 10/29/25 09:03, Thomas Richter wrote: > > Hi all, > > > > has something changed recently in perf report in linux-next repo? > > > > I run perf report -i <file> -D | grep something > > on s390 with 2 different kernels: > > > > master 6.18.0rc3: > > # uname -a > > Linux a35lp67.lnxne.boe 6.18.0-rc3m-perf+ #23 SMP Tue Oct 28 15:53:44 CET 2025 s390x GNU/Linux > > # ll /tmp/perf2-eSso2q-2 > > -rw------- 1 root root 8538184 Oct 29 08:44 /tmp/perf2-eSso2q-2 > > # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null > > > > real 0m22.262s > > user 0m22.068s > > sys 0m0.200s > > # > > > > This runtime is ok for the perf.data file size. > > > > linux-next: > > # uname -a > > Linux s83lp47.lnxne.boe 6.18.0-20251028.rc3.git141.f7d2388eeec2.63.fc42.s390x+next #1 SMP Tue Oct 28 20:06:13 CET 2025 s390x GNU/Linux > > #ll /tmp/perf2-eSso2q-2 > > -rw-------. 1 root root 8538184 Oct 29 08:47 /tmp/perf2-eSso2q-2 > > # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null > > > > real 2m53.601s > > user 2m52.938s > > sys 0m0.575s > > # > > > > Same perf.data file and run time increased from 22 seconds to nearly 300 seconds, roughly 14 times longer > > for the same input! > > > > Has anybody similar observations? > > > > Thanks a lot > > > > Ian, > > sorry sent button pushed too early... > > here is a second try without option -D. > > This is much better: > > root in 🌐 s83lp47 in ~ > ❯ ll perf1.data > -rw-------. 1 root root 99692290 Oct 29 16:17 perf1.data > > root in 🌐 s83lp47 in ~ > ❯ time perf record -- perf report -i perf1.data > /dev/null > [ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.027 MB perf.data (246 samples) ] > > real 0m1.097s > user 0m0.082s > sys 0m0.167s > > root in 🌐 s83lp47 in ~ > ❯ perf report --stdio > # To display the perf.data header info, please use --header/--header-only options. > # > # > # Total Lost Samples: 0 > # > # Samples: 246 of event 'cpum_cf/cycles/P' > # Event count (approx.): 307500000 > # > # Overhead Command Shared Object Symbol > # ........ ....... ................... .................................................. > # > 9.35% perf perf [.] reader__read_event > 4.47% perf perf [.] perf_env__arch > 2.85% perf ld64.so.1 [.] _dl_relocate_object_no_relro > 2.44% perf ld64.so.1 [.] _dl_lookup_symbol_x > 2.44% perf ld64.so.1 [.] do_lookup_x > 2.44% perf perf [.] down_read > 2.03% perf perf [.] hists__findnew_entry > 2.03% perf perf [.] perf_hpp__is_dynamic_entry > 1.63% perf [kernel.kallsyms] [k] __s390_indirect_jump_r14 > 1.63% perf [kernel.kallsyms] [k] format_decode > 1.63% perf libc.so.6 [.] cfree@GLIBC_2.2 > 1.63% perf perf [.] evsel__parse_sample > 1.63% perf perf [.] map__put > 1.63% perf perf [.] sort__comm_cmp > 1.63% perf perf [.] thread__put > 1.22% perf [kernel.kallsyms] [k] folio_add_file_rmap_ptes > 1.22% perf [kernel.kallsyms] [k] get_page_from_freelist > 1.22% perf ld64.so.1 [.] _dl_name_match_p > 1.22% perf perf [.] dump_printf > 1.22% perf perf [.] evlist__event2evsel > 1.22% perf perf [.] hashmap_find > 1.22% perf perf [.] perf_session__deliver_event Hmm.. for perf_env__arch we could probably cache better: ``` diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c index f1626d2032cd..2591f71c7c42 100644 --- a/tools/perf/util/env.c +++ b/tools/perf/util/env.c @@ -593,17 +593,27 @@ static const char *normalize_arch(char *arch) const char *perf_env__arch(struct perf_env *env) { - char *arch_name; + if (env && env->arch && env->arch_name_normalized) + return env->arch; if (!env || !env->arch) { /* Assume local operation */ static struct utsname uts = { .machine[0] = '\0', }; if (uts.machine[0] == '\0' && uname(&uts) < 0) return NULL; - arch_name = uts.machine; - } else - arch_name = env->arch; - return normalize_arch(arch_name); + if (!env) + return normalize_arch(uts.machine); + + env->arch = strdup(normalize_arch(uts.machine)); + env->arch_name_normalized = true; + } else { + char *normalized = strdup(normalize_arch(env->arch)); + + free(env->arch); + env->arch = normalized; + env->arch_name_normalized = true; + } + return env->arch; } #if defined(HAVE_LIBTRACEEVENT) diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h index 9977b85523a8..da20bc3810bd 100644 --- a/tools/perf/util/env.h +++ b/tools/perf/util/env.h @@ -61,6 +61,7 @@ struct perf_env { char *os_release; char *version; char *arch; + bool arch_name_normalized; int nr_cpus_online; int nr_cpus_avail; char *cpu_desc; ``` or just force the normalize before assignment. For reader__read_event and the perf_env these changes may be relevant: "perf sample: Remove arch notion of sample parsing" https://lore.kernel.org/r/20250724163302.596743-21-irogers@google.com "perf env: Remove global perf_env" https://lore.kernel.org/r/20250724163302.596743-20-irogers@google.com I wonder if for some reason the env is NULL in your cases, so we go slow path a lot. The histogram stuff, Namhyung may know more. Thanks! Ian > -- > Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany > -- > IBM Deutschland Research & Development GmbH > > Vorsitzender des Aufsichtsrats: Wolfgang Wendt > > Geschäftsführung: David Faller > > Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 > ^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [linux-next] perf report runtime extremely long 2025-10-29 16:47 ` Ian Rogers @ 2025-10-30 6:04 ` Namhyung Kim 2025-10-30 8:59 ` Thomas Richter 1 sibling, 0 replies; 14+ messages in thread From: Namhyung Kim @ 2025-10-30 6:04 UTC (permalink / raw) To: Ian Rogers Cc: Thomas Richter, linux-perf-use., Jan Polensky, Alexander Gordeev Hello, On Wed, Oct 29, 2025 at 09:47:00AM -0700, Ian Rogers wrote: > On Wed, Oct 29, 2025 at 9:12 AM Thomas Richter <tmricht@linux.ibm.com> wrote: > > > > On 10/29/25 09:03, Thomas Richter wrote: > > > Hi all, > > > > > > has something changed recently in perf report in linux-next repo? > > > > > > I run perf report -i <file> -D | grep something > > > on s390 with 2 different kernels: > > > > > > master 6.18.0rc3: > > > # uname -a > > > Linux a35lp67.lnxne.boe 6.18.0-rc3m-perf+ #23 SMP Tue Oct 28 15:53:44 CET 2025 s390x GNU/Linux > > > # ll /tmp/perf2-eSso2q-2 > > > -rw------- 1 root root 8538184 Oct 29 08:44 /tmp/perf2-eSso2q-2 > > > # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null > > > > > > real 0m22.262s > > > user 0m22.068s > > > sys 0m0.200s > > > # > > > > > > This runtime is ok for the perf.data file size. > > > > > > linux-next: > > > # uname -a > > > Linux s83lp47.lnxne.boe 6.18.0-20251028.rc3.git141.f7d2388eeec2.63.fc42.s390x+next #1 SMP Tue Oct 28 20:06:13 CET 2025 s390x GNU/Linux > > > #ll /tmp/perf2-eSso2q-2 > > > -rw-------. 1 root root 8538184 Oct 29 08:47 /tmp/perf2-eSso2q-2 > > > # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null > > > > > > real 2m53.601s > > > user 2m52.938s > > > sys 0m0.575s > > > # > > > > > > Same perf.data file and run time increased from 22 seconds to nearly 300 seconds, roughly 14 times longer > > > for the same input! > > > > > > Has anybody similar observations? > > > > > > Thanks a lot > > > > > > > Ian, > > > > sorry sent button pushed too early... > > > > here is a second try without option -D. > > > > This is much better: > > > > root in 🌐 s83lp47 in ~ > > ❯ ll perf1.data > > -rw-------. 1 root root 99692290 Oct 29 16:17 perf1.data > > > > root in 🌐 s83lp47 in ~ > > ❯ time perf record -- perf report -i perf1.data > /dev/null > > [ perf record: Woken up 1 times to write data ] > > [ perf record: Captured and wrote 0.027 MB perf.data (246 samples) ] > > > > real 0m1.097s > > user 0m0.082s > > sys 0m0.167s > > > > root in 🌐 s83lp47 in ~ > > ❯ perf report --stdio > > # To display the perf.data header info, please use --header/--header-only options. > > # > > # > > # Total Lost Samples: 0 > > # > > # Samples: 246 of event 'cpum_cf/cycles/P' > > # Event count (approx.): 307500000 > > # > > # Overhead Command Shared Object Symbol > > # ........ ....... ................... .................................................. > > # > > 9.35% perf perf [.] reader__read_event > > 4.47% perf perf [.] perf_env__arch > > 2.85% perf ld64.so.1 [.] _dl_relocate_object_no_relro > > 2.44% perf ld64.so.1 [.] _dl_lookup_symbol_x > > 2.44% perf ld64.so.1 [.] do_lookup_x > > 2.44% perf perf [.] down_read > > 2.03% perf perf [.] hists__findnew_entry > > 2.03% perf perf [.] perf_hpp__is_dynamic_entry > > 1.63% perf [kernel.kallsyms] [k] __s390_indirect_jump_r14 > > 1.63% perf [kernel.kallsyms] [k] format_decode > > 1.63% perf libc.so.6 [.] cfree@GLIBC_2.2 > > 1.63% perf perf [.] evsel__parse_sample > > 1.63% perf perf [.] map__put > > 1.63% perf perf [.] sort__comm_cmp > > 1.63% perf perf [.] thread__put > > 1.22% perf [kernel.kallsyms] [k] folio_add_file_rmap_ptes > > 1.22% perf [kernel.kallsyms] [k] get_page_from_freelist > > 1.22% perf ld64.so.1 [.] _dl_name_match_p > > 1.22% perf perf [.] dump_printf > > 1.22% perf perf [.] evlist__event2evsel > > 1.22% perf perf [.] hashmap_find > > 1.22% perf perf [.] perf_session__deliver_event > > Hmm.. for perf_env__arch we could probably cache better: > ``` > diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c > index f1626d2032cd..2591f71c7c42 100644 > --- a/tools/perf/util/env.c > +++ b/tools/perf/util/env.c > @@ -593,17 +593,27 @@ static const char *normalize_arch(char *arch) > > const char *perf_env__arch(struct perf_env *env) > { > - char *arch_name; > + if (env && env->arch && env->arch_name_normalized) > + return env->arch; > > if (!env || !env->arch) { /* Assume local operation */ > static struct utsname uts = { .machine[0] = '\0', }; > if (uts.machine[0] == '\0' && uname(&uts) < 0) > return NULL; > - arch_name = uts.machine; > - } else > - arch_name = env->arch; > > - return normalize_arch(arch_name); > + if (!env) > + return normalize_arch(uts.machine); > + > + env->arch = strdup(normalize_arch(uts.machine)); > + env->arch_name_normalized = true; > + } else { > + char *normalized = strdup(normalize_arch(env->arch)); > + > + free(env->arch); > + env->arch = normalized; > + env->arch_name_normalized = true; > + } > + return env->arch; > } > > #if defined(HAVE_LIBTRACEEVENT) > diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h > index 9977b85523a8..da20bc3810bd 100644 > --- a/tools/perf/util/env.h > +++ b/tools/perf/util/env.h > @@ -61,6 +61,7 @@ struct perf_env { > char *os_release; > char *version; > char *arch; > + bool arch_name_normalized; > int nr_cpus_online; > int nr_cpus_avail; > char *cpu_desc; > ``` > or just force the normalize before assignment. > > For reader__read_event and the perf_env these changes may be relevant: > "perf sample: Remove arch notion of sample parsing" > https://lore.kernel.org/r/20250724163302.596743-21-irogers@google.com > "perf env: Remove global perf_env" > https://lore.kernel.org/r/20250724163302.596743-20-irogers@google.com > > I wonder if for some reason the env is NULL in your cases, so we go > slow path a lot. > > The histogram stuff, Namhyung may know more. There only 2 entries for the hist code. The hists__findnew_entry() is expected as it's the main routine to aggregate the samples. But the perf_hpp__is_dynamic_entry() is strange. It should be very small and fast. But I guess the actual problem is when you use -D in perf report. And it doesn't show hist related functions IIUC. Thanks, Namhyung ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [linux-next] perf report runtime extremely long 2025-10-29 16:47 ` Ian Rogers 2025-10-30 6:04 ` Namhyung Kim @ 2025-10-30 8:59 ` Thomas Richter 2025-10-30 13:25 ` Thomas Richter 1 sibling, 1 reply; 14+ messages in thread From: Thomas Richter @ 2025-10-30 8:59 UTC (permalink / raw) To: Ian Rogers; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev On 10/29/25 17:47, Ian Rogers wrote: > On Wed, Oct 29, 2025 at 9:12 AM Thomas Richter <tmricht@linux.ibm.com> wrote: >> >> On 10/29/25 09:03, Thomas Richter wrote: >>> Hi all, >>> >>> has something changed recently in perf report in linux-next repo? >>> >>> I run perf report -i <file> -D | grep something >>> on s390 with 2 different kernels: >>> >>> master 6.18.0rc3: >>> # uname -a >>> Linux a35lp67.lnxne.boe 6.18.0-rc3m-perf+ #23 SMP Tue Oct 28 15:53:44 CET 2025 s390x GNU/Linux >>> # ll /tmp/perf2-eSso2q-2 >>> -rw------- 1 root root 8538184 Oct 29 08:44 /tmp/perf2-eSso2q-2 >>> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null >>> >>> real 0m22.262s >>> user 0m22.068s >>> sys 0m0.200s >>> # >>> >>> This runtime is ok for the perf.data file size. >>> >>> linux-next: >>> # uname -a >>> Linux s83lp47.lnxne.boe 6.18.0-20251028.rc3.git141.f7d2388eeec2.63.fc42.s390x+next #1 SMP Tue Oct 28 20:06:13 CET 2025 s390x GNU/Linux >>> #ll /tmp/perf2-eSso2q-2 >>> -rw-------. 1 root root 8538184 Oct 29 08:47 /tmp/perf2-eSso2q-2 >>> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null >>> >>> real 2m53.601s >>> user 2m52.938s >>> sys 0m0.575s >>> # >>> >>> Same perf.data file and run time increased from 22 seconds to nearly 300 seconds, roughly 14 times longer >>> for the same input! >>> >>> Has anybody similar observations? >>> >>> Thanks a lot >>> >> >> Ian, >> >> sorry sent button pushed too early... >> >> here is a second try without option -D. >> >> This is much better: >> >> root in 🌐 s83lp47 in ~ >> ❯ ll perf1.data >> -rw-------. 1 root root 99692290 Oct 29 16:17 perf1.data >> >> root in 🌐 s83lp47 in ~ >> ❯ time perf record -- perf report -i perf1.data > /dev/null >> [ perf record: Woken up 1 times to write data ] >> [ perf record: Captured and wrote 0.027 MB perf.data (246 samples) ] >> >> real 0m1.097s >> user 0m0.082s >> sys 0m0.167s >> >> root in 🌐 s83lp47 in ~ >> ❯ perf report --stdio >> # To display the perf.data header info, please use --header/--header-only options. >> # >> # >> # Total Lost Samples: 0 >> # >> # Samples: 246 of event 'cpum_cf/cycles/P' >> # Event count (approx.): 307500000 >> # >> # Overhead Command Shared Object Symbol >> # ........ ....... ................... .................................................. >> # >> 9.35% perf perf [.] reader__read_event >> 4.47% perf perf [.] perf_env__arch >> 2.85% perf ld64.so.1 [.] _dl_relocate_object_no_relro >> 2.44% perf ld64.so.1 [.] _dl_lookup_symbol_x >> 2.44% perf ld64.so.1 [.] do_lookup_x >> 2.44% perf perf [.] down_read >> 2.03% perf perf [.] hists__findnew_entry >> 2.03% perf perf [.] perf_hpp__is_dynamic_entry >> 1.63% perf [kernel.kallsyms] [k] __s390_indirect_jump_r14 >> 1.63% perf [kernel.kallsyms] [k] format_decode >> 1.63% perf libc.so.6 [.] cfree@GLIBC_2.2 >> 1.63% perf perf [.] evsel__parse_sample >> 1.63% perf perf [.] map__put >> 1.63% perf perf [.] sort__comm_cmp >> 1.63% perf perf [.] thread__put >> 1.22% perf [kernel.kallsyms] [k] folio_add_file_rmap_ptes >> 1.22% perf [kernel.kallsyms] [k] get_page_from_freelist >> 1.22% perf ld64.so.1 [.] _dl_name_match_p >> 1.22% perf perf [.] dump_printf >> 1.22% perf perf [.] evlist__event2evsel >> 1.22% perf perf [.] hashmap_find >> 1.22% perf perf [.] perf_session__deliver_event > > Hmm.. for perf_env__arch we could probably cache better: > ``` > diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c > index f1626d2032cd..2591f71c7c42 100644 > --- a/tools/perf/util/env.c > +++ b/tools/perf/util/env.c > @@ -593,17 +593,27 @@ static const char *normalize_arch(char *arch) > > const char *perf_env__arch(struct perf_env *env) > { > - char *arch_name; > + if (env && env->arch && env->arch_name_normalized) > + return env->arch; > > if (!env || !env->arch) { /* Assume local operation */ > static struct utsname uts = { .machine[0] = '\0', }; > if (uts.machine[0] == '\0' && uname(&uts) < 0) > return NULL; > - arch_name = uts.machine; > - } else > - arch_name = env->arch; > > - return normalize_arch(arch_name); > + if (!env) > + return normalize_arch(uts.machine); > + > + env->arch = strdup(normalize_arch(uts.machine)); > + env->arch_name_normalized = true; > + } else { > + char *normalized = strdup(normalize_arch(env->arch)); > + > + free(env->arch); > + env->arch = normalized; > + env->arch_name_normalized = true; > + } > + return env->arch; > } > > #if defined(HAVE_LIBTRACEEVENT) > diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h > index 9977b85523a8..da20bc3810bd 100644 > --- a/tools/perf/util/env.h > +++ b/tools/perf/util/env.h > @@ -61,6 +61,7 @@ struct perf_env { > char *os_release; > char *version; > char *arch; > + bool arch_name_normalized; > int nr_cpus_online; > int nr_cpus_avail; > char *cpu_desc; > ``` > or just force the normalize before assignment. > > For reader__read_event and the perf_env these changes may be relevant: > "perf sample: Remove arch notion of sample parsing" > https://lore.kernel.org/r/20250724163302.596743-21-irogers@google.com > "perf env: Remove global perf_env" > https://lore.kernel.org/r/20250724163302.596743-20-irogers@google.com > > I wonder if for some reason the env is NULL in your cases, so we go > slow path a lot. > > The histogram stuff, Namhyung may know more. > > Thanks! > Ian > Thanks Ian, I tested your patch, but there is no big difference: root in 🌐 s83lp47 in ~ ❯ ll ~/perf.data-test -rw-------. 1 root root 152280778 Oct 30 07:58 /root/perf.data-test here is the output with your patch: root in 🌐 s83lp47 in ~ ❯ time mirror-linux-next/tools/perf/perf report -D -i ~/perf.data-test > /dev/null real 46m26.350s user 46m17.559s sys 0m7.434s Here is the output from the linux-next perf version: root in 🌐 s83lp47 in ~ took 46m26s ❯ time perf report -D -i ~/perf.data-test > /dev/null real 49m14.364s user 49m4.872s sys 0m8.037s root in 🌐 s83lp47 in ~ took 49m14s When I go back history and use: # pwd /root/mirror-linux-next/tools/perf # git branch * (HEAD detached at v6.10) master ]# ll ~/perf.data-test -rw------- 1 root root 152280778 Oct 30 08:38 /root/perf.data-test # time ./perf report -D -i ~/perf.data-test > /dev/null real 8m49.089s user 8m47.898s sys 0m0.848s # Thanks for the pointer. Since the issue is with s390 specific code I will do a git bisect and we go from there. -- Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany -- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Wolfgang Wendt Geschäftsführung: David Faller Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [linux-next] perf report runtime extremely long 2025-10-30 8:59 ` Thomas Richter @ 2025-10-30 13:25 ` Thomas Richter 2025-10-30 14:27 ` Thomas Richter 0 siblings, 1 reply; 14+ messages in thread From: Thomas Richter @ 2025-10-30 13:25 UTC (permalink / raw) To: Ian Rogers; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev On 10/30/25 09:59, Thomas Richter wrote: > On 10/29/25 17:47, Ian Rogers wrote: >> On Wed, Oct 29, 2025 at 9:12 AM Thomas Richter <tmricht@linux.ibm.com> wrote: >>> >>> On 10/29/25 09:03, Thomas Richter wrote: >>>> Hi all, >>>> >>>> has something changed recently in perf report in linux-next repo? >>>> >>>> I run perf report -i <file> -D | grep something >>>> on s390 with 2 different kernels: >>>> >>>> master 6.18.0rc3: >>>> # uname -a >>>> Linux a35lp67.lnxne.boe 6.18.0-rc3m-perf+ #23 SMP Tue Oct 28 15:53:44 CET 2025 s390x GNU/Linux >>>> # ll /tmp/perf2-eSso2q-2 >>>> -rw------- 1 root root 8538184 Oct 29 08:44 /tmp/perf2-eSso2q-2 >>>> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null >>>> >>>> real 0m22.262s >>>> user 0m22.068s >>>> sys 0m0.200s >>>> # >>>> >>>> This runtime is ok for the perf.data file size. >>>> >>>> linux-next: >>>> # uname -a >>>> Linux s83lp47.lnxne.boe 6.18.0-20251028.rc3.git141.f7d2388eeec2.63.fc42.s390x+next #1 SMP Tue Oct 28 20:06:13 CET 2025 s390x GNU/Linux >>>> #ll /tmp/perf2-eSso2q-2 >>>> -rw-------. 1 root root 8538184 Oct 29 08:47 /tmp/perf2-eSso2q-2 >>>> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null >>>> >>>> real 2m53.601s >>>> user 2m52.938s >>>> sys 0m0.575s >>>> # >>>> >>>> Same perf.data file and run time increased from 22 seconds to nearly 300 seconds, roughly 14 times longer >>>> for the same input! >>>> >>>> Has anybody similar observations? >>>> >>>> Thanks a lot >>>> >>> >>> Ian, >>> >>> sorry sent button pushed too early... >>> >>> here is a second try without option -D. >>> >>> This is much better: >>> >>> root in 🌐 s83lp47 in ~ >>> ❯ ll perf1.data >>> -rw-------. 1 root root 99692290 Oct 29 16:17 perf1.data >>> >>> root in 🌐 s83lp47 in ~ >>> ❯ time perf record -- perf report -i perf1.data > /dev/null >>> [ perf record: Woken up 1 times to write data ] >>> [ perf record: Captured and wrote 0.027 MB perf.data (246 samples) ] >>> >>> real 0m1.097s >>> user 0m0.082s >>> sys 0m0.167s >>> >>> root in 🌐 s83lp47 in ~ >>> ❯ perf report --stdio >>> # To display the perf.data header info, please use --header/--header-only options. >>> # >>> # >>> # Total Lost Samples: 0 >>> # >>> # Samples: 246 of event 'cpum_cf/cycles/P' >>> # Event count (approx.): 307500000 >>> # >>> # Overhead Command Shared Object Symbol >>> # ........ ....... ................... .................................................. >>> # >>> 9.35% perf perf [.] reader__read_event >>> 4.47% perf perf [.] perf_env__arch >>> 2.85% perf ld64.so.1 [.] _dl_relocate_object_no_relro >>> 2.44% perf ld64.so.1 [.] _dl_lookup_symbol_x >>> 2.44% perf ld64.so.1 [.] do_lookup_x >>> 2.44% perf perf [.] down_read >>> 2.03% perf perf [.] hists__findnew_entry >>> 2.03% perf perf [.] perf_hpp__is_dynamic_entry >>> 1.63% perf [kernel.kallsyms] [k] __s390_indirect_jump_r14 >>> 1.63% perf [kernel.kallsyms] [k] format_decode >>> 1.63% perf libc.so.6 [.] cfree@GLIBC_2.2 >>> 1.63% perf perf [.] evsel__parse_sample >>> 1.63% perf perf [.] map__put >>> 1.63% perf perf [.] sort__comm_cmp >>> 1.63% perf perf [.] thread__put >>> 1.22% perf [kernel.kallsyms] [k] folio_add_file_rmap_ptes >>> 1.22% perf [kernel.kallsyms] [k] get_page_from_freelist >>> 1.22% perf ld64.so.1 [.] _dl_name_match_p >>> 1.22% perf perf [.] dump_printf >>> 1.22% perf perf [.] evlist__event2evsel >>> 1.22% perf perf [.] hashmap_find >>> 1.22% perf perf [.] perf_session__deliver_event >> >> Hmm.. for perf_env__arch we could probably cache better: >> ``` >> diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c >> index f1626d2032cd..2591f71c7c42 100644 >> --- a/tools/perf/util/env.c >> +++ b/tools/perf/util/env.c >> @@ -593,17 +593,27 @@ static const char *normalize_arch(char *arch) >> >> const char *perf_env__arch(struct perf_env *env) >> { >> - char *arch_name; >> + if (env && env->arch && env->arch_name_normalized) >> + return env->arch; >> >> if (!env || !env->arch) { /* Assume local operation */ >> static struct utsname uts = { .machine[0] = '\0', }; >> if (uts.machine[0] == '\0' && uname(&uts) < 0) >> return NULL; >> - arch_name = uts.machine; >> - } else >> - arch_name = env->arch; >> >> - return normalize_arch(arch_name); >> + if (!env) >> + return normalize_arch(uts.machine); >> + >> + env->arch = strdup(normalize_arch(uts.machine)); >> + env->arch_name_normalized = true; >> + } else { >> + char *normalized = strdup(normalize_arch(env->arch)); >> + >> + free(env->arch); >> + env->arch = normalized; >> + env->arch_name_normalized = true; >> + } >> + return env->arch; >> } >> >> #if defined(HAVE_LIBTRACEEVENT) >> diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h >> index 9977b85523a8..da20bc3810bd 100644 >> --- a/tools/perf/util/env.h >> +++ b/tools/perf/util/env.h >> @@ -61,6 +61,7 @@ struct perf_env { >> char *os_release; >> char *version; >> char *arch; >> + bool arch_name_normalized; >> int nr_cpus_online; >> int nr_cpus_avail; >> char *cpu_desc; >> ``` >> or just force the normalize before assignment. >> >> For reader__read_event and the perf_env these changes may be relevant: >> "perf sample: Remove arch notion of sample parsing" >> https://lore.kernel.org/r/20250724163302.596743-21-irogers@google.com >> "perf env: Remove global perf_env" >> https://lore.kernel.org/r/20250724163302.596743-20-irogers@google.com >> >> I wonder if for some reason the env is NULL in your cases, so we go >> slow path a lot. >> >> The histogram stuff, Namhyung may know more. >> >> Thanks! >> Ian >> > > Thanks Ian, > > I tested your patch, but there is no big difference: > root in 🌐 s83lp47 in ~ > ❯ ll ~/perf.data-test > -rw-------. 1 root root 152280778 Oct 30 07:58 /root/perf.data-test > > here is the output with your patch: > root in 🌐 s83lp47 in ~ > ❯ time mirror-linux-next/tools/perf/perf report -D -i ~/perf.data-test > /dev/null > > real 46m26.350s > user 46m17.559s > sys 0m7.434s > > > Here is the output from the linux-next perf version: > > root in 🌐 s83lp47 in ~ took 46m26s > ❯ time perf report -D -i ~/perf.data-test > /dev/null > > real 49m14.364s > user 49m4.872s > sys 0m8.037s > > root in 🌐 s83lp47 in ~ took 49m14s > > When I go back history and use: > > # pwd > /root/mirror-linux-next/tools/perf > # git branch > * (HEAD detached at v6.10) > master > ]# ll ~/perf.data-test > -rw------- 1 root root 152280778 Oct 30 08:38 /root/perf.data-test > # time ./perf report -D -i ~/perf.data-test > /dev/null > > real 8m49.089s > user 8m47.898s > sys 0m0.848s > # > > Thanks for the pointer. Since the issue is with s390 specific code > I will do a git bisect and we go from there. > This is the first bad commit.... (I actually tested it :-(, obviously without -D option ) commit 0012e0fa221bf9cc54aa6a0e94d848653dc781bb (HEAD) Author: Ian Rogers <irogers@google.com> Date: Sun Oct 5 11:24:16 2025 -0700 perf jevents: Add legacy-hardware and legacy-cache json The legacy-hardware.json is added containing hardware events similarly to the software.json file. A difference is that for the software PMU the name is known and matches sysfs. In the legacy-hardware.json no Unit/PMU is specified for the events meaning default_core is used and the events will appear for all core PMUs. ...... What puzzles me is that the above patch does not change anything in util/s390-sample-raw.c which handles s390 specific sample interpretation. Maybe that patch affects perf_pmus__find()? Not sure how to proceed. Thanks a lot -- Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany -- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Wolfgang Wendt Geschäftsführung: David Faller Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [linux-next] perf report runtime extremely long 2025-10-30 13:25 ` Thomas Richter @ 2025-10-30 14:27 ` Thomas Richter 2025-10-30 15:57 ` Ian Rogers 0 siblings, 1 reply; 14+ messages in thread From: Thomas Richter @ 2025-10-30 14:27 UTC (permalink / raw) To: Ian Rogers; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev On 10/30/25 14:25, Thomas Richter wrote: > On 10/30/25 09:59, Thomas Richter wrote: >> On 10/29/25 17:47, Ian Rogers wrote: >>> On Wed, Oct 29, 2025 at 9:12 AM Thomas Richter <tmricht@linux.ibm.com> wrote: >>>> >>>> On 10/29/25 09:03, Thomas Richter wrote: >>>>> Hi all, >>>>> >>>>> has something changed recently in perf report in linux-next repo? >>>>> >>>>> I run perf report -i <file> -D | grep something >>>>> on s390 with 2 different kernels: >>>>> >>>>> master 6.18.0rc3: >>>>> # uname -a >>>>> Linux a35lp67.lnxne.boe 6.18.0-rc3m-perf+ #23 SMP Tue Oct 28 15:53:44 CET 2025 s390x GNU/Linux >>>>> # ll /tmp/perf2-eSso2q-2 >>>>> -rw------- 1 root root 8538184 Oct 29 08:44 /tmp/perf2-eSso2q-2 >>>>> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null >>>>> >>>>> real 0m22.262s >>>>> user 0m22.068s >>>>> sys 0m0.200s >>>>> # >>>>> >>>>> This runtime is ok for the perf.data file size. >>>>> >>>>> linux-next: >>>>> # uname -a >>>>> Linux s83lp47.lnxne.boe 6.18.0-20251028.rc3.git141.f7d2388eeec2.63.fc42.s390x+next #1 SMP Tue Oct 28 20:06:13 CET 2025 s390x GNU/Linux >>>>> #ll /tmp/perf2-eSso2q-2 >>>>> -rw-------. 1 root root 8538184 Oct 29 08:47 /tmp/perf2-eSso2q-2 >>>>> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null >>>>> >>>>> real 2m53.601s >>>>> user 2m52.938s >>>>> sys 0m0.575s >>>>> # >>>>> >>>>> Same perf.data file and run time increased from 22 seconds to nearly 300 seconds, roughly 14 times longer >>>>> for the same input! >>>>> >>>>> Has anybody similar observations? >>>>> >>>>> Thanks a lot >>>>> >>>> >>>> Ian, >>>> >>>> sorry sent button pushed too early... >>>> >>>> here is a second try without option -D. >>>> >>>> This is much better: >>>> >>>> root in 🌐 s83lp47 in ~ >>>> ❯ ll perf1.data >>>> -rw-------. 1 root root 99692290 Oct 29 16:17 perf1.data >>>> >>>> root in 🌐 s83lp47 in ~ >>>> ❯ time perf record -- perf report -i perf1.data > /dev/null >>>> [ perf record: Woken up 1 times to write data ] >>>> [ perf record: Captured and wrote 0.027 MB perf.data (246 samples) ] >>>> >>>> real 0m1.097s >>>> user 0m0.082s >>>> sys 0m0.167s >>>> >>>> root in 🌐 s83lp47 in ~ >>>> ❯ perf report --stdio >>>> # To display the perf.data header info, please use --header/--header-only options. >>>> # >>>> # >>>> # Total Lost Samples: 0 >>>> # >>>> # Samples: 246 of event 'cpum_cf/cycles/P' >>>> # Event count (approx.): 307500000 >>>> # >>>> # Overhead Command Shared Object Symbol >>>> # ........ ....... ................... .................................................. >>>> # >>>> 9.35% perf perf [.] reader__read_event >>>> 4.47% perf perf [.] perf_env__arch >>>> 2.85% perf ld64.so.1 [.] _dl_relocate_object_no_relro >>>> 2.44% perf ld64.so.1 [.] _dl_lookup_symbol_x >>>> 2.44% perf ld64.so.1 [.] do_lookup_x >>>> 2.44% perf perf [.] down_read >>>> 2.03% perf perf [.] hists__findnew_entry >>>> 2.03% perf perf [.] perf_hpp__is_dynamic_entry >>>> 1.63% perf [kernel.kallsyms] [k] __s390_indirect_jump_r14 >>>> 1.63% perf [kernel.kallsyms] [k] format_decode >>>> 1.63% perf libc.so.6 [.] cfree@GLIBC_2.2 >>>> 1.63% perf perf [.] evsel__parse_sample >>>> 1.63% perf perf [.] map__put >>>> 1.63% perf perf [.] sort__comm_cmp >>>> 1.63% perf perf [.] thread__put >>>> 1.22% perf [kernel.kallsyms] [k] folio_add_file_rmap_ptes >>>> 1.22% perf [kernel.kallsyms] [k] get_page_from_freelist >>>> 1.22% perf ld64.so.1 [.] _dl_name_match_p >>>> 1.22% perf perf [.] dump_printf >>>> 1.22% perf perf [.] evlist__event2evsel >>>> 1.22% perf perf [.] hashmap_find >>>> 1.22% perf perf [.] perf_session__deliver_event >>> >>> Hmm.. for perf_env__arch we could probably cache better: >>> ``` >>> diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c >>> index f1626d2032cd..2591f71c7c42 100644 >>> --- a/tools/perf/util/env.c >>> +++ b/tools/perf/util/env.c >>> @@ -593,17 +593,27 @@ static const char *normalize_arch(char *arch) >>> >>> const char *perf_env__arch(struct perf_env *env) >>> { >>> - char *arch_name; >>> + if (env && env->arch && env->arch_name_normalized) >>> + return env->arch; >>> >>> if (!env || !env->arch) { /* Assume local operation */ >>> static struct utsname uts = { .machine[0] = '\0', }; >>> if (uts.machine[0] == '\0' && uname(&uts) < 0) >>> return NULL; >>> - arch_name = uts.machine; >>> - } else >>> - arch_name = env->arch; >>> >>> - return normalize_arch(arch_name); >>> + if (!env) >>> + return normalize_arch(uts.machine); >>> + >>> + env->arch = strdup(normalize_arch(uts.machine)); >>> + env->arch_name_normalized = true; >>> + } else { >>> + char *normalized = strdup(normalize_arch(env->arch)); >>> + >>> + free(env->arch); >>> + env->arch = normalized; >>> + env->arch_name_normalized = true; >>> + } >>> + return env->arch; >>> } >>> >>> #if defined(HAVE_LIBTRACEEVENT) >>> diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h >>> index 9977b85523a8..da20bc3810bd 100644 >>> --- a/tools/perf/util/env.h >>> +++ b/tools/perf/util/env.h >>> @@ -61,6 +61,7 @@ struct perf_env { >>> char *os_release; >>> char *version; >>> char *arch; >>> + bool arch_name_normalized; >>> int nr_cpus_online; >>> int nr_cpus_avail; >>> char *cpu_desc; >>> ``` >>> or just force the normalize before assignment. >>> >>> For reader__read_event and the perf_env these changes may be relevant: >>> "perf sample: Remove arch notion of sample parsing" >>> https://lore.kernel.org/r/20250724163302.596743-21-irogers@google.com >>> "perf env: Remove global perf_env" >>> https://lore.kernel.org/r/20250724163302.596743-20-irogers@google.com >>> >>> I wonder if for some reason the env is NULL in your cases, so we go >>> slow path a lot. >>> >>> The histogram stuff, Namhyung may know more. >>> >>> Thanks! >>> Ian >>> >> >> Thanks Ian, >> >> I tested your patch, but there is no big difference: >> root in 🌐 s83lp47 in ~ >> ❯ ll ~/perf.data-test >> -rw-------. 1 root root 152280778 Oct 30 07:58 /root/perf.data-test >> >> here is the output with your patch: >> root in 🌐 s83lp47 in ~ >> ❯ time mirror-linux-next/tools/perf/perf report -D -i ~/perf.data-test > /dev/null >> >> real 46m26.350s >> user 46m17.559s >> sys 0m7.434s >> >> >> Here is the output from the linux-next perf version: >> >> root in 🌐 s83lp47 in ~ took 46m26s >> ❯ time perf report -D -i ~/perf.data-test > /dev/null >> >> real 49m14.364s >> user 49m4.872s >> sys 0m8.037s >> >> root in 🌐 s83lp47 in ~ took 49m14s >> >> When I go back history and use: >> >> # pwd >> /root/mirror-linux-next/tools/perf >> # git branch >> * (HEAD detached at v6.10) >> master >> ]# ll ~/perf.data-test >> -rw------- 1 root root 152280778 Oct 30 08:38 /root/perf.data-test >> # time ./perf report -D -i ~/perf.data-test > /dev/null >> >> real 8m49.089s >> user 8m47.898s >> sys 0m0.848s >> # >> >> Thanks for the pointer. Since the issue is with s390 specific code >> I will do a git bisect and we go from there. >> > > This is the first bad commit.... (I actually tested it :-(, obviously without -D option ) > > commit 0012e0fa221bf9cc54aa6a0e94d848653dc781bb (HEAD) > Author: Ian Rogers <irogers@google.com> > Date: Sun Oct 5 11:24:16 2025 -0700 > > perf jevents: Add legacy-hardware and legacy-cache json > > The legacy-hardware.json is added containing hardware events similarly > to the software.json file. A difference is that for the software PMU > the name is known and matches sysfs. In the legacy-hardware.json no > Unit/PMU is specified for the events meaning default_core is used and > the events will appear for all core PMUs. > ...... > > What puzzles me is that the above patch does not change anything in > util/s390-sample-raw.c which handles s390 specific sample interpretation. > > Maybe that patch affects perf_pmus__find()? > Not sure how to proceed. > > Thanks a lot Ian, the issue is within function perf_pmu__for_each_event(). Without calling this function the output is as fast as before. (Well a bit faster, because I replaced this call by strdup("hardcoded-name");) I have looked at this code, but I did not understand a thing.... Strange, perf_pmu__for_each_event() did not change in this commit, but the behavior changed a lot. What we need for s390 is some way of faster event name lookup, the PMU is known, its "cpum_cf". Is there no easy way to find the events on this PMU (given the event number). Maybe a some sort of table, because this lookup is done millions of times on a large file with samples including raw data Ideas? Thanks -- Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany -- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Wolfgang Wendt Geschäftsführung: David Faller Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [linux-next] perf report runtime extremely long 2025-10-30 14:27 ` Thomas Richter @ 2025-10-30 15:57 ` Ian Rogers 2025-10-30 17:14 ` Ian Rogers 2025-10-31 7:06 ` Thomas Richter 0 siblings, 2 replies; 14+ messages in thread From: Ian Rogers @ 2025-10-30 15:57 UTC (permalink / raw) To: Thomas Richter; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev On Thu, Oct 30, 2025 at 7:27 AM Thomas Richter <tmricht@linux.ibm.com> wrote: > > On 10/30/25 14:25, Thomas Richter wrote: > > On 10/30/25 09:59, Thomas Richter wrote: > > This is the first bad commit.... (I actually tested it :-(, obviously without -D option ) > > > > commit 0012e0fa221bf9cc54aa6a0e94d848653dc781bb (HEAD) > > Author: Ian Rogers <irogers@google.com> > > Date: Sun Oct 5 11:24:16 2025 -0700 > > > > perf jevents: Add legacy-hardware and legacy-cache json > > > > The legacy-hardware.json is added containing hardware events similarly > > to the software.json file. A difference is that for the software PMU > > the name is known and matches sysfs. In the legacy-hardware.json no > > Unit/PMU is specified for the events meaning default_core is used and > > the events will appear for all core PMUs. > > ...... > > > > What puzzles me is that the above patch does not change anything in > > util/s390-sample-raw.c which handles s390 specific sample interpretation. > > > > Maybe that patch affects perf_pmus__find()? > > Not sure how to proceed. > > > > Thanks a lot > > Ian, > > the issue is within function perf_pmu__for_each_event(). > Without calling this function the output is as fast as before. > (Well a bit faster, because I replaced this call by strdup("hardcoded-name");) > > I have looked at this code, but I did not understand a thing.... > Strange, perf_pmu__for_each_event() did not change in this commit, > but the behavior changed a lot. > > What we need for s390 is some way of faster event name lookup, > the PMU is known, its "cpum_cf". Is there no easy way to > find the events on this PMU (given the event number). > > Maybe a some sort of table, because this lookup is done > millions of times on a large file with samples including raw data > > Ideas? Thanks Thomas! The bisect and the behavior make sense now. Previously the legacy events weren't part of perf_pmu__for_each_event and now with the legacy-hardware.json they are. The events were parsed by matching explicitly in the parse-events.l lex parser, but for lots of reasons the json is better. Anyway, there is common functionality for reversing a perf_event_attr.config to an event name in perf_pmu__name_from_config: https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.h?h=perf-tools-next#n306 https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n2728 The normal case for a PMU is that we want to see if it has an event by name, so we have a hashmap from name to the "alias" information (I don't really like the name "alias", it is information about an event we want to encode): https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n58 The use-case for perf_pmu__name_from_config is generally debug output, so there isn't a fast path config to name lookup. Is config to name lookup sufficient in s390-sample-raw.c? I notice that the code is matching the "event=" part of the encoding, which is a sub-part of the config (and may actually be a value in config2 or config3). I wonder if it would be easier to add a cache of matched names in s390-sample-raw.c? So a global/static hashmap in that C file with a key of set and nr as per get_counter_name (it seems the PMU can be implied to always only be the core PMU) and a value of a strdup-ed counter name. Let me know what you think and I can send you a patch if you like. Thanks, Ian ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [linux-next] perf report runtime extremely long 2025-10-30 15:57 ` Ian Rogers @ 2025-10-30 17:14 ` Ian Rogers 2025-10-31 8:20 ` Thomas Richter 2025-10-31 7:06 ` Thomas Richter 1 sibling, 1 reply; 14+ messages in thread From: Ian Rogers @ 2025-10-30 17:14 UTC (permalink / raw) To: Thomas Richter; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev On Thu, Oct 30, 2025 at 8:57 AM Ian Rogers <irogers@google.com> wrote: > > On Thu, Oct 30, 2025 at 7:27 AM Thomas Richter <tmricht@linux.ibm.com> wrote: > > > > On 10/30/25 14:25, Thomas Richter wrote: > > > On 10/30/25 09:59, Thomas Richter wrote: > > > This is the first bad commit.... (I actually tested it :-(, obviously without -D option ) > > > > > > commit 0012e0fa221bf9cc54aa6a0e94d848653dc781bb (HEAD) > > > Author: Ian Rogers <irogers@google.com> > > > Date: Sun Oct 5 11:24:16 2025 -0700 > > > > > > perf jevents: Add legacy-hardware and legacy-cache json > > > > > > The legacy-hardware.json is added containing hardware events similarly > > > to the software.json file. A difference is that for the software PMU > > > the name is known and matches sysfs. In the legacy-hardware.json no > > > Unit/PMU is specified for the events meaning default_core is used and > > > the events will appear for all core PMUs. > > > ...... > > > > > > What puzzles me is that the above patch does not change anything in > > > util/s390-sample-raw.c which handles s390 specific sample interpretation. > > > > > > Maybe that patch affects perf_pmus__find()? > > > Not sure how to proceed. > > > > > > Thanks a lot > > > > Ian, > > > > the issue is within function perf_pmu__for_each_event(). > > Without calling this function the output is as fast as before. > > (Well a bit faster, because I replaced this call by strdup("hardcoded-name");) > > > > I have looked at this code, but I did not understand a thing.... > > Strange, perf_pmu__for_each_event() did not change in this commit, > > but the behavior changed a lot. > > > > What we need for s390 is some way of faster event name lookup, > > the PMU is known, its "cpum_cf". Is there no easy way to > > find the events on this PMU (given the event number). > > > > Maybe a some sort of table, because this lookup is done > > millions of times on a large file with samples including raw data > > > > Ideas? > > Thanks Thomas! > > The bisect and the behavior make sense now. Previously the legacy > events weren't part of perf_pmu__for_each_event and now with the > legacy-hardware.json they are. The events were parsed by matching > explicitly in the parse-events.l lex parser, but for lots of reasons > the json is better. Anyway, there is common functionality for > reversing a perf_event_attr.config to an event name in > perf_pmu__name_from_config: > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.h?h=perf-tools-next#n306 > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n2728 > > The normal case for a PMU is that we want to see if it has an event by > name, so we have a hashmap from name to the "alias" information (I > don't really like the name "alias", it is information about an event > we want to encode): > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n58 > > The use-case for perf_pmu__name_from_config is generally debug output, > so there isn't a fast path config to name lookup. Is config to name > lookup sufficient in s390-sample-raw.c? I notice that the code is > matching the "event=" part of the encoding, which is a sub-part of the > config (and may actually be a value in config2 or config3). > > I wonder if it would be easier to add a cache of matched names in > s390-sample-raw.c? So a global/static hashmap in that C file with a > key of set and nr as per get_counter_name (it seems the PMU can be > implied to always only be the core PMU) and a value of a strdup-ed > counter name. > > Let me know what you think and I can send you a patch if you like. So: https://lore.kernel.org/lkml/20251030171211.1109480-1-irogers@google.com/ should cover it. Let me know what you think. Thanks, Ian ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [linux-next] perf report runtime extremely long 2025-10-30 17:14 ` Ian Rogers @ 2025-10-31 8:20 ` Thomas Richter 0 siblings, 0 replies; 14+ messages in thread From: Thomas Richter @ 2025-10-31 8:20 UTC (permalink / raw) To: Ian Rogers; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev On 10/30/25 18:14, Ian Rogers wrote: > On Thu, Oct 30, 2025 at 8:57 AM Ian Rogers <irogers@google.com> wrote: >> >> On Thu, Oct 30, 2025 at 7:27 AM Thomas Richter <tmricht@linux.ibm.com> wrote: >>> >>> On 10/30/25 14:25, Thomas Richter wrote: >>>> On 10/30/25 09:59, Thomas Richter wrote: >>>> This is the first bad commit.... (I actually tested it :-(, obviously without -D option ) >>>> >>>> commit 0012e0fa221bf9cc54aa6a0e94d848653dc781bb (HEAD) >>>> Author: Ian Rogers <irogers@google.com> >>>> Date: Sun Oct 5 11:24:16 2025 -0700 >>>> >>>> perf jevents: Add legacy-hardware and legacy-cache json >>>> >>>> The legacy-hardware.json is added containing hardware events similarly >>>> to the software.json file. A difference is that for the software PMU >>>> the name is known and matches sysfs. In the legacy-hardware.json no >>>> Unit/PMU is specified for the events meaning default_core is used and >>>> the events will appear for all core PMUs. >>>> ...... >>>> >>>> What puzzles me is that the above patch does not change anything in >>>> util/s390-sample-raw.c which handles s390 specific sample interpretation. >>>> >>>> Maybe that patch affects perf_pmus__find()? >>>> Not sure how to proceed. >>>> >>>> Thanks a lot >>> >>> Ian, >>> >>> the issue is within function perf_pmu__for_each_event(). >>> Without calling this function the output is as fast as before. >>> (Well a bit faster, because I replaced this call by strdup("hardcoded-name");) >>> >>> I have looked at this code, but I did not understand a thing.... >>> Strange, perf_pmu__for_each_event() did not change in this commit, >>> but the behavior changed a lot. >>> >>> What we need for s390 is some way of faster event name lookup, >>> the PMU is known, its "cpum_cf". Is there no easy way to >>> find the events on this PMU (given the event number). >>> >>> Maybe a some sort of table, because this lookup is done >>> millions of times on a large file with samples including raw data >>> >>> Ideas? >> >> Thanks Thomas! >> >> The bisect and the behavior make sense now. Previously the legacy >> events weren't part of perf_pmu__for_each_event and now with the >> legacy-hardware.json they are. The events were parsed by matching >> explicitly in the parse-events.l lex parser, but for lots of reasons >> the json is better. Anyway, there is common functionality for >> reversing a perf_event_attr.config to an event name in >> perf_pmu__name_from_config: >> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.h?h=perf-tools-next#n306 >> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n2728 >> >> The normal case for a PMU is that we want to see if it has an event by >> name, so we have a hashmap from name to the "alias" information (I >> don't really like the name "alias", it is information about an event >> we want to encode): >> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n58 >> >> The use-case for perf_pmu__name_from_config is generally debug output, >> so there isn't a fast path config to name lookup. Is config to name >> lookup sufficient in s390-sample-raw.c? I notice that the code is >> matching the "event=" part of the encoding, which is a sub-part of the >> config (and may actually be a value in config2 or config3). >> >> I wonder if it would be easier to add a cache of matched names in >> s390-sample-raw.c? So a global/static hashmap in that C file with a >> key of set and nr as per get_counter_name (it seems the PMU can be >> implied to always only be the core PMU) and a value of a strdup-ed >> counter name. >> >> Let me know what you think and I can send you a patch if you like. > > So: > https://lore.kernel.org/lkml/20251030171211.1109480-1-irogers@google.com/ > should cover it. Let me know what you think. > > Thanks, > Ian Great work, I downloaded and tested your patch # time perf report -D -i ~/perf.data-test > /dev/null real 9m16.125s user 9m14.838s sys 0m0.912s [root@b83lp65 perf]# ll ~/perf.data-test -rw------- 1 root root 152280778 Oct 30 08:38 /root/perf.data-390raw # This is back to what is was before. Much better than the ~50 minutes. If there is need for further improvement, we can go for an array solution. Please submit this patch to the mailing list. Thank you very much for your help. Tested-by: Thomas Richter <tmricht@linux.ibm.com> -- Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany -- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Wolfgang Wendt Geschäftsführung: David Faller Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [linux-next] perf report runtime extremely long 2025-10-30 15:57 ` Ian Rogers 2025-10-30 17:14 ` Ian Rogers @ 2025-10-31 7:06 ` Thomas Richter 1 sibling, 0 replies; 14+ messages in thread From: Thomas Richter @ 2025-10-31 7:06 UTC (permalink / raw) To: Ian Rogers; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev On 10/30/25 16:57, Ian Rogers wrote: > On Thu, Oct 30, 2025 at 7:27 AM Thomas Richter <tmricht@linux.ibm.com> wrote: >> >> On 10/30/25 14:25, Thomas Richter wrote: >>> On 10/30/25 09:59, Thomas Richter wrote: >>> This is the first bad commit.... (I actually tested it :-(, obviously without -D option ) >>> >>> commit 0012e0fa221bf9cc54aa6a0e94d848653dc781bb (HEAD) >>> Author: Ian Rogers <irogers@google.com> >>> Date: Sun Oct 5 11:24:16 2025 -0700 >>> >>> perf jevents: Add legacy-hardware and legacy-cache json >>> >>> The legacy-hardware.json is added containing hardware events similarly >>> to the software.json file. A difference is that for the software PMU >>> the name is known and matches sysfs. In the legacy-hardware.json no >>> Unit/PMU is specified for the events meaning default_core is used and >>> the events will appear for all core PMUs. >>> ...... >>> >>> What puzzles me is that the above patch does not change anything in >>> util/s390-sample-raw.c which handles s390 specific sample interpretation. >>> >>> Maybe that patch affects perf_pmus__find()? >>> Not sure how to proceed. >>> >>> Thanks a lot >> >> Ian, >> >> the issue is within function perf_pmu__for_each_event(). >> Without calling this function the output is as fast as before. >> (Well a bit faster, because I replaced this call by strdup("hardcoded-name");) >> >> I have looked at this code, but I did not understand a thing.... >> Strange, perf_pmu__for_each_event() did not change in this commit, >> but the behavior changed a lot. >> >> What we need for s390 is some way of faster event name lookup, >> the PMU is known, its "cpum_cf". Is there no easy way to >> find the events on this PMU (given the event number). >> >> Maybe a some sort of table, because this lookup is done >> millions of times on a large file with samples including raw data >> >> Ideas? > > Thanks Thomas! > > The bisect and the behavior make sense now. Previously the legacy > events weren't part of perf_pmu__for_each_event and now with the > legacy-hardware.json they are. The events were parsed by matching > explicitly in the parse-events.l lex parser, but for lots of reasons > the json is better. Anyway, there is common functionality for > reversing a perf_event_attr.config to an event name in > perf_pmu__name_from_config: > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.h?h=perf-tools-next#n306 > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n2728 > > The normal case for a PMU is that we want to see if it has an event by > name, so we have a hashmap from name to the "alias" information (I > don't really like the name "alias", it is information about an event > we want to encode): > https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n58 > > The use-case for perf_pmu__name_from_config is generally debug output, > so there isn't a fast path config to name lookup. Is config to name > lookup sufficient in s390-sample-raw.c? I notice that the code is > matching the "event=" part of the encoding, which is a sub-part of the > config (and may actually be a value in config2 or config3). > > I wonder if it would be easier to add a cache of matched names in > s390-sample-raw.c? So a global/static hashmap in that C file with a > key of set and nr as per get_counter_name (it seems the PMU can be > implied to always only be the core PMU) and a value of a strdup-ed > counter name. I do confirm your assessment, here is the proof. A very poor man's caching in util/s390-sample-raw.c: # git diff diff --git a/tools/perf/util/s390-sample-raw.c b/tools/perf/util/s390-sample-raw.c index 335217bb532b..b04d3b1a8750 100644 --- a/tools/perf/util/s390-sample-raw.c +++ b/tools/perf/util/s390-sample-raw.c @@ -162,6 +162,7 @@ static int get_counter_name_callback(void *vdata, struct pmu_event_info *info) * number and use this as key. If they match return the name of this counter. * If no match is found a NULL pointer is returned. */ +static char *ctrname[500]; static char *get_counter_name(int set, int nr, struct perf_pmu *pmu) { struct get_counter_name_data data = { @@ -171,9 +172,14 @@ static char *get_counter_name(int set, int nr, struct perf_pmu *pmu) if (!pmu) return NULL; +if (ctrname[data.wanted]) return strdup(ctrname[data.wanted]); perf_pmu__for_each_event(pmu, /*skip_duplicate_pmus=*/ true, &data, get_counter_name_callback); +if(data.result) + ctrname[data.wanted] = strdup(data.result); +else + ctrname[data.wanted] = strdup("unknown"); return data.result; } # cd # time mirror-linux-next/tools/perf/perf report -D -i ~/perf.data-test > /dev/null real 0m25.908s user 0m25.548s sys 0m0.182s # > > Let me know what you think and I can send you a patch if you like. > > Thanks, > Ian Of course!! A hash list is more flexible, but an array is fine too. There are very few holes in the counter set numbers, They are listed as "unknown". That does not mean these counter do not exist! Most of them have values, but their name was not made public in the documentation SA23-2261-09. And there are more counters from the PAI crypto (172) and PAI NNPA PMUs (36), not much but in a different range. So I guess this qualifies for the hash solution. Thanks a lot -- Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany -- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Wolfgang Wendt Geschäftsführung: David Faller Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ^ permalink raw reply related [flat|nested] 14+ messages in thread
end of thread, other threads:[~2025-10-31 8:20 UTC | newest] Thread overview: 14+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-10-29 8:03 [linux-next] perf report runtime extremely long Thomas Richter 2025-10-29 15:13 ` Ian Rogers 2025-10-29 15:52 ` Thomas Richter 2025-10-29 16:25 ` Ian Rogers 2025-10-29 15:56 ` Thomas Richter 2025-10-29 16:47 ` Ian Rogers 2025-10-30 6:04 ` Namhyung Kim 2025-10-30 8:59 ` Thomas Richter 2025-10-30 13:25 ` Thomas Richter 2025-10-30 14:27 ` Thomas Richter 2025-10-30 15:57 ` Ian Rogers 2025-10-30 17:14 ` Ian Rogers 2025-10-31 8:20 ` Thomas Richter 2025-10-31 7:06 ` Thomas Richter
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).