[linux-next] perf report runtime extremely long

linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [linux-next] perf report runtime extremely long
@ 2025-10-29  8:03 Thomas Richter
  2025-10-29 15:13 ` Ian Rogers
  2025-10-29 15:56 ` Thomas Richter
  0 siblings, 2 replies; 14+ messages in thread
From: Thomas Richter @ 2025-10-29  8:03 UTC (permalink / raw)
  To: linux-perf-use.; +Cc: Jan Polensky, Alexander Gordeev

Hi all,

has something changed recently in perf report in linux-next repo?

I run perf report -i <file> -D | grep something
on s390 with 2 different kernels:

master 6.18.0rc3:
# uname -a
Linux a35lp67.lnxne.boe 6.18.0-rc3m-perf+ #23 SMP Tue Oct 28 15:53:44 CET 2025 s390x GNU/Linux
# ll /tmp/perf2-eSso2q-2
-rw------- 1 root root 8538184 Oct 29 08:44 /tmp/perf2-eSso2q-2
# time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null

real	0m22.262s
user	0m22.068s
sys	0m0.200s
# 

This runtime is ok for the perf.data file size.

linux-next:
# uname -a
Linux s83lp47.lnxne.boe 6.18.0-20251028.rc3.git141.f7d2388eeec2.63.fc42.s390x+next #1 SMP Tue Oct 28 20:06:13 CET 2025 s390x GNU/Linux
#ll /tmp/perf2-eSso2q-2
-rw-------. 1 root root 8538184 Oct 29 08:47 /tmp/perf2-eSso2q-2
# time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null

real	2m53.601s
user	2m52.938s
sys	0m0.575s
#

Same perf.data file and run time increased from 22 seconds to nearly 300 seconds, roughly 14 times longer
for the same input!

Has anybody similar observations?

Thanks a lot

-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
IBM Deutschland Research & Development GmbH

Vorsitzender des Aufsichtsrats: Wolfgang Wendt

Geschäftsführung: David Faller

Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [linux-next] perf report runtime extremely long
  2025-10-29  8:03 [linux-next] perf report runtime extremely long Thomas Richter
@ 2025-10-29 15:13 ` Ian Rogers
  2025-10-29 15:52   ` Thomas Richter
  2025-10-29 15:56 ` Thomas Richter
  1 sibling, 1 reply; 14+ messages in thread
From: Ian Rogers @ 2025-10-29 15:13 UTC (permalink / raw)
  To: Thomas Richter; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev

On Wed, Oct 29, 2025 at 1:04 AM Thomas Richter <tmricht@linux.ibm.com> wrote:
>
> Hi all,
>
> has something changed recently in perf report in linux-next repo?
>
> I run perf report -i <file> -D | grep something
> on s390 with 2 different kernels:
>
> master 6.18.0rc3:
> # uname -a
> Linux a35lp67.lnxne.boe 6.18.0-rc3m-perf+ #23 SMP Tue Oct 28 15:53:44 CET 2025 s390x GNU/Linux
> # ll /tmp/perf2-eSso2q-2
> -rw------- 1 root root 8538184 Oct 29 08:44 /tmp/perf2-eSso2q-2
> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
>
> real    0m22.262s
> user    0m22.068s
> sys     0m0.200s
> #
>
> This runtime is ok for the perf.data file size.
>
> linux-next:
> # uname -a
> Linux s83lp47.lnxne.boe 6.18.0-20251028.rc3.git141.f7d2388eeec2.63.fc42.s390x+next #1 SMP Tue Oct 28 20:06:13 CET 2025 s390x GNU/Linux
> #ll /tmp/perf2-eSso2q-2
> -rw-------. 1 root root 8538184 Oct 29 08:47 /tmp/perf2-eSso2q-2
> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
>
> real    2m53.601s
> user    2m52.938s
> sys     0m0.575s
> #
>
> Same perf.data file and run time increased from 22 seconds to nearly 300 seconds, roughly 14 times longer
> for the same input!
>
> Has anybody similar observations?

So there were changes relating to addr2line that could impact this:
https://lore.kernel.org/lkml/20251005212212.2892175-1-irogers@google.com/

There was also the switch to always using build IDs:
https://lore.kernel.org/lkml/20250724163302.596743-1-irogers@google.com/

Any chance you could do a `perf record`/`perf report` to see where the
time is going?

Thanks!
Ian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [linux-next] perf report runtime extremely long
  2025-10-29 15:13 ` Ian Rogers
@ 2025-10-29 15:52   ` Thomas Richter
  2025-10-29 16:25     ` Ian Rogers
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Richter @ 2025-10-29 15:52 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev

On 10/29/25 16:13, Ian Rogers wrote:
> On Wed, Oct 29, 2025 at 1:04 AM Thomas Richter <tmricht@linux.ibm.com> wrote:
>>
>> Hi all,
>>
>> has something changed recently in perf report in linux-next repo?
>>
>> I run perf report -i <file> -D | grep something
>> on s390 with 2 different kernels:
>>
>> master 6.18.0rc3:
>> # uname -a
>> Linux a35lp67.lnxne.boe 6.18.0-rc3m-perf+ #23 SMP Tue Oct 28 15:53:44 CET 2025 s390x GNU/Linux
>> # ll /tmp/perf2-eSso2q-2
>> -rw------- 1 root root 8538184 Oct 29 08:44 /tmp/perf2-eSso2q-2
>> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
>>
>> real    0m22.262s
>> user    0m22.068s
>> sys     0m0.200s
>> #
>>
>> This runtime is ok for the perf.data file size.
>>
>> linux-next:
>> # uname -a
>> Linux s83lp47.lnxne.boe 6.18.0-20251028.rc3.git141.f7d2388eeec2.63.fc42.s390x+next #1 SMP Tue Oct 28 20:06:13 CET 2025 s390x GNU/Linux
>> #ll /tmp/perf2-eSso2q-2
>> -rw-------. 1 root root 8538184 Oct 29 08:47 /tmp/perf2-eSso2q-2
>> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
>>
>> real    2m53.601s
>> user    2m52.938s
>> sys     0m0.575s
>> #
>>
>> Same perf.data file and run time increased from 22 seconds to nearly 300 seconds, roughly 14 times longer
>> for the same input!
>>
>> Has anybody similar observations?
> 
> So there were changes relating to addr2line that could impact this:
> https://lore.kernel.org/lkml/20251005212212.2892175-1-irogers@google.com/
> 
> There was also the switch to always using build IDs:
> https://lore.kernel.org/lkml/20250724163302.596743-1-irogers@google.com/
> 
> Any chance you could do a `perf record`/`perf report` to see where the
> time is going?
> 
> Thanks!
> Ian

Ok here is a first try, can do more tests tomorrow morning.
The file perf1.data has this size:

root in 🌐 s83lp47 in ~ 
❯ ll perf1.data 
-rw-------. 1 root root 99692290 Oct 29 16:17 perf1.data

root in 🌐 s83lp47 in ~ 
❯ 

time perf record -- perf report -i perf1.data -D > /dev/null
^C[ perf record: Woken up 6 times to write data ]
[ perf record: Captured and wrote 2.742 MB perf.data (70481 samples) ]
Terminated

real	13m9.894s
user	13m6.583s
sys	0m2.449s

root in 🌐 s83lp47 in ~ took 13m9s 
❯ perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 70K of event 'cpum_cf/cycles/P'
# Event count (approx.): 88101250000
#
# Overhead  Command  Shared Object      Symbol                                                                            
# ........  .......  .................  ..................................................................................
#
    26.67%  perf     libc.so.6          [.] __printf_buffer
    12.43%  perf     libc.so.6          [.] __GI___strnlen
    12.30%  perf     libc.so.6          [.] __strstr_vx
    11.92%  perf     perf               [.] perf_pmu__for_each_event
     7.46%  perf     libc.so.6          [.] __printf_buffer_write
     5.48%  perf     libc.so.6          [.] memcpy
     5.13%  perf     libc.so.6          [.] __GI___strchrnul
     3.25%  perf     libc.so.6          [.] __vfscanf_internal
     2.61%  perf     libc.so.6          [.] strchr
     2.39%  perf     libc.so.6          [.] __vsnprintf_internal
     2.19%  perf     libc.so.6          [.] __printf_buffer_done
     2.03%  perf     perf               [.] perf_pmu__is_tool
     1.14%  perf     libc.so.6          [.] __snprintf_chk@@GLIBC_2.4
     1.05%  perf     perf               [.] get_counter_name_callback
     0.73%  perf     libc.so.6          [.] __GI_____strtoull_l_internal
     0.60%  perf     libc.so.6          [.] memcmp
     0.28%  perf     libc.so.6          [.] _IO_str_init_static_internal
     0.19%  perf     libc.so.6          [.] strlen

Hope this helps

Kind regards
-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
IBM Deutschland Research & Development GmbH

Vorsitzender des Aufsichtsrats: Wolfgang Wendt

Geschäftsführung: David Faller

Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [linux-next] perf report runtime extremely long
  2025-10-29  8:03 [linux-next] perf report runtime extremely long Thomas Richter
  2025-10-29 15:13 ` Ian Rogers
@ 2025-10-29 15:56 ` Thomas Richter
  2025-10-29 16:47   ` Ian Rogers
  1 sibling, 1 reply; 14+ messages in thread
From: Thomas Richter @ 2025-10-29 15:56 UTC (permalink / raw)
  To: linux-perf-use.; +Cc: Jan Polensky, Alexander Gordeev

On 10/29/25 09:03, Thomas Richter wrote:
> Hi all,
> 
> has something changed recently in perf report in linux-next repo?
> 
> I run perf report -i <file> -D | grep something
> on s390 with 2 different kernels:
> 
> master 6.18.0rc3:
> # uname -a
> Linux a35lp67.lnxne.boe 6.18.0-rc3m-perf+ #23 SMP Tue Oct 28 15:53:44 CET 2025 s390x GNU/Linux
> # ll /tmp/perf2-eSso2q-2
> -rw------- 1 root root 8538184 Oct 29 08:44 /tmp/perf2-eSso2q-2
> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
> 
> real	0m22.262s
> user	0m22.068s
> sys	0m0.200s
> # 
> 
> This runtime is ok for the perf.data file size.
> 
> linux-next:
> # uname -a
> Linux s83lp47.lnxne.boe 6.18.0-20251028.rc3.git141.f7d2388eeec2.63.fc42.s390x+next #1 SMP Tue Oct 28 20:06:13 CET 2025 s390x GNU/Linux
> #ll /tmp/perf2-eSso2q-2
> -rw-------. 1 root root 8538184 Oct 29 08:47 /tmp/perf2-eSso2q-2
> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
> 
> real	2m53.601s
> user	2m52.938s
> sys	0m0.575s
> #
> 
> Same perf.data file and run time increased from 22 seconds to nearly 300 seconds, roughly 14 times longer
> for the same input!
> 
> Has anybody similar observations?
> 
> Thanks a lot
> 

Ian,

sorry sent button pushed too early...

here is a second try without option -D.

This is much better:

root in 🌐 s83lp47 in ~ 
❯ ll perf1.data 
-rw-------. 1 root root 99692290 Oct 29 16:17 perf1.data

root in 🌐 s83lp47 in ~ 
❯ time perf record -- perf report -i perf1.data  > /dev/null
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.027 MB perf.data (246 samples) ]

real	0m1.097s
user	0m0.082s
sys	0m0.167s

root in 🌐 s83lp47 in ~ 
❯ perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 246  of event 'cpum_cf/cycles/P'
# Event count (approx.): 307500000
#
# Overhead  Command  Shared Object        Symbol                                            
# ........  .......  ...................  ..................................................
#
     9.35%  perf     perf                 [.] reader__read_event
     4.47%  perf     perf                 [.] perf_env__arch
     2.85%  perf     ld64.so.1            [.] _dl_relocate_object_no_relro
     2.44%  perf     ld64.so.1            [.] _dl_lookup_symbol_x
     2.44%  perf     ld64.so.1            [.] do_lookup_x
     2.44%  perf     perf                 [.] down_read
     2.03%  perf     perf                 [.] hists__findnew_entry
     2.03%  perf     perf                 [.] perf_hpp__is_dynamic_entry
     1.63%  perf     [kernel.kallsyms]    [k] __s390_indirect_jump_r14
     1.63%  perf     [kernel.kallsyms]    [k] format_decode
     1.63%  perf     libc.so.6            [.] cfree@GLIBC_2.2
     1.63%  perf     perf                 [.] evsel__parse_sample
     1.63%  perf     perf                 [.] map__put
     1.63%  perf     perf                 [.] sort__comm_cmp
     1.63%  perf     perf                 [.] thread__put
     1.22%  perf     [kernel.kallsyms]    [k] folio_add_file_rmap_ptes
     1.22%  perf     [kernel.kallsyms]    [k] get_page_from_freelist
     1.22%  perf     ld64.so.1            [.] _dl_name_match_p
     1.22%  perf     perf                 [.] dump_printf
     1.22%  perf     perf                 [.] evlist__event2evsel
     1.22%  perf     perf                 [.] hashmap_find
     1.22%  perf     perf                 [.] perf_session__deliver_event

-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
IBM Deutschland Research & Development GmbH

Vorsitzender des Aufsichtsrats: Wolfgang Wendt

Geschäftsführung: David Faller

Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [linux-next] perf report runtime extremely long
  2025-10-29 15:52   ` Thomas Richter
@ 2025-10-29 16:25     ` Ian Rogers
  0 siblings, 0 replies; 14+ messages in thread
From: Ian Rogers @ 2025-10-29 16:25 UTC (permalink / raw)
  To: Thomas Richter; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev

On Wed, Oct 29, 2025 at 8:52 AM Thomas Richter <tmricht@linux.ibm.com> wrote:
>
> On 10/29/25 16:13, Ian Rogers wrote:
> > On Wed, Oct 29, 2025 at 1:04 AM Thomas Richter <tmricht@linux.ibm.com> wrote:
> >>
> >> Hi all,
> >>
> >> has something changed recently in perf report in linux-next repo?
> >>
> >> I run perf report -i <file> -D | grep something
> >> on s390 with 2 different kernels:
> >>
> >> master 6.18.0rc3:
> >> # uname -a
> >> Linux a35lp67.lnxne.boe 6.18.0-rc3m-perf+ #23 SMP Tue Oct 28 15:53:44 CET 2025 s390x GNU/Linux
> >> # ll /tmp/perf2-eSso2q-2
> >> -rw------- 1 root root 8538184 Oct 29 08:44 /tmp/perf2-eSso2q-2
> >> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
> >>
> >> real    0m22.262s
> >> user    0m22.068s
> >> sys     0m0.200s
> >> #
> >>
> >> This runtime is ok for the perf.data file size.
> >>
> >> linux-next:
> >> # uname -a
> >> Linux s83lp47.lnxne.boe 6.18.0-20251028.rc3.git141.f7d2388eeec2.63.fc42.s390x+next #1 SMP Tue Oct 28 20:06:13 CET 2025 s390x GNU/Linux
> >> #ll /tmp/perf2-eSso2q-2
> >> -rw-------. 1 root root 8538184 Oct 29 08:47 /tmp/perf2-eSso2q-2
> >> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
> >>
> >> real    2m53.601s
> >> user    2m52.938s
> >> sys     0m0.575s
> >> #
> >>
> >> Same perf.data file and run time increased from 22 seconds to nearly 300 seconds, roughly 14 times longer
> >> for the same input!
> >>
> >> Has anybody similar observations?
> >
> > So there were changes relating to addr2line that could impact this:
> > https://lore.kernel.org/lkml/20251005212212.2892175-1-irogers@google.com/
> >
> > There was also the switch to always using build IDs:
> > https://lore.kernel.org/lkml/20250724163302.596743-1-irogers@google.com/
> >
> > Any chance you could do a `perf record`/`perf report` to see where the
> > time is going?
> >
> > Thanks!
> > Ian
>
> Ok here is a first try, can do more tests tomorrow morning.
> The file perf1.data has this size:
>
> root in 🌐 s83lp47 in ~
> ❯ ll perf1.data
> -rw-------. 1 root root 99692290 Oct 29 16:17 perf1.data
>
> root in 🌐 s83lp47 in ~
> ❯
>
> time perf record -- perf report -i perf1.data -D > /dev/null
> ^C[ perf record: Woken up 6 times to write data ]
> [ perf record: Captured and wrote 2.742 MB perf.data (70481 samples) ]
> Terminated
>
> real    13m9.894s
> user    13m6.583s
> sys     0m2.449s
>
> root in 🌐 s83lp47 in ~ took 13m9s
> ❯ perf report --stdio
> # To display the perf.data header info, please use --header/--header-only options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 70K of event 'cpum_cf/cycles/P'
> # Event count (approx.): 88101250000
> #
> # Overhead  Command  Shared Object      Symbol
> # ........  .......  .................  ..................................................................................
> #
>     26.67%  perf     libc.so.6          [.] __printf_buffer
>     12.43%  perf     libc.so.6          [.] __GI___strnlen
>     12.30%  perf     libc.so.6          [.] __strstr_vx
>     11.92%  perf     perf               [.] perf_pmu__for_each_event
>      7.46%  perf     libc.so.6          [.] __printf_buffer_write
>      5.48%  perf     libc.so.6          [.] memcpy
>      5.13%  perf     libc.so.6          [.] __GI___strchrnul
>      3.25%  perf     libc.so.6          [.] __vfscanf_internal
>      2.61%  perf     libc.so.6          [.] strchr
>      2.39%  perf     libc.so.6          [.] __vsnprintf_internal
>      2.19%  perf     libc.so.6          [.] __printf_buffer_done
>      2.03%  perf     perf               [.] perf_pmu__is_tool
>      1.14%  perf     libc.so.6          [.] __snprintf_chk@@GLIBC_2.4
>      1.05%  perf     perf               [.] get_counter_name_callback
>      0.73%  perf     libc.so.6          [.] __GI_____strtoull_l_internal
>      0.60%  perf     libc.so.6          [.] memcmp
>      0.28%  perf     libc.so.6          [.] _IO_str_init_static_internal
>      0.19%  perf     libc.so.6          [.] strlen


Looks like the time is in s390-sample-raw.c but I don't see recent
changes there:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/log/tools/perf/util/s390-sample-raw.c?h=perf-tools-next

I wonder for the scanf if the following would help:
```
diff --git a/tools/perf/util/s390-sample-raw.c
b/tools/perf/util/s390-sample-raw.c
index a2d6a1f93cce..21edd359a43a 100644
--- a/tools/perf/util/s390-sample-raw.c
+++ b/tools/perf/util/s390-sample-raw.c
@@ -140,8 +140,9 @@ struct get_counter_name_data {
static int get_counter_name_callback(void *vdata, struct pmu_event_info *info)
{
       struct get_counter_name_data *data = vdata;
-       int rc, event_nr;
+       int event_nr;
       const char *event_str;
+       char *end_ptr;

       if (info->str == NULL)
               return 0;
@@ -150,8 +151,8 @@ static int get_counter_name_callback(void *vdata,
struct pmu_event_info *info)
       if (!event_str)
               return 0;

-       rc = sscanf(event_str, "event=%x", &event_nr);
-       if (rc == 1 && event_nr == data->wanted) {
+       event_nr = strtol(event_str + 6, &end_ptr, /*base=*/16);
+       if ((end_ptr != event_str + 6) && event_nr == data->wanted) {
               data->result = strdup(info->name);
               return 1; /* Terminate the search. */
       }
```

ah, new email, let me follow up further there.

> Hope this helps
>
> Kind regards
> --
> Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
> --
> IBM Deutschland Research & Development GmbH
>
> Vorsitzender des Aufsichtsrats: Wolfgang Wendt
>
> Geschäftsführung: David Faller
>
> Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [linux-next] perf report runtime extremely long
  2025-10-29 15:56 ` Thomas Richter
@ 2025-10-29 16:47   ` Ian Rogers
  2025-10-30  6:04     ` Namhyung Kim
  2025-10-30  8:59     ` Thomas Richter
  0 siblings, 2 replies; 14+ messages in thread
From: Ian Rogers @ 2025-10-29 16:47 UTC (permalink / raw)
  To: Thomas Richter; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev

On Wed, Oct 29, 2025 at 9:12 AM Thomas Richter <tmricht@linux.ibm.com> wrote:
>
> On 10/29/25 09:03, Thomas Richter wrote:
> > Hi all,
> >
> > has something changed recently in perf report in linux-next repo?
> >
> > I run perf report -i <file> -D | grep something
> > on s390 with 2 different kernels:
> >
> > master 6.18.0rc3:
> > # uname -a
> > Linux a35lp67.lnxne.boe 6.18.0-rc3m-perf+ #23 SMP Tue Oct 28 15:53:44 CET 2025 s390x GNU/Linux
> > # ll /tmp/perf2-eSso2q-2
> > -rw------- 1 root root 8538184 Oct 29 08:44 /tmp/perf2-eSso2q-2
> > # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
> >
> > real  0m22.262s
> > user  0m22.068s
> > sys   0m0.200s
> > #
> >
> > This runtime is ok for the perf.data file size.
> >
> > linux-next:
> > # uname -a
> > Linux s83lp47.lnxne.boe 6.18.0-20251028.rc3.git141.f7d2388eeec2.63.fc42.s390x+next #1 SMP Tue Oct 28 20:06:13 CET 2025 s390x GNU/Linux
> > #ll /tmp/perf2-eSso2q-2
> > -rw-------. 1 root root 8538184 Oct 29 08:47 /tmp/perf2-eSso2q-2
> > # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
> >
> > real  2m53.601s
> > user  2m52.938s
> > sys   0m0.575s
> > #
> >
> > Same perf.data file and run time increased from 22 seconds to nearly 300 seconds, roughly 14 times longer
> > for the same input!
> >
> > Has anybody similar observations?
> >
> > Thanks a lot
> >
>
> Ian,
>
> sorry sent button pushed too early...
>
> here is a second try without option -D.
>
> This is much better:
>
> root in 🌐 s83lp47 in ~
> ❯ ll perf1.data
> -rw-------. 1 root root 99692290 Oct 29 16:17 perf1.data
>
> root in 🌐 s83lp47 in ~
> ❯ time perf record -- perf report -i perf1.data  > /dev/null
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.027 MB perf.data (246 samples) ]
>
> real    0m1.097s
> user    0m0.082s
> sys     0m0.167s
>
> root in 🌐 s83lp47 in ~
> ❯ perf report --stdio
> # To display the perf.data header info, please use --header/--header-only options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 246  of event 'cpum_cf/cycles/P'
> # Event count (approx.): 307500000
> #
> # Overhead  Command  Shared Object        Symbol
> # ........  .......  ...................  ..................................................
> #
>      9.35%  perf     perf                 [.] reader__read_event
>      4.47%  perf     perf                 [.] perf_env__arch
>      2.85%  perf     ld64.so.1            [.] _dl_relocate_object_no_relro
>      2.44%  perf     ld64.so.1            [.] _dl_lookup_symbol_x
>      2.44%  perf     ld64.so.1            [.] do_lookup_x
>      2.44%  perf     perf                 [.] down_read
>      2.03%  perf     perf                 [.] hists__findnew_entry
>      2.03%  perf     perf                 [.] perf_hpp__is_dynamic_entry
>      1.63%  perf     [kernel.kallsyms]    [k] __s390_indirect_jump_r14
>      1.63%  perf     [kernel.kallsyms]    [k] format_decode
>      1.63%  perf     libc.so.6            [.] cfree@GLIBC_2.2
>      1.63%  perf     perf                 [.] evsel__parse_sample
>      1.63%  perf     perf                 [.] map__put
>      1.63%  perf     perf                 [.] sort__comm_cmp
>      1.63%  perf     perf                 [.] thread__put
>      1.22%  perf     [kernel.kallsyms]    [k] folio_add_file_rmap_ptes
>      1.22%  perf     [kernel.kallsyms]    [k] get_page_from_freelist
>      1.22%  perf     ld64.so.1            [.] _dl_name_match_p
>      1.22%  perf     perf                 [.] dump_printf
>      1.22%  perf     perf                 [.] evlist__event2evsel
>      1.22%  perf     perf                 [.] hashmap_find
>      1.22%  perf     perf                 [.] perf_session__deliver_event

Hmm.. for perf_env__arch we could probably cache better:
```
diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
index f1626d2032cd..2591f71c7c42 100644
--- a/tools/perf/util/env.c
+++ b/tools/perf/util/env.c
@@ -593,17 +593,27 @@ static const char *normalize_arch(char *arch)

const char *perf_env__arch(struct perf_env *env)
{
-       char *arch_name;
+       if (env && env->arch && env->arch_name_normalized)
+               return env->arch;

       if (!env || !env->arch) { /* Assume local operation */
               static struct utsname uts = { .machine[0] = '\0', };
               if (uts.machine[0] == '\0' && uname(&uts) < 0)
                       return NULL;
-               arch_name = uts.machine;
-       } else
-               arch_name = env->arch;

-       return normalize_arch(arch_name);
+               if (!env)
+                       return normalize_arch(uts.machine);
+
+               env->arch = strdup(normalize_arch(uts.machine));
+               env->arch_name_normalized = true;
+       } else {
+               char *normalized = strdup(normalize_arch(env->arch));
+
+               free(env->arch);
+               env->arch = normalized;
+               env->arch_name_normalized = true;
+       }
+       return env->arch;
}

#if defined(HAVE_LIBTRACEEVENT)
diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
index 9977b85523a8..da20bc3810bd 100644
--- a/tools/perf/util/env.h
+++ b/tools/perf/util/env.h
@@ -61,6 +61,7 @@ struct perf_env {
       char                    *os_release;
       char                    *version;
       char                    *arch;
+       bool                    arch_name_normalized;
       int                     nr_cpus_online;
       int                     nr_cpus_avail;
       char                    *cpu_desc;
```
or just force the normalize before assignment.

For reader__read_event and the perf_env these changes may be relevant:
"perf sample: Remove arch notion of sample parsing"
https://lore.kernel.org/r/20250724163302.596743-21-irogers@google.com
"perf env: Remove global perf_env"
https://lore.kernel.org/r/20250724163302.596743-20-irogers@google.com

I wonder if for some reason the env is NULL in your cases, so we go
slow path a lot.

The histogram stuff, Namhyung may know more.

Thanks!
Ian

> --
> Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
> --
> IBM Deutschland Research & Development GmbH
>
> Vorsitzender des Aufsichtsrats: Wolfgang Wendt
>
> Geschäftsführung: David Faller
>
> Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
>

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [linux-next] perf report runtime extremely long
  2025-10-29 16:47   ` Ian Rogers
@ 2025-10-30  6:04     ` Namhyung Kim
  2025-10-30  8:59     ` Thomas Richter
  1 sibling, 0 replies; 14+ messages in thread
From: Namhyung Kim @ 2025-10-30  6:04 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Thomas Richter, linux-perf-use., Jan Polensky, Alexander Gordeev

Hello,

On Wed, Oct 29, 2025 at 09:47:00AM -0700, Ian Rogers wrote:
> On Wed, Oct 29, 2025 at 9:12 AM Thomas Richter <tmricht@linux.ibm.com> wrote:
> >
> > On 10/29/25 09:03, Thomas Richter wrote:
> > > Hi all,
> > >
> > > has something changed recently in perf report in linux-next repo?
> > >
> > > I run perf report -i <file> -D | grep something
> > > on s390 with 2 different kernels:
> > >
> > > master 6.18.0rc3:
> > > # uname -a
> > > Linux a35lp67.lnxne.boe 6.18.0-rc3m-perf+ #23 SMP Tue Oct 28 15:53:44 CET 2025 s390x GNU/Linux
> > > # ll /tmp/perf2-eSso2q-2
> > > -rw------- 1 root root 8538184 Oct 29 08:44 /tmp/perf2-eSso2q-2
> > > # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
> > >
> > > real  0m22.262s
> > > user  0m22.068s
> > > sys   0m0.200s
> > > #
> > >
> > > This runtime is ok for the perf.data file size.
> > >
> > > linux-next:
> > > # uname -a
> > > Linux s83lp47.lnxne.boe 6.18.0-20251028.rc3.git141.f7d2388eeec2.63.fc42.s390x+next #1 SMP Tue Oct 28 20:06:13 CET 2025 s390x GNU/Linux
> > > #ll /tmp/perf2-eSso2q-2
> > > -rw-------. 1 root root 8538184 Oct 29 08:47 /tmp/perf2-eSso2q-2
> > > # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
> > >
> > > real  2m53.601s
> > > user  2m52.938s
> > > sys   0m0.575s
> > > #
> > >
> > > Same perf.data file and run time increased from 22 seconds to nearly 300 seconds, roughly 14 times longer
> > > for the same input!
> > >
> > > Has anybody similar observations?
> > >
> > > Thanks a lot
> > >
> >
> > Ian,
> >
> > sorry sent button pushed too early...
> >
> > here is a second try without option -D.
> >
> > This is much better:
> >
> > root in 🌐 s83lp47 in ~
> > ❯ ll perf1.data
> > -rw-------. 1 root root 99692290 Oct 29 16:17 perf1.data
> >
> > root in 🌐 s83lp47 in ~
> > ❯ time perf record -- perf report -i perf1.data  > /dev/null
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 0.027 MB perf.data (246 samples) ]
> >
> > real    0m1.097s
> > user    0m0.082s
> > sys     0m0.167s
> >
> > root in 🌐 s83lp47 in ~
> > ❯ perf report --stdio
> > # To display the perf.data header info, please use --header/--header-only options.
> > #
> > #
> > # Total Lost Samples: 0
> > #
> > # Samples: 246  of event 'cpum_cf/cycles/P'
> > # Event count (approx.): 307500000
> > #
> > # Overhead  Command  Shared Object        Symbol
> > # ........  .......  ...................  ..................................................
> > #
> >      9.35%  perf     perf                 [.] reader__read_event
> >      4.47%  perf     perf                 [.] perf_env__arch
> >      2.85%  perf     ld64.so.1            [.] _dl_relocate_object_no_relro
> >      2.44%  perf     ld64.so.1            [.] _dl_lookup_symbol_x
> >      2.44%  perf     ld64.so.1            [.] do_lookup_x
> >      2.44%  perf     perf                 [.] down_read
> >      2.03%  perf     perf                 [.] hists__findnew_entry
> >      2.03%  perf     perf                 [.] perf_hpp__is_dynamic_entry
> >      1.63%  perf     [kernel.kallsyms]    [k] __s390_indirect_jump_r14
> >      1.63%  perf     [kernel.kallsyms]    [k] format_decode
> >      1.63%  perf     libc.so.6            [.] cfree@GLIBC_2.2
> >      1.63%  perf     perf                 [.] evsel__parse_sample
> >      1.63%  perf     perf                 [.] map__put
> >      1.63%  perf     perf                 [.] sort__comm_cmp
> >      1.63%  perf     perf                 [.] thread__put
> >      1.22%  perf     [kernel.kallsyms]    [k] folio_add_file_rmap_ptes
> >      1.22%  perf     [kernel.kallsyms]    [k] get_page_from_freelist
> >      1.22%  perf     ld64.so.1            [.] _dl_name_match_p
> >      1.22%  perf     perf                 [.] dump_printf
> >      1.22%  perf     perf                 [.] evlist__event2evsel
> >      1.22%  perf     perf                 [.] hashmap_find
> >      1.22%  perf     perf                 [.] perf_session__deliver_event
> 
> Hmm.. for perf_env__arch we could probably cache better:
> ```
> diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
> index f1626d2032cd..2591f71c7c42 100644
> --- a/tools/perf/util/env.c
> +++ b/tools/perf/util/env.c
> @@ -593,17 +593,27 @@ static const char *normalize_arch(char *arch)
> 
> const char *perf_env__arch(struct perf_env *env)
> {
> -       char *arch_name;
> +       if (env && env->arch && env->arch_name_normalized)
> +               return env->arch;
> 
>        if (!env || !env->arch) { /* Assume local operation */
>                static struct utsname uts = { .machine[0] = '\0', };
>                if (uts.machine[0] == '\0' && uname(&uts) < 0)
>                        return NULL;
> -               arch_name = uts.machine;
> -       } else
> -               arch_name = env->arch;
> 
> -       return normalize_arch(arch_name);
> +               if (!env)
> +                       return normalize_arch(uts.machine);
> +
> +               env->arch = strdup(normalize_arch(uts.machine));
> +               env->arch_name_normalized = true;
> +       } else {
> +               char *normalized = strdup(normalize_arch(env->arch));
> +
> +               free(env->arch);
> +               env->arch = normalized;
> +               env->arch_name_normalized = true;
> +       }
> +       return env->arch;
> }
> 
> #if defined(HAVE_LIBTRACEEVENT)
> diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
> index 9977b85523a8..da20bc3810bd 100644
> --- a/tools/perf/util/env.h
> +++ b/tools/perf/util/env.h
> @@ -61,6 +61,7 @@ struct perf_env {
>        char                    *os_release;
>        char                    *version;
>        char                    *arch;
> +       bool                    arch_name_normalized;
>        int                     nr_cpus_online;
>        int                     nr_cpus_avail;
>        char                    *cpu_desc;
> ```
> or just force the normalize before assignment.
> 
> For reader__read_event and the perf_env these changes may be relevant:
> "perf sample: Remove arch notion of sample parsing"
> https://lore.kernel.org/r/20250724163302.596743-21-irogers@google.com
> "perf env: Remove global perf_env"
> https://lore.kernel.org/r/20250724163302.596743-20-irogers@google.com
> 
> I wonder if for some reason the env is NULL in your cases, so we go
> slow path a lot.
> 
> The histogram stuff, Namhyung may know more.

There only 2 entries for the hist code.  The hists__findnew_entry() is
expected as it's the main routine to aggregate the samples.  But the
perf_hpp__is_dynamic_entry() is strange.  It should be very small and
fast.

But I guess the actual problem is when you use -D in perf report.  And
it doesn't show hist related functions IIUC.

Thanks,
Namhyung


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [linux-next] perf report runtime extremely long
  2025-10-29 16:47   ` Ian Rogers
  2025-10-30  6:04     ` Namhyung Kim
@ 2025-10-30  8:59     ` Thomas Richter
  2025-10-30 13:25       ` Thomas Richter
  1 sibling, 1 reply; 14+ messages in thread
From: Thomas Richter @ 2025-10-30  8:59 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev

On 10/29/25 17:47, Ian Rogers wrote:
> On Wed, Oct 29, 2025 at 9:12 AM Thomas Richter <tmricht@linux.ibm.com> wrote:
>>
>> On 10/29/25 09:03, Thomas Richter wrote:
>>> Hi all,
>>>
>>> has something changed recently in perf report in linux-next repo?
>>>
>>> I run perf report -i <file> -D | grep something
>>> on s390 with 2 different kernels:
>>>
>>> master 6.18.0rc3:
>>> # uname -a
>>> Linux a35lp67.lnxne.boe 6.18.0-rc3m-perf+ #23 SMP Tue Oct 28 15:53:44 CET 2025 s390x GNU/Linux
>>> # ll /tmp/perf2-eSso2q-2
>>> -rw------- 1 root root 8538184 Oct 29 08:44 /tmp/perf2-eSso2q-2
>>> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
>>>
>>> real  0m22.262s
>>> user  0m22.068s
>>> sys   0m0.200s
>>> #
>>>
>>> This runtime is ok for the perf.data file size.
>>>
>>> linux-next:
>>> # uname -a
>>> Linux s83lp47.lnxne.boe 6.18.0-20251028.rc3.git141.f7d2388eeec2.63.fc42.s390x+next #1 SMP Tue Oct 28 20:06:13 CET 2025 s390x GNU/Linux
>>> #ll /tmp/perf2-eSso2q-2
>>> -rw-------. 1 root root 8538184 Oct 29 08:47 /tmp/perf2-eSso2q-2
>>> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
>>>
>>> real  2m53.601s
>>> user  2m52.938s
>>> sys   0m0.575s
>>> #
>>>
>>> Same perf.data file and run time increased from 22 seconds to nearly 300 seconds, roughly 14 times longer
>>> for the same input!
>>>
>>> Has anybody similar observations?
>>>
>>> Thanks a lot
>>>
>>
>> Ian,
>>
>> sorry sent button pushed too early...
>>
>> here is a second try without option -D.
>>
>> This is much better:
>>
>> root in 🌐 s83lp47 in ~
>> ❯ ll perf1.data
>> -rw-------. 1 root root 99692290 Oct 29 16:17 perf1.data
>>
>> root in 🌐 s83lp47 in ~
>> ❯ time perf record -- perf report -i perf1.data  > /dev/null
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 0.027 MB perf.data (246 samples) ]
>>
>> real    0m1.097s
>> user    0m0.082s
>> sys     0m0.167s
>>
>> root in 🌐 s83lp47 in ~
>> ❯ perf report --stdio
>> # To display the perf.data header info, please use --header/--header-only options.
>> #
>> #
>> # Total Lost Samples: 0
>> #
>> # Samples: 246  of event 'cpum_cf/cycles/P'
>> # Event count (approx.): 307500000
>> #
>> # Overhead  Command  Shared Object        Symbol
>> # ........  .......  ...................  ..................................................
>> #
>>      9.35%  perf     perf                 [.] reader__read_event
>>      4.47%  perf     perf                 [.] perf_env__arch
>>      2.85%  perf     ld64.so.1            [.] _dl_relocate_object_no_relro
>>      2.44%  perf     ld64.so.1            [.] _dl_lookup_symbol_x
>>      2.44%  perf     ld64.so.1            [.] do_lookup_x
>>      2.44%  perf     perf                 [.] down_read
>>      2.03%  perf     perf                 [.] hists__findnew_entry
>>      2.03%  perf     perf                 [.] perf_hpp__is_dynamic_entry
>>      1.63%  perf     [kernel.kallsyms]    [k] __s390_indirect_jump_r14
>>      1.63%  perf     [kernel.kallsyms]    [k] format_decode
>>      1.63%  perf     libc.so.6            [.] cfree@GLIBC_2.2
>>      1.63%  perf     perf                 [.] evsel__parse_sample
>>      1.63%  perf     perf                 [.] map__put
>>      1.63%  perf     perf                 [.] sort__comm_cmp
>>      1.63%  perf     perf                 [.] thread__put
>>      1.22%  perf     [kernel.kallsyms]    [k] folio_add_file_rmap_ptes
>>      1.22%  perf     [kernel.kallsyms]    [k] get_page_from_freelist
>>      1.22%  perf     ld64.so.1            [.] _dl_name_match_p
>>      1.22%  perf     perf                 [.] dump_printf
>>      1.22%  perf     perf                 [.] evlist__event2evsel
>>      1.22%  perf     perf                 [.] hashmap_find
>>      1.22%  perf     perf                 [.] perf_session__deliver_event
> 
> Hmm.. for perf_env__arch we could probably cache better:
> ```
> diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
> index f1626d2032cd..2591f71c7c42 100644
> --- a/tools/perf/util/env.c
> +++ b/tools/perf/util/env.c
> @@ -593,17 +593,27 @@ static const char *normalize_arch(char *arch)
> 
> const char *perf_env__arch(struct perf_env *env)
> {
> -       char *arch_name;
> +       if (env && env->arch && env->arch_name_normalized)
> +               return env->arch;
> 
>        if (!env || !env->arch) { /* Assume local operation */
>                static struct utsname uts = { .machine[0] = '\0', };
>                if (uts.machine[0] == '\0' && uname(&uts) < 0)
>                        return NULL;
> -               arch_name = uts.machine;
> -       } else
> -               arch_name = env->arch;
> 
> -       return normalize_arch(arch_name);
> +               if (!env)
> +                       return normalize_arch(uts.machine);
> +
> +               env->arch = strdup(normalize_arch(uts.machine));
> +               env->arch_name_normalized = true;
> +       } else {
> +               char *normalized = strdup(normalize_arch(env->arch));
> +
> +               free(env->arch);
> +               env->arch = normalized;
> +               env->arch_name_normalized = true;
> +       }
> +       return env->arch;
> }
> 
> #if defined(HAVE_LIBTRACEEVENT)
> diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
> index 9977b85523a8..da20bc3810bd 100644
> --- a/tools/perf/util/env.h
> +++ b/tools/perf/util/env.h
> @@ -61,6 +61,7 @@ struct perf_env {
>        char                    *os_release;
>        char                    *version;
>        char                    *arch;
> +       bool                    arch_name_normalized;
>        int                     nr_cpus_online;
>        int                     nr_cpus_avail;
>        char                    *cpu_desc;
> ```
> or just force the normalize before assignment.
> 
> For reader__read_event and the perf_env these changes may be relevant:
> "perf sample: Remove arch notion of sample parsing"
> https://lore.kernel.org/r/20250724163302.596743-21-irogers@google.com
> "perf env: Remove global perf_env"
> https://lore.kernel.org/r/20250724163302.596743-20-irogers@google.com
> 
> I wonder if for some reason the env is NULL in your cases, so we go
> slow path a lot.
> 
> The histogram stuff, Namhyung may know more.
> 
> Thanks!
> Ian
> 

Thanks Ian,

I tested your patch, but there is no big difference:
root in 🌐 s83lp47 in ~ 
❯ ll ~/perf.data-test
-rw-------. 1 root root 152280778 Oct 30 07:58 /root/perf.data-test

here is the output with your patch:
root in 🌐 s83lp47 in ~ 
❯ time mirror-linux-next/tools/perf/perf report -D -i ~/perf.data-test > /dev/null

real	46m26.350s
user	46m17.559s
sys	0m7.434s


Here is the output from the linux-next perf version:

root in 🌐 s83lp47 in ~ took 46m26s 
❯ time perf report -D -i ~/perf.data-test > /dev/null

real	49m14.364s
user	49m4.872s
sys	0m8.037s

root in 🌐 s83lp47 in ~ took 49m14s 

When I go back history and use:

# pwd
/root/mirror-linux-next/tools/perf
# git branch
* (HEAD detached at v6.10)
  master
]# ll ~/perf.data-test
-rw------- 1 root root 152280778 Oct 30 08:38 /root/perf.data-test
# time ./perf report -D -i ~/perf.data-test > /dev/null

real	8m49.089s
user	8m47.898s
sys	0m0.848s
# 

Thanks for the pointer. Since the issue is with s390 specific code
I will do a git bisect and we go from there.

-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
IBM Deutschland Research & Development GmbH

Vorsitzender des Aufsichtsrats: Wolfgang Wendt

Geschäftsführung: David Faller

Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [linux-next] perf report runtime extremely long
  2025-10-30  8:59     ` Thomas Richter
@ 2025-10-30 13:25       ` Thomas Richter
  2025-10-30 14:27         ` Thomas Richter
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Richter @ 2025-10-30 13:25 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev

On 10/30/25 09:59, Thomas Richter wrote:
> On 10/29/25 17:47, Ian Rogers wrote:
>> On Wed, Oct 29, 2025 at 9:12 AM Thomas Richter <tmricht@linux.ibm.com> wrote:
>>>
>>> On 10/29/25 09:03, Thomas Richter wrote:
>>>> Hi all,
>>>>
>>>> has something changed recently in perf report in linux-next repo?
>>>>
>>>> I run perf report -i <file> -D | grep something
>>>> on s390 with 2 different kernels:
>>>>
>>>> master 6.18.0rc3:
>>>> # uname -a
>>>> Linux a35lp67.lnxne.boe 6.18.0-rc3m-perf+ #23 SMP Tue Oct 28 15:53:44 CET 2025 s390x GNU/Linux
>>>> # ll /tmp/perf2-eSso2q-2
>>>> -rw------- 1 root root 8538184 Oct 29 08:44 /tmp/perf2-eSso2q-2
>>>> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
>>>>
>>>> real  0m22.262s
>>>> user  0m22.068s
>>>> sys   0m0.200s
>>>> #
>>>>
>>>> This runtime is ok for the perf.data file size.
>>>>
>>>> linux-next:
>>>> # uname -a
>>>> Linux s83lp47.lnxne.boe 6.18.0-20251028.rc3.git141.f7d2388eeec2.63.fc42.s390x+next #1 SMP Tue Oct 28 20:06:13 CET 2025 s390x GNU/Linux
>>>> #ll /tmp/perf2-eSso2q-2
>>>> -rw-------. 1 root root 8538184 Oct 29 08:47 /tmp/perf2-eSso2q-2
>>>> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
>>>>
>>>> real  2m53.601s
>>>> user  2m52.938s
>>>> sys   0m0.575s
>>>> #
>>>>
>>>> Same perf.data file and run time increased from 22 seconds to nearly 300 seconds, roughly 14 times longer
>>>> for the same input!
>>>>
>>>> Has anybody similar observations?
>>>>
>>>> Thanks a lot
>>>>
>>>
>>> Ian,
>>>
>>> sorry sent button pushed too early...
>>>
>>> here is a second try without option -D.
>>>
>>> This is much better:
>>>
>>> root in 🌐 s83lp47 in ~
>>> ❯ ll perf1.data
>>> -rw-------. 1 root root 99692290 Oct 29 16:17 perf1.data
>>>
>>> root in 🌐 s83lp47 in ~
>>> ❯ time perf record -- perf report -i perf1.data  > /dev/null
>>> [ perf record: Woken up 1 times to write data ]
>>> [ perf record: Captured and wrote 0.027 MB perf.data (246 samples) ]
>>>
>>> real    0m1.097s
>>> user    0m0.082s
>>> sys     0m0.167s
>>>
>>> root in 🌐 s83lp47 in ~
>>> ❯ perf report --stdio
>>> # To display the perf.data header info, please use --header/--header-only options.
>>> #
>>> #
>>> # Total Lost Samples: 0
>>> #
>>> # Samples: 246  of event 'cpum_cf/cycles/P'
>>> # Event count (approx.): 307500000
>>> #
>>> # Overhead  Command  Shared Object        Symbol
>>> # ........  .......  ...................  ..................................................
>>> #
>>>      9.35%  perf     perf                 [.] reader__read_event
>>>      4.47%  perf     perf                 [.] perf_env__arch
>>>      2.85%  perf     ld64.so.1            [.] _dl_relocate_object_no_relro
>>>      2.44%  perf     ld64.so.1            [.] _dl_lookup_symbol_x
>>>      2.44%  perf     ld64.so.1            [.] do_lookup_x
>>>      2.44%  perf     perf                 [.] down_read
>>>      2.03%  perf     perf                 [.] hists__findnew_entry
>>>      2.03%  perf     perf                 [.] perf_hpp__is_dynamic_entry
>>>      1.63%  perf     [kernel.kallsyms]    [k] __s390_indirect_jump_r14
>>>      1.63%  perf     [kernel.kallsyms]    [k] format_decode
>>>      1.63%  perf     libc.so.6            [.] cfree@GLIBC_2.2
>>>      1.63%  perf     perf                 [.] evsel__parse_sample
>>>      1.63%  perf     perf                 [.] map__put
>>>      1.63%  perf     perf                 [.] sort__comm_cmp
>>>      1.63%  perf     perf                 [.] thread__put
>>>      1.22%  perf     [kernel.kallsyms]    [k] folio_add_file_rmap_ptes
>>>      1.22%  perf     [kernel.kallsyms]    [k] get_page_from_freelist
>>>      1.22%  perf     ld64.so.1            [.] _dl_name_match_p
>>>      1.22%  perf     perf                 [.] dump_printf
>>>      1.22%  perf     perf                 [.] evlist__event2evsel
>>>      1.22%  perf     perf                 [.] hashmap_find
>>>      1.22%  perf     perf                 [.] perf_session__deliver_event
>>
>> Hmm.. for perf_env__arch we could probably cache better:
>> ```
>> diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
>> index f1626d2032cd..2591f71c7c42 100644
>> --- a/tools/perf/util/env.c
>> +++ b/tools/perf/util/env.c
>> @@ -593,17 +593,27 @@ static const char *normalize_arch(char *arch)
>>
>> const char *perf_env__arch(struct perf_env *env)
>> {
>> -       char *arch_name;
>> +       if (env && env->arch && env->arch_name_normalized)
>> +               return env->arch;
>>
>>        if (!env || !env->arch) { /* Assume local operation */
>>                static struct utsname uts = { .machine[0] = '\0', };
>>                if (uts.machine[0] == '\0' && uname(&uts) < 0)
>>                        return NULL;
>> -               arch_name = uts.machine;
>> -       } else
>> -               arch_name = env->arch;
>>
>> -       return normalize_arch(arch_name);
>> +               if (!env)
>> +                       return normalize_arch(uts.machine);
>> +
>> +               env->arch = strdup(normalize_arch(uts.machine));
>> +               env->arch_name_normalized = true;
>> +       } else {
>> +               char *normalized = strdup(normalize_arch(env->arch));
>> +
>> +               free(env->arch);
>> +               env->arch = normalized;
>> +               env->arch_name_normalized = true;
>> +       }
>> +       return env->arch;
>> }
>>
>> #if defined(HAVE_LIBTRACEEVENT)
>> diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
>> index 9977b85523a8..da20bc3810bd 100644
>> --- a/tools/perf/util/env.h
>> +++ b/tools/perf/util/env.h
>> @@ -61,6 +61,7 @@ struct perf_env {
>>        char                    *os_release;
>>        char                    *version;
>>        char                    *arch;
>> +       bool                    arch_name_normalized;
>>        int                     nr_cpus_online;
>>        int                     nr_cpus_avail;
>>        char                    *cpu_desc;
>> ```
>> or just force the normalize before assignment.
>>
>> For reader__read_event and the perf_env these changes may be relevant:
>> "perf sample: Remove arch notion of sample parsing"
>> https://lore.kernel.org/r/20250724163302.596743-21-irogers@google.com
>> "perf env: Remove global perf_env"
>> https://lore.kernel.org/r/20250724163302.596743-20-irogers@google.com
>>
>> I wonder if for some reason the env is NULL in your cases, so we go
>> slow path a lot.
>>
>> The histogram stuff, Namhyung may know more.
>>
>> Thanks!
>> Ian
>>
> 
> Thanks Ian,
> 
> I tested your patch, but there is no big difference:
> root in 🌐 s83lp47 in ~ 
> ❯ ll ~/perf.data-test
> -rw-------. 1 root root 152280778 Oct 30 07:58 /root/perf.data-test
> 
> here is the output with your patch:
> root in 🌐 s83lp47 in ~ 
> ❯ time mirror-linux-next/tools/perf/perf report -D -i ~/perf.data-test > /dev/null
> 
> real	46m26.350s
> user	46m17.559s
> sys	0m7.434s
> 
> 
> Here is the output from the linux-next perf version:
> 
> root in 🌐 s83lp47 in ~ took 46m26s 
> ❯ time perf report -D -i ~/perf.data-test > /dev/null
> 
> real	49m14.364s
> user	49m4.872s
> sys	0m8.037s
> 
> root in 🌐 s83lp47 in ~ took 49m14s 
> 
> When I go back history and use:
> 
> # pwd
> /root/mirror-linux-next/tools/perf
> # git branch
> * (HEAD detached at v6.10)
>   master
> ]# ll ~/perf.data-test
> -rw------- 1 root root 152280778 Oct 30 08:38 /root/perf.data-test
> # time ./perf report -D -i ~/perf.data-test > /dev/null
> 
> real	8m49.089s
> user	8m47.898s
> sys	0m0.848s
> # 
> 
> Thanks for the pointer. Since the issue is with s390 specific code
> I will do a git bisect and we go from there.
> 

This is the first bad commit.... (I actually tested it :-(, obviously without -D option )

commit 0012e0fa221bf9cc54aa6a0e94d848653dc781bb (HEAD)
Author: Ian Rogers <irogers@google.com>
Date:   Sun Oct 5 11:24:16 2025 -0700

    perf jevents: Add legacy-hardware and legacy-cache json
    
    The legacy-hardware.json is added containing hardware events similarly
    to the software.json file. A difference is that for the software PMU
    the name is known and matches sysfs. In the legacy-hardware.json no
    Unit/PMU is specified for the events meaning default_core is used and
    the events will appear for all core PMUs.
    ......

What puzzles me is that the above patch does not change anything in
util/s390-sample-raw.c which handles s390 specific sample interpretation.

Maybe that patch affects perf_pmus__find()?
Not sure how to proceed.

Thanks a lot
-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
IBM Deutschland Research & Development GmbH

Vorsitzender des Aufsichtsrats: Wolfgang Wendt

Geschäftsführung: David Faller

Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [linux-next] perf report runtime extremely long
  2025-10-30 13:25       ` Thomas Richter
@ 2025-10-30 14:27         ` Thomas Richter
  2025-10-30 15:57           ` Ian Rogers
  0 siblings, 1 reply; 14+ messages in thread
From: Thomas Richter @ 2025-10-30 14:27 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev

On 10/30/25 14:25, Thomas Richter wrote:
> On 10/30/25 09:59, Thomas Richter wrote:
>> On 10/29/25 17:47, Ian Rogers wrote:
>>> On Wed, Oct 29, 2025 at 9:12 AM Thomas Richter <tmricht@linux.ibm.com> wrote:
>>>>
>>>> On 10/29/25 09:03, Thomas Richter wrote:
>>>>> Hi all,
>>>>>
>>>>> has something changed recently in perf report in linux-next repo?
>>>>>
>>>>> I run perf report -i <file> -D | grep something
>>>>> on s390 with 2 different kernels:
>>>>>
>>>>> master 6.18.0rc3:
>>>>> # uname -a
>>>>> Linux a35lp67.lnxne.boe 6.18.0-rc3m-perf+ #23 SMP Tue Oct 28 15:53:44 CET 2025 s390x GNU/Linux
>>>>> # ll /tmp/perf2-eSso2q-2
>>>>> -rw------- 1 root root 8538184 Oct 29 08:44 /tmp/perf2-eSso2q-2
>>>>> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
>>>>>
>>>>> real  0m22.262s
>>>>> user  0m22.068s
>>>>> sys   0m0.200s
>>>>> #
>>>>>
>>>>> This runtime is ok for the perf.data file size.
>>>>>
>>>>> linux-next:
>>>>> # uname -a
>>>>> Linux s83lp47.lnxne.boe 6.18.0-20251028.rc3.git141.f7d2388eeec2.63.fc42.s390x+next #1 SMP Tue Oct 28 20:06:13 CET 2025 s390x GNU/Linux
>>>>> #ll /tmp/perf2-eSso2q-2
>>>>> -rw-------. 1 root root 8538184 Oct 29 08:47 /tmp/perf2-eSso2q-2
>>>>> # time perf report -i/tmp/perf2-eSso2q-2 -D | grep Counter:000 > /dev/null
>>>>>
>>>>> real  2m53.601s
>>>>> user  2m52.938s
>>>>> sys   0m0.575s
>>>>> #
>>>>>
>>>>> Same perf.data file and run time increased from 22 seconds to nearly 300 seconds, roughly 14 times longer
>>>>> for the same input!
>>>>>
>>>>> Has anybody similar observations?
>>>>>
>>>>> Thanks a lot
>>>>>
>>>>
>>>> Ian,
>>>>
>>>> sorry sent button pushed too early...
>>>>
>>>> here is a second try without option -D.
>>>>
>>>> This is much better:
>>>>
>>>> root in 🌐 s83lp47 in ~
>>>> ❯ ll perf1.data
>>>> -rw-------. 1 root root 99692290 Oct 29 16:17 perf1.data
>>>>
>>>> root in 🌐 s83lp47 in ~
>>>> ❯ time perf record -- perf report -i perf1.data  > /dev/null
>>>> [ perf record: Woken up 1 times to write data ]
>>>> [ perf record: Captured and wrote 0.027 MB perf.data (246 samples) ]
>>>>
>>>> real    0m1.097s
>>>> user    0m0.082s
>>>> sys     0m0.167s
>>>>
>>>> root in 🌐 s83lp47 in ~
>>>> ❯ perf report --stdio
>>>> # To display the perf.data header info, please use --header/--header-only options.
>>>> #
>>>> #
>>>> # Total Lost Samples: 0
>>>> #
>>>> # Samples: 246  of event 'cpum_cf/cycles/P'
>>>> # Event count (approx.): 307500000
>>>> #
>>>> # Overhead  Command  Shared Object        Symbol
>>>> # ........  .......  ...................  ..................................................
>>>> #
>>>>      9.35%  perf     perf                 [.] reader__read_event
>>>>      4.47%  perf     perf                 [.] perf_env__arch
>>>>      2.85%  perf     ld64.so.1            [.] _dl_relocate_object_no_relro
>>>>      2.44%  perf     ld64.so.1            [.] _dl_lookup_symbol_x
>>>>      2.44%  perf     ld64.so.1            [.] do_lookup_x
>>>>      2.44%  perf     perf                 [.] down_read
>>>>      2.03%  perf     perf                 [.] hists__findnew_entry
>>>>      2.03%  perf     perf                 [.] perf_hpp__is_dynamic_entry
>>>>      1.63%  perf     [kernel.kallsyms]    [k] __s390_indirect_jump_r14
>>>>      1.63%  perf     [kernel.kallsyms]    [k] format_decode
>>>>      1.63%  perf     libc.so.6            [.] cfree@GLIBC_2.2
>>>>      1.63%  perf     perf                 [.] evsel__parse_sample
>>>>      1.63%  perf     perf                 [.] map__put
>>>>      1.63%  perf     perf                 [.] sort__comm_cmp
>>>>      1.63%  perf     perf                 [.] thread__put
>>>>      1.22%  perf     [kernel.kallsyms]    [k] folio_add_file_rmap_ptes
>>>>      1.22%  perf     [kernel.kallsyms]    [k] get_page_from_freelist
>>>>      1.22%  perf     ld64.so.1            [.] _dl_name_match_p
>>>>      1.22%  perf     perf                 [.] dump_printf
>>>>      1.22%  perf     perf                 [.] evlist__event2evsel
>>>>      1.22%  perf     perf                 [.] hashmap_find
>>>>      1.22%  perf     perf                 [.] perf_session__deliver_event
>>>
>>> Hmm.. for perf_env__arch we could probably cache better:
>>> ```
>>> diff --git a/tools/perf/util/env.c b/tools/perf/util/env.c
>>> index f1626d2032cd..2591f71c7c42 100644
>>> --- a/tools/perf/util/env.c
>>> +++ b/tools/perf/util/env.c
>>> @@ -593,17 +593,27 @@ static const char *normalize_arch(char *arch)
>>>
>>> const char *perf_env__arch(struct perf_env *env)
>>> {
>>> -       char *arch_name;
>>> +       if (env && env->arch && env->arch_name_normalized)
>>> +               return env->arch;
>>>
>>>        if (!env || !env->arch) { /* Assume local operation */
>>>                static struct utsname uts = { .machine[0] = '\0', };
>>>                if (uts.machine[0] == '\0' && uname(&uts) < 0)
>>>                        return NULL;
>>> -               arch_name = uts.machine;
>>> -       } else
>>> -               arch_name = env->arch;
>>>
>>> -       return normalize_arch(arch_name);
>>> +               if (!env)
>>> +                       return normalize_arch(uts.machine);
>>> +
>>> +               env->arch = strdup(normalize_arch(uts.machine));
>>> +               env->arch_name_normalized = true;
>>> +       } else {
>>> +               char *normalized = strdup(normalize_arch(env->arch));
>>> +
>>> +               free(env->arch);
>>> +               env->arch = normalized;
>>> +               env->arch_name_normalized = true;
>>> +       }
>>> +       return env->arch;
>>> }
>>>
>>> #if defined(HAVE_LIBTRACEEVENT)
>>> diff --git a/tools/perf/util/env.h b/tools/perf/util/env.h
>>> index 9977b85523a8..da20bc3810bd 100644
>>> --- a/tools/perf/util/env.h
>>> +++ b/tools/perf/util/env.h
>>> @@ -61,6 +61,7 @@ struct perf_env {
>>>        char                    *os_release;
>>>        char                    *version;
>>>        char                    *arch;
>>> +       bool                    arch_name_normalized;
>>>        int                     nr_cpus_online;
>>>        int                     nr_cpus_avail;
>>>        char                    *cpu_desc;
>>> ```
>>> or just force the normalize before assignment.
>>>
>>> For reader__read_event and the perf_env these changes may be relevant:
>>> "perf sample: Remove arch notion of sample parsing"
>>> https://lore.kernel.org/r/20250724163302.596743-21-irogers@google.com
>>> "perf env: Remove global perf_env"
>>> https://lore.kernel.org/r/20250724163302.596743-20-irogers@google.com
>>>
>>> I wonder if for some reason the env is NULL in your cases, so we go
>>> slow path a lot.
>>>
>>> The histogram stuff, Namhyung may know more.
>>>
>>> Thanks!
>>> Ian
>>>
>>
>> Thanks Ian,
>>
>> I tested your patch, but there is no big difference:
>> root in 🌐 s83lp47 in ~ 
>> ❯ ll ~/perf.data-test
>> -rw-------. 1 root root 152280778 Oct 30 07:58 /root/perf.data-test
>>
>> here is the output with your patch:
>> root in 🌐 s83lp47 in ~ 
>> ❯ time mirror-linux-next/tools/perf/perf report -D -i ~/perf.data-test > /dev/null
>>
>> real	46m26.350s
>> user	46m17.559s
>> sys	0m7.434s
>>
>>
>> Here is the output from the linux-next perf version:
>>
>> root in 🌐 s83lp47 in ~ took 46m26s 
>> ❯ time perf report -D -i ~/perf.data-test > /dev/null
>>
>> real	49m14.364s
>> user	49m4.872s
>> sys	0m8.037s
>>
>> root in 🌐 s83lp47 in ~ took 49m14s 
>>
>> When I go back history and use:
>>
>> # pwd
>> /root/mirror-linux-next/tools/perf
>> # git branch
>> * (HEAD detached at v6.10)
>>   master
>> ]# ll ~/perf.data-test
>> -rw------- 1 root root 152280778 Oct 30 08:38 /root/perf.data-test
>> # time ./perf report -D -i ~/perf.data-test > /dev/null
>>
>> real	8m49.089s
>> user	8m47.898s
>> sys	0m0.848s
>> # 
>>
>> Thanks for the pointer. Since the issue is with s390 specific code
>> I will do a git bisect and we go from there.
>>
> 
> This is the first bad commit.... (I actually tested it :-(, obviously without -D option )
> 
> commit 0012e0fa221bf9cc54aa6a0e94d848653dc781bb (HEAD)
> Author: Ian Rogers <irogers@google.com>
> Date:   Sun Oct 5 11:24:16 2025 -0700
> 
>     perf jevents: Add legacy-hardware and legacy-cache json
>     
>     The legacy-hardware.json is added containing hardware events similarly
>     to the software.json file. A difference is that for the software PMU
>     the name is known and matches sysfs. In the legacy-hardware.json no
>     Unit/PMU is specified for the events meaning default_core is used and
>     the events will appear for all core PMUs.
>     ......
> 
> What puzzles me is that the above patch does not change anything in
> util/s390-sample-raw.c which handles s390 specific sample interpretation.
> 
> Maybe that patch affects perf_pmus__find()?
> Not sure how to proceed.
> 
> Thanks a lot

Ian,

the issue is within function perf_pmu__for_each_event().
Without calling this function the output is as fast as before.
(Well a bit faster, because I replaced this call by strdup("hardcoded-name");)

I have looked at this code, but I did not understand a thing....
Strange, perf_pmu__for_each_event() did not change in this commit,
but the behavior changed a lot.

What we need for s390 is some way of faster event name lookup,
the PMU is known, its "cpum_cf". Is there no easy way to
find the events on this PMU (given the event number).

Maybe a some sort of table, because this lookup is done
millions of times on a large file with samples including raw data

Ideas?
Thanks
-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
IBM Deutschland Research & Development GmbH

Vorsitzender des Aufsichtsrats: Wolfgang Wendt

Geschäftsführung: David Faller

Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [linux-next] perf report runtime extremely long
  2025-10-30 14:27         ` Thomas Richter
@ 2025-10-30 15:57           ` Ian Rogers
  2025-10-30 17:14             ` Ian Rogers
  2025-10-31  7:06             ` Thomas Richter
  0 siblings, 2 replies; 14+ messages in thread
From: Ian Rogers @ 2025-10-30 15:57 UTC (permalink / raw)
  To: Thomas Richter; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev

On Thu, Oct 30, 2025 at 7:27 AM Thomas Richter <tmricht@linux.ibm.com> wrote:
>
> On 10/30/25 14:25, Thomas Richter wrote:
> > On 10/30/25 09:59, Thomas Richter wrote:
> > This is the first bad commit.... (I actually tested it :-(, obviously without -D option )
> >
> > commit 0012e0fa221bf9cc54aa6a0e94d848653dc781bb (HEAD)
> > Author: Ian Rogers <irogers@google.com>
> > Date:   Sun Oct 5 11:24:16 2025 -0700
> >
> >     perf jevents: Add legacy-hardware and legacy-cache json
> >
> >     The legacy-hardware.json is added containing hardware events similarly
> >     to the software.json file. A difference is that for the software PMU
> >     the name is known and matches sysfs. In the legacy-hardware.json no
> >     Unit/PMU is specified for the events meaning default_core is used and
> >     the events will appear for all core PMUs.
> >     ......
> >
> > What puzzles me is that the above patch does not change anything in
> > util/s390-sample-raw.c which handles s390 specific sample interpretation.
> >
> > Maybe that patch affects perf_pmus__find()?
> > Not sure how to proceed.
> >
> > Thanks a lot
>
> Ian,
>
> the issue is within function perf_pmu__for_each_event().
> Without calling this function the output is as fast as before.
> (Well a bit faster, because I replaced this call by strdup("hardcoded-name");)
>
> I have looked at this code, but I did not understand a thing....
> Strange, perf_pmu__for_each_event() did not change in this commit,
> but the behavior changed a lot.
>
> What we need for s390 is some way of faster event name lookup,
> the PMU is known, its "cpum_cf". Is there no easy way to
> find the events on this PMU (given the event number).
>
> Maybe a some sort of table, because this lookup is done
> millions of times on a large file with samples including raw data
>
> Ideas?

Thanks Thomas!

The bisect and the behavior make sense now. Previously the legacy
events weren't part of perf_pmu__for_each_event and now with the
legacy-hardware.json they are. The events were parsed by matching
explicitly in the parse-events.l lex parser, but for lots of reasons
the json is better. Anyway, there is common functionality for
reversing a perf_event_attr.config to an event name in
perf_pmu__name_from_config:
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.h?h=perf-tools-next#n306
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n2728

The normal case for a PMU is that we want to see if it has an event by
name, so we have a hashmap from name to the "alias" information (I
don't really like the name "alias", it is information about an event
we want to encode):
https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n58

The use-case for perf_pmu__name_from_config is generally debug output,
so there isn't a fast path config to name lookup. Is config to name
lookup sufficient in s390-sample-raw.c? I notice that the code is
matching the "event=" part of the encoding, which is a sub-part of the
config (and may actually be a value in config2 or config3).

I wonder if it would be easier to add a cache of matched names in
s390-sample-raw.c? So a global/static hashmap in that C file with a
key of set and nr as per get_counter_name (it seems the PMU can be
implied to always only be the core PMU) and a value of a strdup-ed
counter name.

Let me know what you think and I can send you a patch if you like.

Thanks,
Ian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [linux-next] perf report runtime extremely long
  2025-10-30 15:57           ` Ian Rogers
@ 2025-10-30 17:14             ` Ian Rogers
  2025-10-31  8:20               ` Thomas Richter
  2025-10-31  7:06             ` Thomas Richter
  1 sibling, 1 reply; 14+ messages in thread
From: Ian Rogers @ 2025-10-30 17:14 UTC (permalink / raw)
  To: Thomas Richter; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev

On Thu, Oct 30, 2025 at 8:57 AM Ian Rogers <irogers@google.com> wrote:
>
> On Thu, Oct 30, 2025 at 7:27 AM Thomas Richter <tmricht@linux.ibm.com> wrote:
> >
> > On 10/30/25 14:25, Thomas Richter wrote:
> > > On 10/30/25 09:59, Thomas Richter wrote:
> > > This is the first bad commit.... (I actually tested it :-(, obviously without -D option )
> > >
> > > commit 0012e0fa221bf9cc54aa6a0e94d848653dc781bb (HEAD)
> > > Author: Ian Rogers <irogers@google.com>
> > > Date:   Sun Oct 5 11:24:16 2025 -0700
> > >
> > >     perf jevents: Add legacy-hardware and legacy-cache json
> > >
> > >     The legacy-hardware.json is added containing hardware events similarly
> > >     to the software.json file. A difference is that for the software PMU
> > >     the name is known and matches sysfs. In the legacy-hardware.json no
> > >     Unit/PMU is specified for the events meaning default_core is used and
> > >     the events will appear for all core PMUs.
> > >     ......
> > >
> > > What puzzles me is that the above patch does not change anything in
> > > util/s390-sample-raw.c which handles s390 specific sample interpretation.
> > >
> > > Maybe that patch affects perf_pmus__find()?
> > > Not sure how to proceed.
> > >
> > > Thanks a lot
> >
> > Ian,
> >
> > the issue is within function perf_pmu__for_each_event().
> > Without calling this function the output is as fast as before.
> > (Well a bit faster, because I replaced this call by strdup("hardcoded-name");)
> >
> > I have looked at this code, but I did not understand a thing....
> > Strange, perf_pmu__for_each_event() did not change in this commit,
> > but the behavior changed a lot.
> >
> > What we need for s390 is some way of faster event name lookup,
> > the PMU is known, its "cpum_cf". Is there no easy way to
> > find the events on this PMU (given the event number).
> >
> > Maybe a some sort of table, because this lookup is done
> > millions of times on a large file with samples including raw data
> >
> > Ideas?
>
> Thanks Thomas!
>
> The bisect and the behavior make sense now. Previously the legacy
> events weren't part of perf_pmu__for_each_event and now with the
> legacy-hardware.json they are. The events were parsed by matching
> explicitly in the parse-events.l lex parser, but for lots of reasons
> the json is better. Anyway, there is common functionality for
> reversing a perf_event_attr.config to an event name in
> perf_pmu__name_from_config:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.h?h=perf-tools-next#n306
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n2728
>
> The normal case for a PMU is that we want to see if it has an event by
> name, so we have a hashmap from name to the "alias" information (I
> don't really like the name "alias", it is information about an event
> we want to encode):
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n58
>
> The use-case for perf_pmu__name_from_config is generally debug output,
> so there isn't a fast path config to name lookup. Is config to name
> lookup sufficient in s390-sample-raw.c? I notice that the code is
> matching the "event=" part of the encoding, which is a sub-part of the
> config (and may actually be a value in config2 or config3).
>
> I wonder if it would be easier to add a cache of matched names in
> s390-sample-raw.c? So a global/static hashmap in that C file with a
> key of set and nr as per get_counter_name (it seems the PMU can be
> implied to always only be the core PMU) and a value of a strdup-ed
> counter name.
>
> Let me know what you think and I can send you a patch if you like.

So:
https://lore.kernel.org/lkml/20251030171211.1109480-1-irogers@google.com/
should cover it. Let me know what you think.

Thanks,
Ian

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [linux-next] perf report runtime extremely long
  2025-10-30 15:57           ` Ian Rogers
  2025-10-30 17:14             ` Ian Rogers
@ 2025-10-31  7:06             ` Thomas Richter
  1 sibling, 0 replies; 14+ messages in thread
From: Thomas Richter @ 2025-10-31  7:06 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev

On 10/30/25 16:57, Ian Rogers wrote:
> On Thu, Oct 30, 2025 at 7:27 AM Thomas Richter <tmricht@linux.ibm.com> wrote:
>>
>> On 10/30/25 14:25, Thomas Richter wrote:
>>> On 10/30/25 09:59, Thomas Richter wrote:
>>> This is the first bad commit.... (I actually tested it :-(, obviously without -D option )
>>>
>>> commit 0012e0fa221bf9cc54aa6a0e94d848653dc781bb (HEAD)
>>> Author: Ian Rogers <irogers@google.com>
>>> Date:   Sun Oct 5 11:24:16 2025 -0700
>>>
>>>     perf jevents: Add legacy-hardware and legacy-cache json
>>>
>>>     The legacy-hardware.json is added containing hardware events similarly
>>>     to the software.json file. A difference is that for the software PMU
>>>     the name is known and matches sysfs. In the legacy-hardware.json no
>>>     Unit/PMU is specified for the events meaning default_core is used and
>>>     the events will appear for all core PMUs.
>>>     ......
>>>
>>> What puzzles me is that the above patch does not change anything in
>>> util/s390-sample-raw.c which handles s390 specific sample interpretation.
>>>
>>> Maybe that patch affects perf_pmus__find()?
>>> Not sure how to proceed.
>>>
>>> Thanks a lot
>>
>> Ian,
>>
>> the issue is within function perf_pmu__for_each_event().
>> Without calling this function the output is as fast as before.
>> (Well a bit faster, because I replaced this call by strdup("hardcoded-name");)
>>
>> I have looked at this code, but I did not understand a thing....
>> Strange, perf_pmu__for_each_event() did not change in this commit,
>> but the behavior changed a lot.
>>
>> What we need for s390 is some way of faster event name lookup,
>> the PMU is known, its "cpum_cf". Is there no easy way to
>> find the events on this PMU (given the event number).
>>
>> Maybe a some sort of table, because this lookup is done
>> millions of times on a large file with samples including raw data
>>
>> Ideas?
> 
> Thanks Thomas!
> 
> The bisect and the behavior make sense now. Previously the legacy
> events weren't part of perf_pmu__for_each_event and now with the
> legacy-hardware.json they are. The events were parsed by matching
> explicitly in the parse-events.l lex parser, but for lots of reasons
> the json is better. Anyway, there is common functionality for
> reversing a perf_event_attr.config to an event name in
> perf_pmu__name_from_config:
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.h?h=perf-tools-next#n306
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n2728
> 
> The normal case for a PMU is that we want to see if it has an event by
> name, so we have a hashmap from name to the "alias" information (I
> don't really like the name "alias", it is information about an event
> we want to encode):
> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n58
> 
> The use-case for perf_pmu__name_from_config is generally debug output,
> so there isn't a fast path config to name lookup. Is config to name
> lookup sufficient in s390-sample-raw.c? I notice that the code is
> matching the "event=" part of the encoding, which is a sub-part of the
> config (and may actually be a value in config2 or config3).
> 
> I wonder if it would be easier to add a cache of matched names in
> s390-sample-raw.c? So a global/static hashmap in that C file with a
> key of set and nr as per get_counter_name (it seems the PMU can be
> implied to always only be the core PMU) and a value of a strdup-ed
> counter name.


I do confirm your assessment, here is the proof. A very poor man's caching
in util/s390-sample-raw.c:

# git diff
diff --git a/tools/perf/util/s390-sample-raw.c b/tools/perf/util/s390-sample-raw.c
index 335217bb532b..b04d3b1a8750 100644
--- a/tools/perf/util/s390-sample-raw.c
+++ b/tools/perf/util/s390-sample-raw.c
@@ -162,6 +162,7 @@ static int get_counter_name_callback(void *vdata, struct pmu_event_info *info)
  * number and use this as key. If they match return the name of this counter.
  * If no match is found a NULL pointer is returned.
  */
+static char *ctrname[500];
 static char *get_counter_name(int set, int nr, struct perf_pmu *pmu)
 {
        struct get_counter_name_data data = {
@@ -171,9 +172,14 @@ static char *get_counter_name(int set, int nr, struct perf_pmu *pmu)
 
        if (!pmu)
                return NULL;
+if (ctrname[data.wanted]) return strdup(ctrname[data.wanted]);
 
        perf_pmu__for_each_event(pmu, /*skip_duplicate_pmus=*/ true,
                                 &data, get_counter_name_callback);
+if(data.result)
+    ctrname[data.wanted] = strdup(data.result);
+else
+    ctrname[data.wanted] = strdup("unknown");
        return data.result;
 }
 
# cd
# time mirror-linux-next/tools/perf/perf report -D -i ~/perf.data-test > /dev/null

real	0m25.908s
user	0m25.548s
sys	0m0.182s
# 

> 
> Let me know what you think and I can send you a patch if you like.
> 
> Thanks,
> Ian

Of course!!
A hash list is more flexible, but an array is fine too. There are very few holes
in the counter set numbers, They are listed as "unknown". That does not mean these
counter do not exist! Most of them have values, but their name was not made public
in the documentation SA23-2261-09.

And there are more counters from the PAI crypto (172) and PAI NNPA PMUs (36), not much
but in a different range. So I guess this qualifies for the hash solution.

Thanks a lot
-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
IBM Deutschland Research & Development GmbH

Vorsitzender des Aufsichtsrats: Wolfgang Wendt

Geschäftsführung: David Faller

Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [linux-next] perf report runtime extremely long
  2025-10-30 17:14             ` Ian Rogers
@ 2025-10-31  8:20               ` Thomas Richter
  0 siblings, 0 replies; 14+ messages in thread
From: Thomas Richter @ 2025-10-31  8:20 UTC (permalink / raw)
  To: Ian Rogers; +Cc: linux-perf-use., Jan Polensky, Alexander Gordeev

On 10/30/25 18:14, Ian Rogers wrote:
> On Thu, Oct 30, 2025 at 8:57 AM Ian Rogers <irogers@google.com> wrote:
>>
>> On Thu, Oct 30, 2025 at 7:27 AM Thomas Richter <tmricht@linux.ibm.com> wrote:
>>>
>>> On 10/30/25 14:25, Thomas Richter wrote:
>>>> On 10/30/25 09:59, Thomas Richter wrote:
>>>> This is the first bad commit.... (I actually tested it :-(, obviously without -D option )
>>>>
>>>> commit 0012e0fa221bf9cc54aa6a0e94d848653dc781bb (HEAD)
>>>> Author: Ian Rogers <irogers@google.com>
>>>> Date:   Sun Oct 5 11:24:16 2025 -0700
>>>>
>>>>     perf jevents: Add legacy-hardware and legacy-cache json
>>>>
>>>>     The legacy-hardware.json is added containing hardware events similarly
>>>>     to the software.json file. A difference is that for the software PMU
>>>>     the name is known and matches sysfs. In the legacy-hardware.json no
>>>>     Unit/PMU is specified for the events meaning default_core is used and
>>>>     the events will appear for all core PMUs.
>>>>     ......
>>>>
>>>> What puzzles me is that the above patch does not change anything in
>>>> util/s390-sample-raw.c which handles s390 specific sample interpretation.
>>>>
>>>> Maybe that patch affects perf_pmus__find()?
>>>> Not sure how to proceed.
>>>>
>>>> Thanks a lot
>>>
>>> Ian,
>>>
>>> the issue is within function perf_pmu__for_each_event().
>>> Without calling this function the output is as fast as before.
>>> (Well a bit faster, because I replaced this call by strdup("hardcoded-name");)
>>>
>>> I have looked at this code, but I did not understand a thing....
>>> Strange, perf_pmu__for_each_event() did not change in this commit,
>>> but the behavior changed a lot.
>>>
>>> What we need for s390 is some way of faster event name lookup,
>>> the PMU is known, its "cpum_cf". Is there no easy way to
>>> find the events on this PMU (given the event number).
>>>
>>> Maybe a some sort of table, because this lookup is done
>>> millions of times on a large file with samples including raw data
>>>
>>> Ideas?
>>
>> Thanks Thomas!
>>
>> The bisect and the behavior make sense now. Previously the legacy
>> events weren't part of perf_pmu__for_each_event and now with the
>> legacy-hardware.json they are. The events were parsed by matching
>> explicitly in the parse-events.l lex parser, but for lots of reasons
>> the json is better. Anyway, there is common functionality for
>> reversing a perf_event_attr.config to an event name in
>> perf_pmu__name_from_config:
>> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.h?h=perf-tools-next#n306
>> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n2728
>>
>> The normal case for a PMU is that we want to see if it has an event by
>> name, so we have a hashmap from name to the "alias" information (I
>> don't really like the name "alias", it is information about an event
>> we want to encode):
>> https://web.git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git/tree/tools/perf/util/pmu.c?h=perf-tools-next#n58
>>
>> The use-case for perf_pmu__name_from_config is generally debug output,
>> so there isn't a fast path config to name lookup. Is config to name
>> lookup sufficient in s390-sample-raw.c? I notice that the code is
>> matching the "event=" part of the encoding, which is a sub-part of the
>> config (and may actually be a value in config2 or config3).
>>
>> I wonder if it would be easier to add a cache of matched names in
>> s390-sample-raw.c? So a global/static hashmap in that C file with a
>> key of set and nr as per get_counter_name (it seems the PMU can be
>> implied to always only be the core PMU) and a value of a strdup-ed
>> counter name.
>>
>> Let me know what you think and I can send you a patch if you like.
> 
> So:
> https://lore.kernel.org/lkml/20251030171211.1109480-1-irogers@google.com/
> should cover it. Let me know what you think.
> 
> Thanks,
> Ian

Great work, I downloaded and tested your patch

# time perf report -D -i ~/perf.data-test > /dev/null

real	9m16.125s
user	9m14.838s
sys	0m0.912s
[root@b83lp65 perf]# ll ~/perf.data-test
-rw------- 1 root root 152280778 Oct 30 08:38 /root/perf.data-390raw
# 

This is back to what is was before. Much better than the ~50 minutes.
If there is need for further improvement, we can go for an array solution.

Please submit this patch to the mailing list.

Thank you very much for your help.

Tested-by: Thomas Richter <tmricht@linux.ibm.com>

-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
IBM Deutschland Research & Development GmbH

Vorsitzender des Aufsichtsrats: Wolfgang Wendt

Geschäftsführung: David Faller

Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2025-10-31  8:20 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2025-10-29  8:03 [linux-next] perf report runtime extremely long Thomas Richter
2025-10-29 15:13 ` Ian Rogers
2025-10-29 15:52   ` Thomas Richter
2025-10-29 16:25     ` Ian Rogers
2025-10-29 15:56 ` Thomas Richter
2025-10-29 16:47   ` Ian Rogers
2025-10-30  6:04     ` Namhyung Kim
2025-10-30  8:59     ` Thomas Richter
2025-10-30 13:25       ` Thomas Richter
2025-10-30 14:27         ` Thomas Richter
2025-10-30 15:57           ` Ian Rogers
2025-10-30 17:14             ` Ian Rogers
2025-10-31  8:20               ` Thomas Richter
2025-10-31  7:06             ` Thomas Richter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).