From: Thomas Richter <tmricht@linux.ibm.com>
To: Ian Rogers <irogers@google.com>
Cc: "linux-perf-use." <linux-perf-users@vger.kernel.org>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Sumanth Korikkar <sumanthk@linux.ibm.com>
Subject: Re: perf test failures in linux-next on s390
Date: Wed, 14 Jun 2023 10:31:55 +0200
Message-ID: <e7f0930a-a9fc-d768-a472-bd9af6fafdf5@linux.ibm.com>
In-Reply-To: <CAP-5=fU+0VXckQiq3E8yqaySNZ+-DDZahEd1OY0uKPWnfFsafg@mail.gmail.com>

On 6/13/23 16:32, Ian Rogers wrote:
> On Tue, Jun 13, 2023 at 5:54 AM Thomas Richter <tmricht@linux.ibm.com> wrote:
>>
>> Hi all,
>>
>> I have run the perf test suite on the current 6.4rc6 kernel and see just one error:
>> # ./perf test 2>&1 | fgrep FAILED
>> fgrep: warning: fgrep is obsolescent; using grep -F
>>  42.3: BPF prologue generation                                       : FAILED!
>> #
>>
>> However when I download the linux-next tree and build kernel and perf
>> tool with the same kernel config file, I get a bunch of failing test cases,
>> many with perf tool dumping core:
>>
>> # perf test 2>&1 | fgrep FAILED
>> fgrep: warning: fgrep is obsolescent; using grep -F
>>   6.1: Test event parsing                                            : FAILED!
>>  10.3: Parsing of PMU event table metrics                            : FAILED!
>>  10.4: Parsing of PMU event table metrics with fake PMUs             : FAILED!
>>  17: Setup struct perf_event_attr                                    : FAILED!
>>  24: Number of exit events of a simple workload                      : FAILED! core-dump
>>  28: Use a dummy software event to keep tracking                     : FAILED!
>>  35: Track with sched_switch                                         : FAILED!
>>  42.3: BPF prologue generation                                       : FAILED!
>>  66: Parse and process metrics                                       : FAILED!
>>  68: Event expansion for cgroups                                     : FAILED!
>>  69.2: Perf time to TSC                                              : FAILED! core-dump
>>  74: build id cache operations                                       : FAILED! core-dump
>>  81: kernel lock contention analysis test                            : FAILED!
>>  86: Zstd perf.data compression/decompression                        : FAILED! core-dump
>>  87: perf record tests                                               : FAILED! core-dump
>>  94: perf all metricgroups test                                      : FAILED!
>>  95: perf all metrics test                                           : FAILED!
>> 106: Test java symbol                                                : FAILED! core-dump
>> #
>>
>> I am afraid this will show up pretty soon in the linux tree.
>> I am going to look into each failure in the next few days.
>>
>> What I already found out is that many test cases now fail due to the
>> event/PMU rework, here is one example:
>>
>> # perf test -Fvvvv 95
>> 95: perf all metrics test
>> --- start ---
>> Testing cpi
>> ....
>> Metric 'transaction' not printed in:
>> Error:
>> The TX_NC_TABORT event is not supported.
>> ---- end ----
>> perf all metrics test: FAILED!
>> # ls -l /sys/devices/cpum_cf/events/TX_NC_TABORT
>> -r--r--r--. 1 root root 4096 Jun 13 13:49 /sys/devices/cpum_cf/events/TX_NC_TABORT
>> #
>>
>> As can be seen, the event is definitely there and supported.
>> This same test case succeeds in the linux tree!
>>
>> Hopefully I can sort out some of the failures before this code shows up
>> in the linux tree.
> 
> Thanks Thomas, to be clear this is what is in
> perf-tools-next/linux-next and not 6.4?

Ian,

Thanks for your help.
Correct, I am talking about the linux-next repo. The linux repo is fine.

> 
> Rather than try to do more complicated cases like the metrics tests,
> it makes sense to dig into why event parsing is failing. Test 6 first
> of all, could you give output?
> 
> Thanks,
> Ian
> 
We discussed some aspects of this about two weeks ago; last week
I was on vacation and have now resumed my work on linux-next.
We run the perf test suite on linux-next every night, and I am concerned
and would like to get this sorted out before it hits Linux 6.5.

Here is the output on my linux-next tree built yesterday:
# uname -a
Linux a35lp67.lnxne.boe 6.4.0-rc6-next-20230613d-perf #2 \
              SMP Tue Jun 13 15:18:43 CEST 2023 s390x GNU/Linux
# ./perf test -F 6
  6: Parse event definition strings  :
  6.1: Test event parsing            :Segmentation fault (core dumped)
#
# gdb perf
  ....
  (gdb) r test -F 6
   6: Parse event definition strings                                  :
  6.1: Test event parsing                                            :
Program received signal SIGSEGV, Segmentation fault.
__GI_strcmp () at ../sysdeps/s390/strcmp-vx.S:47
(gdb) where
#0  __GI_strcmp () at ../sysdeps/s390/strcmp-vx.S:47
#1  0x000000000110a18c in test__term_equal_term (evlist=0x152ea80) at tests/parse-events.c:1580
#2  0x000000000110a96a in test_event (e=0x14dc758 <test.events+1416>) at tests/parse-events.c:2209
#3  0x000000000110ac58 in test_events (events=0x14dc1d0 <test.events>, cnt=61) at tests/parse-events.c:2260
#4  0x000000000110ad52 in test__events2 (test=0x1500758 <suite.parse_events>, subtest=0)
    at tests/parse-events.c:2272
#5  0x00000000010f6fac in run_test (test=0x1500758 <suite.parse_events>, subtest=0) at tests/builtin-test.c:236
#6  0x00000000010f7142 in test_and_print (t=0x1500758 <suite.parse_events>, subtest=0) at tests/builtin-test.c:265
#7  0x00000000010f7b1e in __cmd_test (argc=1, argv=0x3ffffffa320, skiplist=0x0) at tests/builtin-test.c:436
#8  0x00000000010f8404 in cmd_test (argc=1, argv=0x3ffffffa320) at tests/builtin-test.c:559
#9  0x00000000011473fc in run_builtin (p=0x14f60e8 <commands+600>, argc=3, argv=0x3ffffffa320) at perf.c:323
#10 0x000000000114776e in handle_internal_command (argc=3, argv=0x3ffffffa320) at perf.c:377
#11 0x0000000001147980 in run_argv (argcp=0x3ffffff9f94, argv=0x3ffffff9f88) at perf.c:421
#12 0x0000000001147d48 in main (argc=3, argv=0x3ffffffa320) at perf.c:537
(gdb)
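
Judging from the backtrace, strcmp() in test__term_equal_term() seems to
be handed a NULL string for one of the terms. I have not dug into the new
term handling yet; just as a sketch (hypothetical helper, not the actual
tests/parse-events.c code), a NULL guard of this kind would turn the crash
into an ordinary test failure:

#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not actual perf code: compare two term strings
 * that may be NULL, so a missing value fails the comparison instead of
 * crashing inside strcmp().
 */
static int term_str_equal(const char *expected, const char *actual)
{
	if (!expected || !actual) {
		fprintf(stderr, "term string is NULL (expected=%s, actual=%s)\n",
			expected ? expected : "(null)",
			actual ? actual : "(null)");
		return 0;	/* not equal */
	}
	return strcmp(expected, actual) == 0;
}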

To be honest, I am no expert on the yacc/bison/flex toolchain.
I understand a little bit about it, but that is it.

When I look at the output of perf test -Fvvv 6 on linux-next, some things seem odd;
I have marked them with three question marks (???):

# ./perf test -Fvvv 6
  6: Parse event definition strings     :
  6.1: Test event parsing               :
--- start ---
running test 0 'syscalls:sys_enter_openat'
Using CPUID IBM,3931,704,A01,3.7,002f
running test 1 'syscalls:*'
running test 2 'r1a'
running test 3 '1:1'
running test 4 'instructions'
No PMU found for 'instructions'FAILED tests/parse-events.c:143 wrong number of entries
Event test failure: test 4 'instructions'running test 5 'cycles/period=100000,config2/'
??? What is wrong here?
??? Output on linux 6.4.0rc3:
??? # ./perf stat -e instructions -- true
???
??? Performance counter stats for 'true':
???
???         2,965,720      instructions
???
???        0.002026832 seconds time elapsed
???
???        0.000056000 seconds user
???        0.002048000 seconds sys
??? #
??? This is fine and works as expected. The s390 PMU for counters
??? has a direct mapping for this. So we end up in the s390 PMU
??? to retrieve the value.
???
??? Output on linux-next
???# ./perf stat -e instructions -- true
???
??? Performance counter stats for 'true':
???
???              0.65 msec task-clock                       #    0.250 CPUs utilized
???                 0      context-switches                 #    0.000 /sec
???                 0      cpu-migrations                   #    0.000 /sec
???                49      page-faults                      #   75.375 K/sec
???         3,367,228      cycles                           #    5.180 GHz
???         2,880,270      instructions                     #    0.86  insn per cycle
???   <not supported>      branches
???   <not supported>      branch-misses
???
???       0.002599176 seconds time elapsed
???
???       0.000053000 seconds user
???       0.002650000 seconds sys
???
???#
??? Somehow we end up in a different PMU. The output is the same as if
??? I do not specify an event at all. To reach the s390 specific PMU
??? I have to add it explicitly as in:
???# ./perf stat -e cpum_cf/instructions/ -- true
???
??? Performance counter stats for 'true':
???
???         2,814,522      cpum_cf/instructions/
???
???       0.001899881 seconds time elapsed
???
???       0.000050000 seconds user
???       0.001928000 seconds sys
???
??? #
No PMU found for 'cycles/period=100000,config2/'FAILED tests/parse-events.c:157 wrong number of entries
Event test failure: test 5 'cycles/period=100000,config2/'running test 6 'faults'
...
??? Similar output for basically all events.

No PMU found for 'cycles'running test 59 'cycles/name=name/'
No PMU found for 'name'Segmentation fault (core dumped)
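
In case it helps with the comparison, here is a quick stand-alone check
(my own sketch, not perf code) that lists which PMUs under the standard
/sys/bus/event_source/devices layout advertise an "instructions" event,
i.e. which PMUs the parser could legitimately resolve the legacy event
name to:

#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Standard sysfs location where each PMU exports its named events. */
	const char *base = "/sys/bus/event_source/devices";
	char path[PATH_MAX];
	struct dirent *de;
	DIR *dir = opendir(base);

	if (!dir)
		return 1;
	while ((de = readdir(dir)) != NULL) {
		if (de->d_name[0] == '.')
			continue;
		snprintf(path, sizeof(path), "%s/%s/events/instructions",
			 base, de->d_name);
		if (access(path, R_OK) == 0)
			printf("%s has an instructions event\n", de->d_name);
	}
	closedir(dir);
	return 0;
}

On this machine I would expect cpum_cf to show up; if the parser on
linux-next no longer considers it when resolving the legacy "instructions"
name, that would explain both the perf stat output above and the
"No PMU found" messages.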

Hope this helps.

PS: Should we keep the linux-perf-users mailing list as an addressee? I am
not sure everybody else is interested in this.
-- 
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
Vorsitzender des Aufsichtsrats: Gregor Pillen
Geschäftsführung: David Faller
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294

