* [PATCH] perf/test: Skip leader sampling for s390
From: Thomas Richter @ 2025-02-28 6:22 UTC
To: linux-kernel, linux-s390, linux-perf-users, acme, namhyung
Cc: agordeev, gor, sumanthk, hca, Thomas Richter
In the linux-next tree, perf test case 114 'perf record tests' has a subtest
named 'Basic leader sampling test' which always fails on s390.
The root cause is this invocation:
# perf record -vv -e '{cycles,cycles}:Su' -- perf test -w brstack
...
The debug output shows that the following 2 events are installed:
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0 (PERF_COUNT_HW_CPU_CYCLES)
{ sample_period, sample_freq } 4000
sample_type IP|TID|TIME|READ|CPU|PERIOD|IDENTIFIER
read_format ID|GROUP|LOST
disabled 1
exclude_kernel 1
exclude_hv 1
freq 1
sample_id_all 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
size 136
config 0 (PERF_COUNT_HW_CPU_CYCLES)
sample_type IP|TID|TIME|READ|CPU|PERIOD|IDENTIFIER
read_format ID|GROUP|LOST
exclude_kernel 1
exclude_hv 1
sample_id_all 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd 5 flags 0x8 = 6
...
The first event is the group leader and is installed as a sampling event.
The second one is a group member and is installed as a counting event.
Namhyung Kim confirms this observation:
> Yep, the syntax '{event1,event2}:S' is for group leader sampling which
> reduces the overhead of PMU interrupts. The idea is that those events
> are scheduled together so sampling is enabled only for the leader
> (usually the first) event and it reads counts from the member events
> using PERF_SAMPLE_READ.
>
> So they should have the same counts if it uses the same events in a
> group.
However, this does not work on s390. s390 has one dedicated sampling PMU
which supports only one event. A different PMU is used for counting.
Both run concurrently using different setups and frequencies.
On s390x a sampling event is set up using a preset trigger and a large
buffer. The hardware
- writes a sample (64 bytes) into this buffer
when a given number of CPU instructions has been executed,
- and triggers an interrupt when the buffer gets full.
The trigger has just a few possible values.
On s390x the counting event cycles is used to read out the number of
CPU cycles executed.
On s390 the above invocation creates 2 events running on 2 different
PMUs, and the results are different values from two independently running
PMUs, which do not match as consistently and reliably as on Intel:
# ./perf record -e '{cycles,cycles}:Su' -- perf test -w brstack
...
# ./perf script
perf 2799437 92568.845118: 5508000 cycles: 3ffbcb898b6 do_lookup_x+0x196
perf 2799437 92568.845119: 1377000 cycles: 3ffbcb898b6 do_lookup_x+0x196
perf 2799437 92568.845120: 4131000 cycles: 3ffbcb897e8 do_lookup_x+0xc8
perf 2799437 92568.845121: 1377000 cycles: 3ffbcb8a37c _dl_lookup_symbol
perf 2799437 92568.845122: 1377000 cycles: 3ffbcb89558 check_match+0x18
perf 2799437 92568.845123: 2754000 cycles: 3ffbcb89b2a do_lookup_x+0x40a
perf 2799437 92568.845124: 1377000 cycles: 3ffbcb89b1e do_lookup_x+0x3fe
As can be seen, the results match very often but not all the time,
which makes this test fail very, very often on s390.
This patch bypasses this test on s390.
Output before:
# ./perf test 114
114: perf record tests : FAILED!
#
Output after:
# ./perf test 114
114: perf record tests : Ok
#
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
---
tools/perf/tests/shell/record.sh | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/tools/perf/tests/shell/record.sh b/tools/perf/tests/shell/record.sh
index ba8d873d3ca7..98b69820bc5f 100755
--- a/tools/perf/tests/shell/record.sh
+++ b/tools/perf/tests/shell/record.sh
@@ -231,6 +231,12 @@ test_cgroup() {
test_leader_sampling() {
echo "Basic leader sampling test"
+ if [ "$(uname -m)" = s390x ]
+ then
+ echo "Leader sampling skipped"
+ ((skipped+=1))
+ return
+ fi
if ! perf record -o "${perfdata}" -e "{cycles,cycles}:Su" -- \
perf test -w brstack 2> /dev/null
then
--
2.45.2
* Re: [PATCH] perf/test: Skip leader sampling for s390
From: Namhyung Kim @ 2025-03-01 0:12 UTC
To: Thomas Richter, Ian Rogers
Cc: linux-kernel, linux-s390, linux-perf-users, acme, agordeev, gor,
sumanthk, hca
Hello,
On Fri, Feb 28, 2025 at 07:22:41AM +0100, Thomas Richter wrote:
> In the linux-next tree, perf test case 114 'perf record tests' has a subtest
> named 'Basic leader sampling test' which always fails on s390.
> The root cause is this invocation:
>
> # perf record -vv -e '{cycles,cycles}:Su' -- perf test -w brstack
>
> ...
> The debug output shows that the following 2 events are installed:
>
> ------------------------------------------------------------
> perf_event_attr:
> type 0 (PERF_TYPE_HARDWARE)
> size 136
> config 0 (PERF_COUNT_HW_CPU_CYCLES)
> { sample_period, sample_freq } 4000
> sample_type IP|TID|TIME|READ|CPU|PERIOD|IDENTIFIER
> read_format ID|GROUP|LOST
> disabled 1
> exclude_kernel 1
> exclude_hv 1
> freq 1
> sample_id_all 1
> ------------------------------------------------------------
> sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
> ------------------------------------------------------------
> perf_event_attr:
> type 0 (PERF_TYPE_HARDWARE)
> size 136
> config 0 (PERF_COUNT_HW_CPU_CYCLES)
> sample_type IP|TID|TIME|READ|CPU|PERIOD|IDENTIFIER
> read_format ID|GROUP|LOST
> exclude_kernel 1
> exclude_hv 1
> sample_id_all 1
> ------------------------------------------------------------
> sys_perf_event_open: pid -1 cpu 0 group_fd 5 flags 0x8 = 6
> ...
>
> The first event is the group leader and is installed as a sampling event.
> The second one is a group member and is installed as a counting event.
>
> Namhyung Kim confirms this observation:
> > Yep, the syntax '{event1,event2}:S' is for group leader sampling which
> > reduces the overhead of PMU interrupts. The idea is that those events
> > are scheduled together so sampling is enabled only for the leader
> > (usually the first) event and it reads counts from the member events
> > using PERF_SAMPLE_READ.
> >
> > So they should have the same counts if it uses the same events in a
> > group.
>
> However, this does not work on s390. s390 has one dedicated sampling PMU
> which supports only one event. A different PMU is used for counting.
> Both run concurrently using different setups and frequencies.
>
> On s390x a sampling event is set up using a preset trigger and a large
> buffer. The hardware
> - writes a sample (64 bytes) into this buffer
> when a given number of CPU instructions has been executed,
> - and triggers an interrupt when the buffer gets full.
> The trigger has just a few possible values.
>
> On s390x the counting event cycles is used to read out the number of
> CPU cycles executed.
>
> On s390 the above invocation creates 2 events running on 2 different
> PMUs, and the results are different values from two independently running
> PMUs, which do not match as consistently and reliably as on Intel:
>
> # ./perf record -e '{cycles,cycles}:Su' -- perf test -w brstack
> ...
> # ./perf script
> perf 2799437 92568.845118: 5508000 cycles: 3ffbcb898b6 do_lookup_x+0x196
> perf 2799437 92568.845119: 1377000 cycles: 3ffbcb898b6 do_lookup_x+0x196
> perf 2799437 92568.845120: 4131000 cycles: 3ffbcb897e8 do_lookup_x+0xc8
> perf 2799437 92568.845121: 1377000 cycles: 3ffbcb8a37c _dl_lookup_symbol
> perf 2799437 92568.845122: 1377000 cycles: 3ffbcb89558 check_match+0x18
> perf 2799437 92568.845123: 2754000 cycles: 3ffbcb89b2a do_lookup_x+0x40a
> perf 2799437 92568.845124: 1377000 cycles: 3ffbcb89b1e do_lookup_x+0x3fe
>
> As can be seen, the results match very often but not all the time,
> which makes this test fail very, very often on s390.
>
> This patch bypasses this test on s390.
>
> Output before:
> # ./perf test 114
> 114: perf record tests : FAILED!
> #
>
> Output after:
> # ./perf test 114
> 114: perf record tests : Ok
> #
>
> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Thanks for the fix. I think Ian saw the same problem on other archs
too. Maybe we need to enable it on supported archs only.
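A minimal sketch of what "enable it on supported archs only" could look like in record.sh. The helper name leader_sampling_supported and the contents of the allowlist are assumptions made for illustration; the thread does not settle which architectures belong in it.

```bash
# Hypothetical helper, not part of record.sh.
# Returns 0 if leader sampling is expected to work on the given machine
# type (as printed by `uname -m`), 1 otherwise.
leader_sampling_supported() {
  case "$1" in
    # Assumed allowlist: only x86_64 is listed here as a placeholder.
    x86_64) return 0 ;;
    *) return 1 ;;
  esac
}
```

Inside test_leader_sampling() this could be used as `leader_sampling_supported "$(uname -m)" || { echo "Leader sampling skipped"; return; }`, so that new architectures are skipped by default rather than listed one by one.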
Thanks,
Namhyung
> ---
> tools/perf/tests/shell/record.sh | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/tools/perf/tests/shell/record.sh b/tools/perf/tests/shell/record.sh
> index ba8d873d3ca7..98b69820bc5f 100755
> --- a/tools/perf/tests/shell/record.sh
> +++ b/tools/perf/tests/shell/record.sh
> @@ -231,6 +231,12 @@ test_cgroup() {
>
> test_leader_sampling() {
> echo "Basic leader sampling test"
> + if [ "$(uname -m)" = s390x ]
> + then
> + echo "Leader sampling skipped"
> + ((skipped+=1))
> + return
> + fi
> if ! perf record -o "${perfdata}" -e "{cycles,cycles}:Su" -- \
> perf test -w brstack 2> /dev/null
> then
> --
> 2.45.2
>
* Re: [PATCH] perf/test: Skip leader sampling for s390
From: Ian Rogers @ 2025-03-01 0:36 UTC
To: Namhyung Kim
Cc: Thomas Richter, linux-kernel, linux-s390, linux-perf-users, acme,
agordeev, gor, sumanthk, hca
On Fri, Feb 28, 2025 at 4:13 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> Hello,
>
> On Fri, Feb 28, 2025 at 07:22:41AM +0100, Thomas Richter wrote:
> > In the linux-next tree, perf test case 114 'perf record tests' has a subtest
> > named 'Basic leader sampling test' which always fails on s390.
> > The root cause is this invocation:
> >
> > # perf record -vv -e '{cycles,cycles}:Su' -- perf test -w brstack
> >
> > ...
> > The debug output shows that the following 2 events are installed:
> >
> > ------------------------------------------------------------
> > perf_event_attr:
> > type 0 (PERF_TYPE_HARDWARE)
> > size 136
> > config 0 (PERF_COUNT_HW_CPU_CYCLES)
> > { sample_period, sample_freq } 4000
> > sample_type IP|TID|TIME|READ|CPU|PERIOD|IDENTIFIER
> > read_format ID|GROUP|LOST
> > disabled 1
> > exclude_kernel 1
> > exclude_hv 1
> > freq 1
> > sample_id_all 1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 5
> > ------------------------------------------------------------
> > perf_event_attr:
> > type 0 (PERF_TYPE_HARDWARE)
> > size 136
> > config 0 (PERF_COUNT_HW_CPU_CYCLES)
> > sample_type IP|TID|TIME|READ|CPU|PERIOD|IDENTIFIER
> > read_format ID|GROUP|LOST
> > exclude_kernel 1
> > exclude_hv 1
> > sample_id_all 1
> > ------------------------------------------------------------
> > sys_perf_event_open: pid -1 cpu 0 group_fd 5 flags 0x8 = 6
> > ...
> >
> > The first event is the group leader and is installed as a sampling event.
> > The second one is a group member and is installed as a counting event.
> >
> > Namhyung Kim confirms this observation:
> > > Yep, the syntax '{event1,event2}:S' is for group leader sampling which
> > > reduces the overhead of PMU interrupts. The idea is that those events
> > > are scheduled together so sampling is enabled only for the leader
> > > (usually the first) event and it reads counts from the member events
> > > using PERF_SAMPLE_READ.
> > >
> > > So they should have the same counts if it uses the same events in a
> > > group.
> >
> > However, this does not work on s390. s390 has one dedicated sampling PMU
> > which supports only one event. A different PMU is used for counting.
> > Both run concurrently using different setups and frequencies.
> >
> > On s390x a sampling event is set up using a preset trigger and a large
> > buffer. The hardware
> > - writes a sample (64 bytes) into this buffer
> > when a given number of CPU instructions has been executed,
> > - and triggers an interrupt when the buffer gets full.
> > The trigger has just a few possible values.
> >
> > On s390x the counting event cycles is used to read out the number of
> > CPU cycles executed.
> >
> > On s390 the above invocation creates 2 events running on 2 different
> > PMUs, and the results are different values from two independently running
> > PMUs, which do not match as consistently and reliably as on Intel:
> >
> > # ./perf record -e '{cycles,cycles}:Su' -- perf test -w brstack
Hi Thomas,
Thanks for reporting this! Could you try adding --count=100000 so that
we're not using frequency mode? We would then expect the counts to be
close to 100,000. For example, on my x86 laptop:
```
$ perf record --count=100000 -e '{cycles,cycles}:Su' -- perf test -w brstack
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.047 MB perf.data (712 samples) ]
$ perf script
perf 635952 290271.436115: 100007 cycles:
ffffffffada00080 [unknown] ([unknown])
perf 635952 290271.436115: 100007 cycles:
ffffffffada00080 [unknown] ([unknown])
perf 635952 290271.436650: 100525 cycles:
7f86352b01b3 _dl_map_object_from_fd+0x553
(/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
perf 635952 290271.436650: 100525 cycles:
7f86352b01b3 _dl_map_object_from_fd+0x553
(/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
perf 635952 290271.437088: 99866 cycles:
7f86352cb827 strchr+0x27
(/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
perf 635952 290271.437088: 99866 cycles:
7f86352cb827 strchr+0x27
(/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
perf 635952 290271.437376: 99912 cycles:
7f86352cba74 strcmp+0x54
(/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
perf 635952 290271.437376: 99912 cycles:
7f86352cba74 strcmp+0x54
(/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
perf 635952 290271.437509: 100279 cycles:
7f86352cba3a strcmp+0x1a
(/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
perf 635952 290271.437509: 100279 cycles:
7f86352cba3a strcmp+0x1a
(/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
perf 635952 290271.437559: 99760 cycles:
7f86352bc39f _dl_check_map_versions+0x50f
(/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
perf 635952 290271.437559: 99760 cycles:
7f86352bc39f _dl_check_map_versions+0x50f
(/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
...
```
I'd be particularly concerned if the cycles counts deviate significantly
from 100000.
> > ...
> > # ./perf script
> > perf 2799437 92568.845118: 5508000 cycles: 3ffbcb898b6 do_lookup_x+0x196
> > perf 2799437 92568.845119: 1377000 cycles: 3ffbcb898b6 do_lookup_x+0x196
> > perf 2799437 92568.845120: 4131000 cycles: 3ffbcb897e8 do_lookup_x+0xc8
> > perf 2799437 92568.845121: 1377000 cycles: 3ffbcb8a37c _dl_lookup_symbol
> > perf 2799437 92568.845122: 1377000 cycles: 3ffbcb89558 check_match+0x18
> > perf 2799437 92568.845123: 2754000 cycles: 3ffbcb89b2a do_lookup_x+0x40a
> > perf 2799437 92568.845124: 1377000 cycles: 3ffbcb89b1e do_lookup_x+0x3fe
> >
> > As can be seen, the results match very often but not all the time,
> > which makes this test fail very, very often on s390.
Actually this is much more deviation than I'd expect. If we use the
task-clock software/timer-based event I see:
```
$ perf record --count=100000 -e '{task-clock,task-clock}:Su' -- perf test -w brstack
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.047 MB perf.data (712 samples) ]
$ perf script
perf 636643 290571.807049: 801858 task-clock:
7fdf48643439 _dl_map_object_from_fd+0x7d9
(/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
perf 636643 290571.807049: 804012 task-clock:
7fdf48643439 _dl_map_object_from_fd+0x7d9
(/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
perf 636643 290571.807549: 499833 task-clock:
7fdf4863eb9b _dl_map_object_deps+0x3eb
(/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
perf 636643 290571.807549: 498236 task-clock:
7fdf4863eb9b _dl_map_object_deps+0x3eb
(/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
```
So the count deviates by a couple of thousand at most, but your output
seems to deviate by 4 million.
So, I think the test needs to be more tolerant, which should help your
case. As Namhyung mentions, I think there may be another bug lurking.
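One way a more tolerant check could look is sketched below. It is not the code in record.sh: the function name check_leader_pairs and the percentage tolerance argument are assumptions for illustration. It reads perf script output on stdin, treats every line carrying "<period> cycles:" as a sample, pairs each leader sample with the sibling sample that follows it, and fails only if the two periods differ by more than the given percentage.

```bash
# Hypothetical helper, not part of record.sh.
# $1: allowed relative difference between leader and sibling, in percent.
# stdin: output of `perf script` for a '{cycles,cycles}:Su' recording.
check_leader_pairs() {
  awk -v tol="$1" '
    {
      # Find the period, i.e. the field right before "cycles:".
      # Lines without it (e.g. wrapped continuation lines) are skipped.
      period = ""
      for (i = 2; i <= NF; i++)
        if ($i == "cycles:") { period = $(i - 1); break }
      if (period == "") next

      n++
      if (n % 2 == 1) { leader = period; next }   # odd sample: leader

      # Even sample: sibling, compare with the preceding leader.
      diff = period - leader
      if (diff < 0) diff = -diff
      if (diff * 100 > tol * leader) {
        printf "mismatch: leader=%d sibling=%d\n", leader, period
        bad = 1
      }
    }
    END { exit bad + 0 }
  '
}
```

In the test this might be called as `perf script -i "${perfdata}" | check_leader_pairs 1`. A tolerance like this would only hide small deviations; it would still flag differences of the size shown in the s390 output above.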
Thanks,
Ian
> > This patch bypasses this test on s390.
> >
> > Output before:
> > # ./perf test 114
> > 114: perf record tests : FAILED!
> > #
> >
> > Output after:
> > # ./perf test 114
> > 114: perf record tests : Ok
> > #
> >
> > Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
> > Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
>
> Thanks for the fix. I think Ian saw the same problem on other archs
> too. Maybe we need to enable it on supported archs only.
>
> Thanks,
> Namhyung
>
> > ---
> > tools/perf/tests/shell/record.sh | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/tools/perf/tests/shell/record.sh b/tools/perf/tests/shell/record.sh
> > index ba8d873d3ca7..98b69820bc5f 100755
> > --- a/tools/perf/tests/shell/record.sh
> > +++ b/tools/perf/tests/shell/record.sh
> > @@ -231,6 +231,12 @@ test_cgroup() {
> >
> > test_leader_sampling() {
> > echo "Basic leader sampling test"
> > + if [ "$(uname -m)" = s390x ]
> > + then
> > + echo "Leader sampling skipped"
> > + ((skipped+=1))
> > + return
> > + fi
> > if ! perf record -o "${perfdata}" -e "{cycles,cycles}:Su" -- \
> > perf test -w brstack 2> /dev/null
> > then
> > --
> > 2.45.2
> >
* Re: [PATCH] perf/test: Skip leader sampling for s390
From: Thomas Richter @ 2025-03-03 5:53 UTC
To: Ian Rogers, Namhyung Kim
Cc: linux-kernel, linux-s390, linux-perf-users, acme, agordeev, gor,
sumanthk, hca
On 3/1/25 01:36, Ian Rogers wrote:
> perf record --count=100000 -e '{cycles,cycles}:Su' -- perf test -w brstack
Ian, Namhyung,
here is my output using this command:
# ./perf record --count=100000 -e '{cycles,cycles}:Su' -- perf test -w brstack
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.106 MB perf.data (1080 samples) ]
# ./perf script
perf 22194 484835.185113: 100000 cycles: 3ff9e407c8c _dl_map_object_from_fd+0xa3c (/usr/lib/ld64.so.1)
perf 22194 484835.185114: 100000 cycles: 3ff9e408940 _dl_map_object+0x110 (/usr/lib/ld64.so.1)
perf 22194 484835.185116: 400000 cycles: 3ff9e40890e _dl_map_object+0xde (/usr/lib/ld64.so.1)
perf 22194 484835.185117: 900000 cycles: 3ff9e40b572 _dl_name_match_p+0x42 (/usr/lib/ld64.so.1)
perf 22194 484835.185118: 500000 cycles: 3ff9e407c8c _dl_map_object_from_fd+0xa3c (/usr/lib/ld64.so.1)
perf 22194 484835.185119: 100000 cycles: 3ff9e40b53e _dl_name_match_p+0xe (/usr/lib/ld64.so.1)
perf 22194 484835.185120: 100000 cycles: 3ff9e40890e _dl_map_object+0xde (/usr/lib/ld64.so.1)
perf 22194 484835.185121: 100000 cycles: 3ff9e408904 _dl_map_object+0xd4 (/usr/lib/ld64.so.1)
perf 22194 484835.185122: 100000 cycles: 3ff9e40369a _dl_map_object_deps+0xbba (/usr/lib/ld64.so.1)
perf 22194 484835.185123: 100000 cycles: 3ff9e413460 _dl_check_map_versions+0x100 (/usr/lib/ld64.so.1)
perf 22194 484835.185124: 500000 cycles: 3ff9e40b53e _dl_name_match_p+0xe (/usr/lib/ld64.so.1)
perf 22194 484835.185125: 100000 cycles: 3ff9e40e7e0 _dl_relocate_object+0x550 (/usr/lib/ld64.so.1)
perf 22194 484835.185126: 200000 cycles: 3ff9e40e7e0 _dl_relocate_object+0x550 (/usr/lib/ld64.so.1)
perf 22194 484835.185127: 200000 cycles: 3ff9e409558 check_match+0x18 (/usr/lib/ld64.so.1)
perf 22194 484835.185128: 200000 cycles: 3ff9e409894 do_lookup_x+0x174 (/usr/lib/ld64.so.1)
perf 22194 484835.185129: 100000 cycles: 3ff9e409910 do_lookup_x+0x1f0 (/usr/lib/ld64.so.1)
perf 22194 484835.185130: 100000 cycles: 3ff9e409b1e do_lookup_x+0x3fe (/usr/lib/ld64.so.1)
perf 22194 484835.185131: 100000 cycles: 3ff9e409894 do_lookup_x+0x174 (/usr/lib/ld64.so.1)
perf 22194 484835.185132: 100000 cycles: 3ff9e409558 check_match+0x18 (/usr/lib/ld64.so.1)
perf 22194 484835.187445: 100000 cycles: 3ff9e409ad4 do_lookup_x+0x3b4 (/usr/lib/ld64.so.1)
The difference when using counts instead of frequency is similar. Most of the time the numbers are identical,
but sometimes they do not match.
Using task-clock as the event, I have similar results. The counts vary a bit, but the numbers are pretty close.
They vary by a few thousand at most:
# perf record --count=100000 -e '{task-clock,task-clock}:Su' -- perf test -w brstack
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.016 MB perf.data (246 samples) ]
# ./perf script
perf 22223 485235.378380: 402070 task-clock: 3ffbed874c6 _dl_map_object_from_fd+0x276 (/usr/lib/ld64.so.1)
perf 22223 485235.378380: 404640 task-clock: 3ffbed874c6 _dl_map_object_from_fd+0x276 (/usr/lib/ld64.so.1)
perf 22223 485235.378779: 399960 task-clock: 3ffbed888de _dl_map_object+0xae (/usr/lib/ld64.so.1)
perf 22223 485235.378779: 397689 task-clock: 3ffbed888de _dl_map_object+0xae (/usr/lib/ld64.so.1)
perf 22223 485235.378879: 100055 task-clock: 3ffbed8e7e0 _dl_relocate_object+0x550 (/usr/lib/ld64.so.1)
perf 22223 485235.378879: 100100 task-clock: 3ffbed8e7e0 _dl_relocate_object+0x550 (/usr/lib/ld64.so.1)
perf 22223 485235.378979: 99981 task-clock: 3ffbed895ae check_match+0x6e (/usr/lib/ld64.so.1)
perf 22223 485235.378979: 99876 task-clock: 3ffbed895ae check_match+0x6e (/usr/lib/ld64.so.1)
perf 22223 485235.379079: 99950 task-clock: 3ffbed8974c do_lookup_x+0x2c (/usr/lib/ld64.so.1)
perf 22223 485235.379079: 99957 task-clock: 3ffbed8974c do_lookup_x+0x2c (/usr/lib/ld64.so.1)
perf 22223 485235.379179: 100051 task-clock: 3ffbed8e7f0 _dl_relocate_object+0x560 (/usr/lib/ld64.so.1)
perf 22223 485235.379179: 100004 task-clock: 3ffbed8e7f0 _dl_relocate_object+0x560 (/usr/lib/ld64.so.1)
perf 22223 485235.379279: 99933 task-clock: 3ffbed8e7ea _dl_relocate_object+0x55a (/usr/lib/ld64.so.1)
perf 22223 485235.379279: 99952 task-clock: 3ffbed8e7ea _dl_relocate_object+0x55a (/usr/lib/ld64.so.1)
Thanks for your help
--
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
IBM Deutschland Research & Development GmbH
Vorsitzender des Aufsichtsrats: Wolfgang Wendt
Geschäftsführung: David Faller
Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294
* Re: [PATCH] perf/test: Skip leader sampling for s390
From: Chun-Tse Shao @ 2025-03-28 18:27 UTC
To: tmricht
Cc: acme, agordeev, gor, hca, irogers, linux-kernel, linux-perf-users,
linux-s390, namhyung, sumanthk, Stephane Eranian
We believe we know the problem, thanks to Stephane Eranian's investigation.
It comes from throttling. When the sampling rate is too high, the generic code
does not modify event scheduling. `perf_event_overflow()` simply returns 1,
and subsequently, `pmu_stop()` only stops the leader event, not the slave
events, because the arch layer does not consider groups. Also, the
`event_stop()` callback only operates on a single event, not the siblings.
This would impact all architectures. Perhaps we can extend the
`event_stop()` callback to include a new argument to also stop the siblings.
We welcome all suggestions and are open to discussing any potential solutions.
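To check whether throttling actually occurred during a given recording, one option is to count throttle records in the raw event dump. The sketch below assumes that `perf report -D` labels these records PERF_RECORD_THROTTLE in its output; the helper name count_throttle_records is made up for illustration.

```bash
# Hypothetical helper: count throttle records in a raw event dump on stdin.
# The pattern does not match PERF_RECORD_UNTHROTTLE lines.
count_throttle_records() {
  # grep -c prints 0 and exits non-zero when nothing matches;
  # "|| true" keeps the function's exit status at 0 in that case.
  grep -c 'PERF_RECORD_THROTTLE' || true
}

# Possible usage, assuming perf.data comes from the record command in
# this thread:
#   perf report -D -i perf.data 2>/dev/null | count_throttle_records
```

A non-zero count for a run that also shows mismatching leader and sibling counts would be consistent with the explanation above, though it would not prove it.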
Thanks,
CT
Cc: Stephane Eranian <eranian@google.com>
* Re: [PATCH] perf/test: Skip leader sampling for s390
From: Stephane Eranian @ 2025-03-28 20:05 UTC
To: Chun-Tse Shao
Cc: tmricht, acme, agordeev, gor, hca, irogers, linux-kernel,
linux-perf-users, linux-s390, namhyung, sumanthk, Peter Zijlstra
Hi,
Thanks CT for the post. Indeed, this is a long-standing bug impacting (most
likely) all architectures. The rate throttling code does not consider event
grouping. It stops the sampling event in place (on x86) at the hardware level,
not at the generic scheduling layer. But if the event is in a group, it may
make sense to also stop all the other events in the group, i.e., stop the
group. Otherwise you may get discrepancies between samples of the "slave
events". Similarly, the time_running and time_enabled logic is not modified
during throttling.
I'm interested in hearing potential ways of solving this in a portable manner.
On Fri, Mar 28, 2025 at 11:27 AM Chun-Tse Shao <ctshao@google.com> wrote:
>
> We believe we know the problem, thanks to Stephane Eranian's investigation.
> It comes from throttling. When the sampling rate is too high, the generic code
> does not modify event scheduling. `perf_event_overflow()` simply returns 1,
> and subsequently, `pmu_stop()` only stops the leader event, not the slave
> events, because the arch layer does not consider groups. Also, the
> `event_stop()` callback only operates on a single event, not the siblings.
>
> This would impact all architectures. Perhaps we can extend the
> `event_stop()` callback to include a new argument to also stop the siblings.
> We welcome all suggestions and are open to discussing any potential solutions.
>
> Thanks,
> CT
>
> Cc: Stephane Eranian <eranian@google.com>
* Re: [PATCH] perf/test: Skip leader sampling for s390
From: Anubhav Shelat @ 2025-09-18 21:11 UTC
To: eranian
Cc: Arnaldo Carvalho de Melo, agordeev, ctshao, gor, hca, Ian Rogers,
linux-kernel, linux-perf-users, linux-s390, Namhyung Kim, peterz,
sumanthk, tmricht, Michael Petlan
I believe this issue is causing the perf record tests to fail the leader
sampling subtest on aarch64 machines. The important commands from the test:
# ./perf record -e "{cycles,cycles}:Su" -- ./perf test -w brstack
Lowering default frequency rate from 4000 to 1400.
Please consider tweaking /proc/sys/kernel/perf_event_max_sample_rate.
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.019 MB perf.data (210 samples) ]
# ./perf script -i perf.data | grep brstack
perf 98281 184091.292956: 621736 cycles:
53b844 brstack_bench+0x40 (/root/linux/tools/perf/perf)
perf 98281 184091.292956: 621765 cycles:
53b844 brstack_bench+0x40 (/root/linux/tools/perf/perf)
perf 98281 184091.293346: 611236 cycles:
53b784 brstack_bar+0x24 (/root/linux/tools/perf/perf)
perf 98281 184091.293346: 611266 cycles:
53b784 brstack_bar+0x24 (/root/linux/tools/perf/perf)
perf 98281 184091.293734: 587649 cycles:
53b844 brstack_bench+0x40 (/root/linux/tools/perf/perf)
perf 98281 184091.293734: 587678 cycles:
53b844 brstack_bench+0x40 (/root/linux/tools/perf/perf)
perf 98281 184091.294155: 648439 cycles:
53b780 brstack_bar+0x20 (/root/linux/tools/perf/perf)
perf 98281 184091.294155: 648469 cycles:
53b780 brstack_bar+0x20 (/root/linux/tools/perf/perf)
perf 98281 184091.294588: 716679 cycles:
53b844 brstack_bench+0x40 (/root/linux/tools/perf/perf)
perf 98281 184091.294588: 716709 cycles:
53b844 brstack_bench+0x40 (/root/linux/tools/perf/perf)
perf 98281 184091.295050: 779147 cycles:
53b814 brstack_bench+0x10 (/root/linux/tools/perf/perf)
perf 98281 184091.295050: 779177 cycles:
53b814 brstack_bench+0x10 (/root/linux/tools/perf/perf)
perf 98281 184091.295545: 842413 cycles:
53b8b8 brstack_bench+0xb4 (/root/linux/tools/perf/perf)
perf 98281 184091.295545: 842443 cycles:
53b8b8 brstack_bench+0xb4 (/root/linux/tools/perf/perf)
perf 98281 184091.296191: 899736 cycles:
53b77c brstack_bar+0x1c (/root/linux/tools/perf/perf)
perf 98281 184091.296191: 899766 cycles:
53b77c brstack_bar+0x1c (/root/linux/tools/perf/perf)
perf 98281 184091.296721: 914623 cycles:
53b794 brstack_bar+0x34 (/root/linux/tools/perf/perf)
perf 98281 184091.296721: 914652 cycles:
53b794 brstack_bar+0x34 (/root/linux/tools/perf/perf)
perf 98281 184091.297255: 926741 cycles:
53b7a8 brstack_bar+0x48 (/root/linux/tools/perf/perf)
perf 98281 184091.297255: 926770 cycles:
53b7a8 brstack_bar+0x48 (/root/linux/tools/perf/perf)
perf 98281 184091.297813: 966974 cycles:
53b778 brstack_bar+0x18 (/root/linux/tools/perf/perf)
perf 98281 184091.297813: 967003 cycles:
53b778 brstack_bar+0x18 (/root/linux/tools/perf/perf)
perf 98281 184091.298394: 1007743 cycles:
53b784 brstack_bar+0x24 (/root/linux/tools/perf/perf)
perf 98281 184091.298394: 1007772 cycles:
53b784 brstack_bar+0x24 (/root/linux/tools/perf/perf)
perf 98281 184091.298991: 1043010 cycles:
53b794 brstack_bar+0x34 (/root/linux/tools/perf/perf)
perf 98281 184091.298991: 1043039 cycles:
53b794 brstack_bar+0x34 (/root/linux/tools/perf/perf)
perf 98281 184091.299604: 1072961 cycles:
53b794 brstack_bar+0x34 (/root/linux/tools/perf/perf)
perf 98281 184091.299604: 1072990 cycles:
53b794 brstack_bar+0x34 (/root/linux/tools/perf/perf)
perf 98281 184091.300234: 1099175 cycles:
53b768 brstack_bar+0x8 (/root/linux/tools/perf/perf)
perf 98281 184091.300234: 1099204 cycles:
53b768 brstack_bar+0x8 (/root/linux/tools/perf/perf)
perf 98281 184091.300870: 1121830 cycles:
53b898 brstack_bench+0x94 (/root/linux/tools/perf/perf)
perf 98281 184091.300870: 1121860 cycles:
53b898 brstack_bench+0x94 (/root/linux/tools/perf/perf)
perf 98281 184091.301515: 1140634 cycles:
53b788 brstack_bar+0x28 (/root/linux/tools/perf/perf)
perf 98281 184091.301515: 1140664 cycles:
53b788 brstack_bar+0x28 (/root/linux/tools/perf/perf)
perf 98281 184091.302174: 1158251 cycles:
53b7f0 brstack_foo+0x40 (/root/linux/tools/perf/perf)
perf 98281 184091.302174: 1158281 cycles:
53b7f0 brstack_foo+0x40 (/root/linux/tools/perf/perf)
perf 98281 184091.302838: 1173750 cycles:
53b774 brstack_bar+0x14 (/root/linux/tools/perf/perf)
perf 98281 184091.302838: 1173780 cycles:
53b774 brstack_bar+0x14 (/root/linux/tools/perf/perf)
perf 98281 184091.303504: 1186018 cycles:
53b794 brstack_bar+0x34 (/root/linux/tools/perf/perf)
perf 98281 184091.303504: 1186048 cycles:
53b794 brstack_bar+0x34 (/root/linux/tools/perf/perf)
perf 98281 184091.304185: 1197272 cycles:
53b7fc brstack_foo+0x4c (/root/linux/tools/perf/perf)
perf 98281 184091.304185: 1197302 cycles:
53b7fc brstack_foo+0x4c (/root/linux/tools/perf/perf)
perf 98281 184091.304864: 1208165 cycles:
53b768 brstack_bar+0x8 (/root/linux/tools/perf/perf)
perf 98281 184091.304864: 1208194 cycles:
53b768 brstack_bar+0x8 (/root/linux/tools/perf/perf)
perf 98281 184091.306794: 1215537 cycles:
53b914 brstack+0x58 (/root/linux/tools/perf/perf)
perf 98281 184091.306794: 3426542 cycles:
53b914 brstack+0x58 (/root/linux/tools/perf/perf)
Usually the difference between the leader and sibling counts is about
30 cycles, but occasionally there's a really big difference. When
running perf record with '-e "{cycles,cycles,cycles}:Su"', the two
sibling events have the same cycle count.
There is no difference between the leader and sibling when running on
x86 systems using the cycles event, but when using the task-clock
event, the results were similar to Thomas' on both x86 and aarch64.
Any advice would be appreciated.
Thanks,
Anubhav