* [PATCH] Revert "perf test: Allow tolerance for leader sampling test"
@ 2025-10-23 13:24 Anubhav Shelat
  2025-10-24  8:40 ` Thomas Richter
  0 siblings, 1 reply; 15+ messages in thread

From: Anubhav Shelat @ 2025-10-23 13:24 UTC (permalink / raw)
To: mpetlan, acme, namhyung, irogers, linux-perf-users
Cc: peterz, mingo, mark.rutland, alexander.shishkin, jolsa, adrian.hunter,
    kan.liang, dapeng1.mi, james.clark, Anubhav Shelat

This reverts commit 1c5721ca89a1c8ae71082d3a102b39fd1ec0a205.

The throttling bug has been fixed in commit 9734e25fbf5a ("perf: Fix the
throttle logic for a group"), so this commit can be reverted.

Signed-off-by: Anubhav Shelat <ashelat@redhat.com>
---
 tools/perf/tests/shell/record.sh | 33 ++++++--------------------------
 1 file changed, 6 insertions(+), 27 deletions(-)

diff --git a/tools/perf/tests/shell/record.sh b/tools/perf/tests/shell/record.sh
index 0f5841c479e7..13e0d6ef66c9 100755
--- a/tools/perf/tests/shell/record.sh
+++ b/tools/perf/tests/shell/record.sh
@@ -267,43 +267,22 @@ test_leader_sampling() {
     err=1
     return
   fi
-  perf script -i "${perfdata}" | grep brstack > $script_output
-  # Check if the two instruction counts are equal in each record.
-  # However, the throttling code doesn't consider event grouping. During throttling, only the
-  # leader is stopped, causing the slave's counts significantly higher. To temporarily solve this,
-  # let's set the tolerance rate to 80%.
-  # TODO: Revert the code for tolerance once the throttling mechanism is fixed.
   index=0
-  valid_counts=0
-  invalid_counts=0
-  tolerance_rate=0.8
+  perf script -i "${perfdata}" | grep brstack > "${script_output}"
   while IFS= read -r line
   do
+    # Check if the two instruction counts are equal in each record
     cycles=$(echo $line | awk '{for(i=1;i<=NF;i++) if($i=="cycles:") print $(i-1)}')
     if [ $(($index%2)) -ne 0 ] && [ ${cycles}x != ${prev_cycles}x ]
     then
-      invalid_counts=$(($invalid_counts+1))
-    else
-      valid_counts=$(($valid_counts+1))
+      echo "Leader sampling [Failed inconsistent cycles count]"
+      err=1
+      return
     fi
     index=$(($index+1))
     prev_cycles=$cycles
   done < "${script_output}"
-  total_counts=$(bc <<< "$invalid_counts+$valid_counts")
-  if (( $(bc <<< "$total_counts <= 0") ))
-  then
-    echo "Leader sampling [No sample generated]"
-    err=1
-    return
-  fi
-  isok=$(bc <<< "scale=2; if (($invalid_counts/$total_counts) < (1-$tolerance_rate)) { 0 } else { 1 };")
-  if [ $isok -eq 1 ]
-  then
-    echo "Leader sampling [Failed inconsistent cycles count]"
-    err=1
-  else
-    echo "Basic leader sampling test [Success]"
-  fi
+  echo "Basic leader sampling test [Success]"
 }

 test_topdown_leader_sampling() {
--
2.51.0
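For readers following along, the strict pairwise check this revert restores can be exercised outside the test harness. The sketch below reuses the same awk extraction as record.sh, but wraps it in a hypothetical `check_pairs` helper (the function name is mine, not from the patch) that reads `perf script` lines from stdin; with `{cycles,cycles}:Su`, samples arrive in leader/sibling pairs, so every odd-indexed line must repeat the previous line's count:

```shell
# Sketch of the strict pairwise check restored by the revert.
# Input: one "perf script" line per sample. Lines 2k and 2k+1 belong
# to the same record, so their cycle counts must match exactly.
check_pairs() {
	index=0
	prev_cycles=
	while IFS= read -r line
	do
		# Pull the count that precedes the "cycles:" token.
		cycles=$(echo "$line" | awk '{for(i=1;i<=NF;i++) if($i=="cycles:") print $(i-1)}')
		if [ $((index % 2)) -ne 0 ] && [ "${cycles}x" != "${prev_cycles}x" ]
		then
			echo "inconsistent cycles count"
			return 1
		fi
		index=$((index + 1))
		prev_cycles=$cycles
	done
	echo "ok"
}
```

It could be fed directly with something like `perf script -i perf.data | grep brstack | check_pairs` to reproduce the test's verdict by hand.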
* Re: [PATCH] Revert "perf test: Allow tolerance for leader sampling test"
  2025-10-23 13:24 [PATCH] Revert "perf test: Allow tolerance for leader sampling test" Anubhav Shelat
@ 2025-10-24  8:40 ` Thomas Richter
  2025-10-24 17:21   ` Anubhav Shelat
  0 siblings, 1 reply; 15+ messages in thread

From: Thomas Richter @ 2025-10-24 8:40 UTC (permalink / raw)
To: Anubhav Shelat, mpetlan, acme, namhyung, irogers, linux-perf-users
Cc: peterz, mingo, mark.rutland, alexander.shishkin, jolsa, adrian.hunter,
    kan.liang, dapeng1.mi, james.clark

On 10/23/25 15:24, Anubhav Shelat wrote:
> This reverts commit 1c5721ca89a1c8ae71082d3a102b39fd1ec0a205.
>
> The throttling bug has been fixed in 9734e25fbf5a perf: Fix the throttle
> logic for a group. So this commit can be reverted.
>
> Signed-off-by: Anubhav Shelat <ashelat@redhat.com>
> ---
>  tools/perf/tests/shell/record.sh | 33 ++++++--------------------------
>  1 file changed, 6 insertions(+), 27 deletions(-)
> [...]

I disagree here. Reverting this patch causes the test case to fail very
often on s390. The test fails about every 2nd run, because there is one
run-away value out of many.

Here is an example:

# ./perf record -e "{cycles,cycles}:Su" -- perf test -w brstack
[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 0.015 MB perf.data (74 samples) ]
[root@b83lp65 perf]# perf script | grep brstack
perf 136408 340637.903395: 1377000 cycles: 1171606 brstack_foo+0x1e (/root/linux/tools/perf/perf)
perf 136408 340637.903396: 1377000 cycles: 1171664 brstack_bench+0x24 (/root/linux/tools/perf/perf)
perf 136408 340637.903396: 1377000 cycles: 11716d4 brstack_bench+0x94 (/root/linux/tools/perf/perf)
perf 136408 340637.903397: 1377000 cycles: 11716d4 brstack_bench+0x94 (/root/linux/tools/perf/perf)
perf 136408 340637.903398: 1377000 cycles: 11716e8 brstack_bench+0xa8 (/root/linux/tools/perf/perf)
perf 136408 340637.903398: 1377000 cycles: 1171606 brstack_foo+0x1e (/root/linux/tools/perf/perf)
perf 136408 340637.903399: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf)
perf 136408 340637.910844: 1377000 cycles: 11716e4 brstack_bench+0xa4 (/root/linux/tools/perf/perf)
perf 136408 340637.910844: 39843371 cycles: 11716e4 brstack_bench+0xa4 (/root/linux/tools/perf/perf)
perf 136408 340637.910845: 1377000 cycles: 1171632 brstack_foo+0x4a (/root/linux/tools/perf/perf)
perf 136408 340637.910846: 1377000 cycles: 1171692 brstack_bench+0x52 (/root/linux/tools/perf/perf)
perf 136408 340637.910847: 1377000 cycles: 11716ee brstack_bench+0xae (/root/linux/tools/perf/perf)
perf 136408 340637.910847: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf)
perf 136408 340637.910848: 1377000 cycles: 1171598 brstack_bar+0x0 (/root/linux/tools/perf/perf)
perf 136408 340637.910848: 1377000 cycles: 11715e8 brstack_foo+0x0 (/root/linux/tools/perf/perf)
perf 136408 340637.910849: 1377000 cycles: 11717ae brstack+0x86 (/root/linux/tools/perf/perf)
perf 136408 340637.910850: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf)
perf 136408 340637.910850: 1377000 cycles: 11716ee brstack_bench+0xae (/root/linux/tools/perf/perf)
perf 136408 340637.910851: 1377000 cycles: 11716ee brstack_bench+0xae (/root/linux/tools/perf/perf)
perf 136408 340637.910851: 1377000 cycles: 117159e brstack_bar+0x6 (/root/linux/tools/perf/perf)
perf 136408 340637.910852: 1377000 cycles: 1171598 brstack_bar+0x0 (/root/linux/tools/perf/perf)
perf 136408 340637.910853: 1377000 cycles: 117179e brstack+0x76 (/root/linux/tools/perf/perf)
perf 136408 340637.910853: 1377000 cycles: 1171606 brstack_foo+0x1e (/root/linux/tools/perf/perf)
perf 136408 340637.910854: 1377000 cycles: 11716d4 brstack_bench+0x94 (/root/linux/tools/perf/perf)
perf 136408 340637.910855: 1377000 cycles: 1171612 brstack_foo+0x2a (/root/linux/tools/perf/perf)
perf 136408 340637.910855: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf)
perf 136408 340637.910856: 1377000 cycles: 1171598 brstack_bar+0x0 (/root/linux/tools/perf/perf)
perf 136408 340637.910856: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf)
perf 136408 340637.910857: 1377000 cycles: 1171632 brstack_foo+0x4a (/root/linux/tools/perf/perf)
.... many more lines with identical cycles value.

I have contacted our hardware/firmware team, but have not gotten a
response back. I still think this has to do with the s390 LPAR running
under hypervisor control, and I do not know what happens when the
hypervisor kicks in.

I agree with James Clark that this should be handled transparently by
the hypervisor, that means stopping the LPAR should stop the CPU
measurement unit before giving control to a different LPAR.

But what happens when the hypervisor just kicks in and returns to the
same LPAR again? Or does some admin work on behalf of this LPAR? As
long as I cannot answer this question, I would like to keep some ratio
to handle run-away values.

As said before, this happens in roughly 50% of the runs...

Here is a run where the test succeeds without a run-away value:

# ./perf record -e "{cycles,cycles}:Su" -- perf test -w brstack
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.015 MB perf.data (70 samples) ]
# perf script | grep brstack
perf 136455 341212.430466: 1377000 cycles: 117159e brstack_bar+0x6 (/root/linux/tools/perf/perf)
perf 136455 341212.430467: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf)
perf 136455 341212.430468: 1377000 cycles: 1171612 brstack_foo+0x2a (/root/linux/tools/perf/perf)
perf 136455 341212.430468: 1377000 cycles: 1171656 brstack_bench+0x16 (/root/linux/tools/perf/perf)
perf 136455 341212.430469: 1377000 cycles: 11716ee brstack_bench+0xae (/root/linux/tools/perf/perf)
perf 136455 341212.430469: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf)
perf 136455 341212.430470: 1377000 cycles: 11715e8 brstack_foo+0x0 (/root/linux/tools/perf/perf)
perf 136455 341212.430471: 1377000 cycles: 11716c6 brstack_bench+0x86 (/root/linux/tools/perf/perf)
perf 136455 341212.430471: 1377000 cycles: 1171622 brstack_foo+0x3a (/root/linux/tools/perf/perf)
perf 136455 341212.430472: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf)
perf 136455 341212.430472: 1377000 cycles: 1171692 brstack_bench+0x52 (/root/linux/tools/perf/perf)
perf 136455 341212.430473: 1377000 cycles: 117170c brstack_bench+0xcc (/root/linux/tools/perf/perf)
perf 136455 341212.430474: 1377000 cycles: 11716e4 brstack_bench+0xa4 (/root/linux/tools/perf/perf)
perf 136455 341212.430474: 1377000 cycles: 117169a brstack_bench+0x5a (/root/linux/tools/perf/perf)
perf 136455 341212.430475: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf)
perf 136455 341212.430475: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf)
perf 136455 341212.430476: 1377000 cycles: 11716e4 brstack_bench+0xa4 (/root/linux/tools/perf/perf)
perf 136455 341212.430476: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf)
perf 136455 341212.430477: 1377000 cycles: 11715e8 brstack_foo+0x0 (/root/linux/tools/perf/perf)
perf 136455 341212.430478: 1377000 cycles: 117170c brstack_bench+0xcc (/root/linux/tools/perf/perf)
perf 136455 341212.430478: 1377000 cycles: 117170c brstack_bench+0xcc (/root/linux/tools/perf/perf)
perf 136455 341212.430479: 1377000 cycles: 11716ee brstack_bench+0xae (/root/linux/tools/perf/perf)
perf 136455 341212.430479: 1377000 cycles: 1171598 brstack_bar+0x0 (/root/linux/tools/perf/perf)
perf 136455 341212.430480: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf)
perf 136455 341212.430481: 1377000 cycles: 11715fc brstack_foo+0x14 (/root/linux/tools/perf/perf)
perf 136455 341212.430481: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf)
perf 136455 341212.430482: 1377000 cycles: 11716ee brstack_bench+0xae (/root/linux/tools/perf/perf)
perf 136455 341212.430482: 1377000 cycles: 1171664 brstack_bench+0x24 (/root/linux/tools/perf/perf)
perf 136455 341212.430483: 1377000 cycles: 117170c brstack_bench+0xcc (/root/linux/tools/perf/perf)
perf 136455 341212.430484: 1377000 cycles: 117170c brstack_bench+0xcc (/root/linux/tools/perf/perf)
perf 136455 341212.430484: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf)
#

So please leave this code in, otherwise we need an s390-specific check
for success.

Thanks a lot
--
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
--
IBM Deutschland Research & Development GmbH
Chairman of the Supervisory Board: Wolfgang Wendt
Management: David Faller
Registered office: Böblingen / Commercial register: Amtsgericht Stuttgart, HRB 243294
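The ratio-based acceptance Thomas is arguing to keep can be sketched without bc. The `tolerant_verdict` helper below is illustrative only (the name and the awk-based arithmetic are mine); the 0.8 tolerance rate mirrors the value in the test code being discussed, and a run passes as long as mismatching pairs stay under 1 - 0.8 = 20% of all compared pairs:

```shell
# Sketch of a tolerance check: accept a run if fewer than
# (1 - tolerance_rate) of the compared sample pairs mismatch.
# awk handles the fixed-point division that the test does with bc.
tolerant_verdict() {
	invalid=$1 total=$2 rate=${3:-0.8}
	[ "$total" -gt 0 ] || { echo "no samples"; return 1; }
	awk -v i="$invalid" -v t="$total" -v r="$rate" \
		'BEGIN { exit !((i / t) < (1 - r)) }' \
		&& echo "pass" || echo "fail"
}
```

Under this scheme a single run-away value out of 30-odd samples, as in the s390 trace above, still passes, while a systematic failure of every pair does not.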
* Re: [PATCH] Revert "perf test: Allow tolerance for leader sampling test"
  2025-10-24  8:40 ` Thomas Richter
@ 2025-10-24 17:21   ` Anubhav Shelat
  2025-10-27 10:27     ` Thomas Richter
  2025-10-28 11:30     ` James Clark
  0 siblings, 2 replies; 15+ messages in thread

From: Anubhav Shelat @ 2025-10-24 17:21 UTC (permalink / raw)
To: Thomas Richter
Cc: mpetlan, acme, namhyung, irogers, linux-perf-users, peterz, mingo,
    mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang,
    dapeng1.mi, james.clark

The issue on arm is similar. Failing about half the time, with 1
failing case. So maybe the same issue on arm.

Anubhav

On Fri, Oct 24, 2025 at 9:40 AM Thomas Richter <tmricht@linux.ibm.com> wrote:
>
> I disagree here. Reverting this patch causes the test case to fail very
> often on s390. The test fails about every 2nd run, because there is one
> run-away value out of many.
>
> [...]
>
> So please leave this code in, otherwise we need an s390-specific check
> for success.
>
> Thanks a lot
> --
> Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
* Re: [PATCH] Revert "perf test: Allow tolerance for leader sampling test"
  2025-10-24 17:21   ` Anubhav Shelat
@ 2025-10-27 10:27     ` Thomas Richter
  0 siblings, 0 replies; 15+ messages in thread

From: Thomas Richter @ 2025-10-27 10:27 UTC (permalink / raw)
To: Anubhav Shelat
Cc: mpetlan, acme, namhyung, irogers, linux-perf-users, peterz, mingo,
    mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang,
    dapeng1.mi, james.clark

On 10/24/25 19:21, Anubhav Shelat wrote:
> The issue on arm is similar. Failing about half the time, with 1
> failing case. So maybe the same issue on arm.
>
> Anubhav

Right, so can you please drop this revert patch?

Thanks a lot.
--
Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany
* Re: [PATCH] Revert "perf test: Allow tolerance for leader sampling test" 2025-10-24 17:21 ` Anubhav Shelat 2025-10-27 10:27 ` Thomas Richter @ 2025-10-28 11:30 ` James Clark 2025-10-28 12:55 ` Thomas Richter 1 sibling, 1 reply; 15+ messages in thread From: James Clark @ 2025-10-28 11:30 UTC (permalink / raw) To: Anubhav Shelat, Thomas Richter Cc: mpetlan, acme, namhyung, irogers, linux-perf-users, peterz, mingo, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, dapeng1.mi On 24/10/2025 6:21 pm, Anubhav Shelat wrote: > The issue on arm is similar. Failing about half the time, with 1 > failing case. So maybe the same issue on arm. > > Anubhav > You mentioned on the other thread that it's failing "differently", can you expand on that? I'm wondering why you sent the revert patch then? As I mentioned before I'm not seeing any issues. Can you share the kernel version that you tested on and your kernel config? And can you share the same outputs that I asked Thomas for below please. > On Fri, Oct 24, 2025 at 9:40 AM Thomas Richter <tmricht@linux.ibm.com> wrote: >> >> On 10/23/25 15:24, Anubhav Shelat wrote: >>> This reverts commit 1c5721ca89a1c8ae71082d3a102b39fd1ec0a205. >>> >>> The throttling bug has been fixed in 9734e25fbf5a perf: Fix the throttle >>> logic for a group. So this commit can be reverted. >>> >>> Signed-off-by: Anubhav Shelat <ashelat@redhat.com> >>> --- >>> tools/perf/tests/shell/record.sh | 33 ++++++-------------------------- >>> 1 file changed, 6 insertions(+), 27 deletions(-) >>> >>> diff --git a/tools/perf/tests/shell/record.sh b/tools/perf/tests/shell/record.sh >>> index 0f5841c479e7..13e0d6ef66c9 100755 >>> --- a/tools/perf/tests/shell/record.sh >>> +++ b/tools/perf/tests/shell/record.sh >>> @@ -267,43 +267,22 @@ test_leader_sampling() { >>> err=1 >>> return >>> fi >>> - perf script -i "${perfdata}" | grep brstack > $script_output >>> - # Check if the two instruction counts are equal in each record. 
>>> - # However, the throttling code doesn't consider event grouping. During throttling, only the >>> - # leader is stopped, causing the slave's counts significantly higher. To temporarily solve this, >>> - # let's set the tolerance rate to 80%. >>> - # TODO: Revert the code for tolerance once the throttling mechanism is fixed. >>> index=0 >>> - valid_counts=0 >>> - invalid_counts=0 >>> - tolerance_rate=0.8 >>> + perf script -i "${perfdata}" | grep brstack > "${script_output}" >>> while IFS= read -r line >>> do >>> + # Check if the two instruction counts are equal in each record >>> cycles=$(echo $line | awk '{for(i=1;i<=NF;i++) if($i=="cycles:") print $(i-1)}') >>> if [ $(($index%2)) -ne 0 ] && [ ${cycles}x != ${prev_cycles}x ] >>> then >>> - invalid_counts=$(($invalid_counts+1)) >>> - else >>> - valid_counts=$(($valid_counts+1)) >>> + echo "Leader sampling [Failed inconsistent cycles count]" >>> + err=1 >>> + return >>> fi >>> index=$(($index+1)) >>> prev_cycles=$cycles >>> done < "${script_output}" >>> - total_counts=$(bc <<< "$invalid_counts+$valid_counts") >>> - if (( $(bc <<< "$total_counts <= 0") )) >>> - then >>> - echo "Leader sampling [No sample generated]" >>> - err=1 >>> - return >>> - fi >>> - isok=$(bc <<< "scale=2; if (($invalid_counts/$total_counts) < (1-$tolerance_rate)) { 0 } else { 1 };") >>> - if [ $isok -eq 1 ] >>> - then >>> - echo "Leader sampling [Failed inconsistent cycles count]" >>> - err=1 >>> - else >>> - echo "Basic leader sampling test [Success]" >>> - fi >>> + echo "Basic leader sampling test [Success]" >>> } >>> >>> test_topdown_leader_sampling() { >> >> I disagree here. Reverting this patch cause the test case to fail very often on s390. >> The test fails about every 2nd run, because the is one run-away value out of many. >> Here is an example: I suppose that depends on what the reason for the failure is. I don't think we've gotten to the bottom of that yet. 
It's ok to have a test failure if the actual behaviour doesn't match the intented behaviour. At the moment it looks like we're trying to hide some defect with a tolerance value. This makes the test less useful, and it also wastes developer time when the tolerance value will inevitably be increased and increased with more and more investigations until it tests nothing. Not having any tolerance to begin with will make this less likely to happen. >> >> # ./perf record -e "{cycles,cycles}:Su" -- perf test -w brstack >> [ perf record: Woken up 2 times to write data ] >> [ perf record: Captured and wrote 0.015 MB perf.data (74 samples) ] >> [root@b83lp65 perf]# perf script | grep brstack >> perf 136408 340637.903395: 1377000 cycles: 1171606 brstack_foo+0x1e (/root/linux/tools/perf/perf) >> perf 136408 340637.903396: 1377000 cycles: 1171664 brstack_bench+0x24 (/root/linux/tools/perf/perf) >> perf 136408 340637.903396: 1377000 cycles: 11716d4 brstack_bench+0x94 (/root/linux/tools/perf/perf) >> perf 136408 340637.903397: 1377000 cycles: 11716d4 brstack_bench+0x94 (/root/linux/tools/perf/perf) >> perf 136408 340637.903398: 1377000 cycles: 11716e8 brstack_bench+0xa8 (/root/linux/tools/perf/perf) >> perf 136408 340637.903398: 1377000 cycles: 1171606 brstack_foo+0x1e (/root/linux/tools/perf/perf) >> perf 136408 340637.903399: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf) >> perf 136408 340637.910844: 1377000 cycles: 11716e4 brstack_bench+0xa4 (/root/linux/tools/perf/perf) >> perf 136408 340637.910844: 39843371 cycles: 11716e4 brstack_bench+0xa4 (/root/linux/tools/perf/perf) >> perf 136408 340637.910845: 1377000 cycles: 1171632 brstack_foo+0x4a (/root/linux/tools/perf/perf) >> perf 136408 340637.910846: 1377000 cycles: 1171692 brstack_bench+0x52 (/root/linux/tools/perf/perf) >> perf 136408 340637.910847: 1377000 cycles: 11716ee brstack_bench+0xae (/root/linux/tools/perf/perf) >> perf 136408 340637.910847: 1377000 cycles: 11715cc brstack_bar+0x34 
(/root/linux/tools/perf/perf) >> perf 136408 340637.910848: 1377000 cycles: 1171598 brstack_bar+0x0 (/root/linux/tools/perf/perf) >> perf 136408 340637.910848: 1377000 cycles: 11715e8 brstack_foo+0x0 (/root/linux/tools/perf/perf) >> perf 136408 340637.910849: 1377000 cycles: 11717ae brstack+0x86 (/root/linux/tools/perf/perf) >> perf 136408 340637.910850: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf) >> perf 136408 340637.910850: 1377000 cycles: 11716ee brstack_bench+0xae (/root/linux/tools/perf/perf) >> perf 136408 340637.910851: 1377000 cycles: 11716ee brstack_bench+0xae (/root/linux/tools/perf/perf) >> perf 136408 340637.910851: 1377000 cycles: 117159e brstack_bar+0x6 (/root/linux/tools/perf/perf) >> perf 136408 340637.910852: 1377000 cycles: 1171598 brstack_bar+0x0 (/root/linux/tools/perf/perf) >> perf 136408 340637.910853: 1377000 cycles: 117179e brstack+0x76 (/root/linux/tools/perf/perf) >> perf 136408 340637.910853: 1377000 cycles: 1171606 brstack_foo+0x1e (/root/linux/tools/perf/perf) >> perf 136408 340637.910854: 1377000 cycles: 11716d4 brstack_bench+0x94 (/root/linux/tools/perf/perf) >> perf 136408 340637.910855: 1377000 cycles: 1171612 brstack_foo+0x2a (/root/linux/tools/perf/perf) >> perf 136408 340637.910855: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf) >> perf 136408 340637.910856: 1377000 cycles: 1171598 brstack_bar+0x0 (/root/linux/tools/perf/perf) >> perf 136408 340637.910856: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf) >> perf 136408 340637.910857: 1377000 cycles: 1171632 brstack_foo+0x4a (/root/linux/tools/perf/perf) >> .... many more lines with identical cycles value. >> >> I have contacted our hardware/firmware team, but have not gotten a response back. >> I still think this has to do with s390 LPAR running under hyperviser control and I do not know what >> happens when the hipervisor kicks in. 
>> >> I agree with James Clark that this should be handled transparently by the hypervisor, that means >> stopping the LPAR should stop the CPU measurement unit, before giving control to a different LPAR. >> >> But what happens when the hypervisor just kicks in and returns to the same LPAR again? Or does >> some admin work on behalf of this LPAR. As long as I cannot answer this question, I would like >> to keep some ratio to handle run-away values. >> >> As said before, this happens in roughly 50% of the runs... >> >> Here is a run where the test succeeds without a run-away value: >> >> # ./perf record -e "{cycles,cycles}:Su" -- perf test -w brstack >> [ perf record: Woken up 1 times to write data ] >> [ perf record: Captured and wrote 0.015 MB perf.data (70 samples) ] >> # perf script | grep brstack >> perf 136455 341212.430466: 1377000 cycles: 117159e brstack_bar+0x6 (/root/linux/tools/perf/perf) >> perf 136455 341212.430467: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf) >> perf 136455 341212.430468: 1377000 cycles: 1171612 brstack_foo+0x2a (/root/linux/tools/perf/perf) >> perf 136455 341212.430468: 1377000 cycles: 1171656 brstack_bench+0x16 (/root/linux/tools/perf/perf) I'm a bit confused how the instruction pointers and timestamps are different. Shouldn't the counters be part of a single sample? Which kernel version is this exactly? Can you skip the grep? We only care about the samples and not what process it happened to be in, so that might be hiding something. And can you share the raw dump of a sample (perf report -D)? One sample that has the matching counts and one that doesn't. Mine look like this, although I can't share one that doesn't match because I can't reproduce it: 0 1381669508860 0x24b20 [0x70]: PERF_RECORD_SAMPLE(IP, 0x2): 1136/1136: 0xaaaac8f51588 period: 414710 addr: 0 ... sample_read: .... group nr 2 ..... id 0000000000000336, value 00000000000f38f0, lost 0 ..... id 0000000000000337, value 00000000000f38f0, lost 0 ... 
thread: stress:1136 ...... dso: /usr/bin/stress ... thread: stress:1136 ...... dso: /usr/bin/stress >> perf 136455 341212.430469: 1377000 cycles: 11716ee brstack_bench+0xae (/root/linux/tools/perf/perf) >> perf 136455 341212.430469: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf) >> perf 136455 341212.430470: 1377000 cycles: 11715e8 brstack_foo+0x0 (/root/linux/tools/perf/perf) >> perf 136455 341212.430471: 1377000 cycles: 11716c6 brstack_bench+0x86 (/root/linux/tools/perf/perf) >> perf 136455 341212.430471: 1377000 cycles: 1171622 brstack_foo+0x3a (/root/linux/tools/perf/perf) >> perf 136455 341212.430472: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf) >> perf 136455 341212.430472: 1377000 cycles: 1171692 brstack_bench+0x52 (/root/linux/tools/perf/perf) >> perf 136455 341212.430473: 1377000 cycles: 117170c brstack_bench+0xcc (/root/linux/tools/perf/perf) >> perf 136455 341212.430474: 1377000 cycles: 11716e4 brstack_bench+0xa4 (/root/linux/tools/perf/perf) >> perf 136455 341212.430474: 1377000 cycles: 117169a brstack_bench+0x5a (/root/linux/tools/perf/perf) >> perf 136455 341212.430475: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf) >> perf 136455 341212.430475: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf) >> perf 136455 341212.430476: 1377000 cycles: 11716e4 brstack_bench+0xa4 (/root/linux/tools/perf/perf) >> perf 136455 341212.430476: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf) >> perf 136455 341212.430477: 1377000 cycles: 11715e8 brstack_foo+0x0 (/root/linux/tools/perf/perf) >> perf 136455 341212.430478: 1377000 cycles: 117170c brstack_bench+0xcc (/root/linux/tools/perf/perf) >> perf 136455 341212.430478: 1377000 cycles: 117170c brstack_bench+0xcc (/root/linux/tools/perf/perf) >> perf 136455 341212.430479: 1377000 cycles: 11716ee brstack_bench+0xae (/root/linux/tools/perf/perf) >> perf 136455 341212.430479: 1377000 cycles: 1171598 
brstack_bar+0x0 (/root/linux/tools/perf/perf) >> perf 136455 341212.430480: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf) >> perf 136455 341212.430481: 1377000 cycles: 11715fc brstack_foo+0x14 (/root/linux/tools/perf/perf) >> perf 136455 341212.430481: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf) >> perf 136455 341212.430482: 1377000 cycles: 11716ee brstack_bench+0xae (/root/linux/tools/perf/perf) >> perf 136455 341212.430482: 1377000 cycles: 1171664 brstack_bench+0x24 (/root/linux/tools/perf/perf) >> perf 136455 341212.430483: 1377000 cycles: 117170c brstack_bench+0xcc (/root/linux/tools/perf/perf) >> perf 136455 341212.430484: 1377000 cycles: 117170c brstack_bench+0xcc (/root/linux/tools/perf/perf) >> perf 136455 341212.430484: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf) >> # >> >> So please let this code in otherwise we need an s390 specific check for success. This might be an option if there is something in the hardware that prevents the counts from being the same. But because they are the same 99% of the time it seems like it has the capability and it's just a bug somewhere. James >> >> Thanks a lot >> -- >> Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany >> -- >> IBM Deutschland Research & Development GmbH >> >> Vorsitzender des Aufsichtsrats: Wolfgang Wendt >> >> Geschäftsführung: David Faller >> >> Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 >> > ^ permalink raw reply [flat|nested] 15+ messages in thread
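[Editorial note] The strict pairwise comparison being argued for above can be sketched as a small standalone filter. This is only a sketch of the logic under discussion, not the actual record.sh code; `check_pairs` and the input lines are made-up illustrations:

```shell
# Sketch: read perf-script lines in pairs and fail on the first pair whose
# counts (the field before "cycles:") differ, i.e. the check with no tolerance.
check_pairs() {
  awk '
    { for (i = 1; i <= NF; i++) if ($i == "cycles:") c = $(i - 1) }
    NR % 2 == 0 && c != prev { print "mismatch at record " NR; exit 1 }
    { prev = c }'
}

# A consistent leader/member pair passes:
printf '%s\n' \
  "perf 1 0.1: 1377000 cycles: 1171606 brstack_foo+0x1e (perf)" \
  "perf 1 0.1: 1377000 cycles: 1171664 brstack_bench+0x24 (perf)" |
  check_pairs && echo "pairs consistent"    # prints "pairs consistent"
```

With no tolerance, a single run-away value like the 39843371 sample above fails the whole run, which is exactly the behaviour being debated.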
* Re: [PATCH] Revert "perf test: Allow tolerance for leader sampling test" 2025-10-28 11:30 ` James Clark @ 2025-10-28 12:55 ` Thomas Richter 2025-10-28 15:23 ` James Clark 2025-10-30 13:52 ` Anubhav Shelat 0 siblings, 2 replies; 15+ messages in thread From: Thomas Richter @ 2025-10-28 12:55 UTC (permalink / raw) To: James Clark, Anubhav Shelat Cc: mpetlan, acme, namhyung, irogers, linux-perf-users, peterz, mingo, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, dapeng1.mi On 10/28/25 12:30, James Clark wrote: > > > On 24/10/2025 6:21 pm, Anubhav Shelat wrote: >> The issue on arm is similar. Failing about half the time, with 1 >> failing case. So maybe the same issue on arm. >> >> Anubhav >> > > You mentioned on the other thread that it's failing "differently", can you expand on that? I'm wondering why you sent the revert patch then? > > As I mentioned before I'm not seeing any issues. Can you share the kernel version that you tested on and your kernel config? And can you share the same outputs that I asked Thomas for below please. > >> On Fri, Oct 24, 2025 at 9:40 AM Thomas Richter <tmricht@linux.ibm.com> wrote: >>> >>> On 10/23/25 15:24, Anubhav Shelat wrote: >>>> This reverts commit 1c5721ca89a1c8ae71082d3a102b39fd1ec0a205. >>>> >>>> The throttling bug has been fixed in 9734e25fbf5a perf: Fix the throttle >>>> logic for a group. So this commit can be reverted. 
>>>> >>>> Signed-off-by: Anubhav Shelat <ashelat@redhat.com> >>>> --- >>>> tools/perf/tests/shell/record.sh | 33 ++++++-------------------------- >>>> 1 file changed, 6 insertions(+), 27 deletions(-) >>>> >>>> diff --git a/tools/perf/tests/shell/record.sh b/tools/perf/tests/shell/record.sh >>>> index 0f5841c479e7..13e0d6ef66c9 100755 >>>> --- a/tools/perf/tests/shell/record.sh >>>> +++ b/tools/perf/tests/shell/record.sh >>>> @@ -267,43 +267,22 @@ test_leader_sampling() { >>>> err=1 >>>> return >>>> fi >>>> - perf script -i "${perfdata}" | grep brstack > $script_output >>>> - # Check if the two instruction counts are equal in each record. >>>> - # However, the throttling code doesn't consider event grouping. During throttling, only the >>>> - # leader is stopped, causing the slave's counts significantly higher. To temporarily solve this, >>>> - # let's set the tolerance rate to 80%. >>>> - # TODO: Revert the code for tolerance once the throttling mechanism is fixed. >>>> index=0 >>>> - valid_counts=0 >>>> - invalid_counts=0 >>>> - tolerance_rate=0.8 >>>> + perf script -i "${perfdata}" | grep brstack > "${script_output}" >>>> while IFS= read -r line >>>> do >>>> + # Check if the two instruction counts are equal in each record >>>> cycles=$(echo $line | awk '{for(i=1;i<=NF;i++) if($i=="cycles:") print $(i-1)}') >>>> if [ $(($index%2)) -ne 0 ] && [ ${cycles}x != ${prev_cycles}x ] >>>> then >>>> - invalid_counts=$(($invalid_counts+1)) >>>> - else >>>> - valid_counts=$(($valid_counts+1)) >>>> + echo "Leader sampling [Failed inconsistent cycles count]" >>>> + err=1 >>>> + return >>>> fi >>>> index=$(($index+1)) >>>> prev_cycles=$cycles >>>> done < "${script_output}" >>>> - total_counts=$(bc <<< "$invalid_counts+$valid_counts") >>>> - if (( $(bc <<< "$total_counts <= 0") )) >>>> - then >>>> - echo "Leader sampling [No sample generated]" >>>> - err=1 >>>> - return >>>> - fi >>>> - isok=$(bc <<< "scale=2; if (($invalid_counts/$total_counts) < (1-$tolerance_rate)) { 0 } else 
{ 1 };") >>>> - if [ $isok -eq 1 ] >>>> - then >>>> - echo "Leader sampling [Failed inconsistent cycles count]" >>>> - err=1 >>>> - else >>>> - echo "Basic leader sampling test [Success]" >>>> - fi >>>> + echo "Basic leader sampling test [Success]" >>>> } >>>> >>>> test_topdown_leader_sampling() { >>> >>> I disagree here. Reverting this patch causes the test case to fail very often on s390. >>> The test fails about every 2nd run, because there is one run-away value out of many. >>> Here is an example: > > I suppose that depends on what the reason for the failure is. I don't think we've gotten to the bottom of that yet. It's ok to have a test failure if the actual behaviour doesn't match the intended behaviour. > > At the moment it looks like we're trying to hide some defect with a tolerance value. This makes the test less useful, and it also wastes developer time when the tolerance value will inevitably be increased and increased with more and more investigations until it tests nothing. Not having any tolerance to begin with will make this less likely to happen. 
> >>> >>> # ./perf record -e "{cycles,cycles}:Su" -- perf test -w brstack >>> [ perf record: Woken up 2 times to write data ] >>> [ perf record: Captured and wrote 0.015 MB perf.data (74 samples) ] >>> [root@b83lp65 perf]# perf script | grep brstack >>> perf 136408 340637.903395: 1377000 cycles: 1171606 brstack_foo+0x1e (/root/linux/tools/perf/perf) >>> perf 136408 340637.903396: 1377000 cycles: 1171664 brstack_bench+0x24 (/root/linux/tools/perf/perf) >>> perf 136408 340637.903396: 1377000 cycles: 11716d4 brstack_bench+0x94 (/root/linux/tools/perf/perf) >>> perf 136408 340637.903397: 1377000 cycles: 11716d4 brstack_bench+0x94 (/root/linux/tools/perf/perf) >>> perf 136408 340637.903398: 1377000 cycles: 11716e8 brstack_bench+0xa8 (/root/linux/tools/perf/perf) >>> perf 136408 340637.903398: 1377000 cycles: 1171606 brstack_foo+0x1e (/root/linux/tools/perf/perf) >>> perf 136408 340637.903399: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf) >>> perf 136408 340637.910844: 1377000 cycles: 11716e4 brstack_bench+0xa4 (/root/linux/tools/perf/perf) >>> perf 136408 340637.910844: 39843371 cycles: 11716e4 brstack_bench+0xa4 (/root/linux/tools/perf/perf) >>> perf 136408 340637.910845: 1377000 cycles: 1171632 brstack_foo+0x4a (/root/linux/tools/perf/perf) >>> perf 136408 340637.910846: 1377000 cycles: 1171692 brstack_bench+0x52 (/root/linux/tools/perf/perf) >>> perf 136408 340637.910847: 1377000 cycles: 11716ee brstack_bench+0xae (/root/linux/tools/perf/perf) >>> perf 136408 340637.910847: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf) >>> perf 136408 340637.910848: 1377000 cycles: 1171598 brstack_bar+0x0 (/root/linux/tools/perf/perf) >>> perf 136408 340637.910848: 1377000 cycles: 11715e8 brstack_foo+0x0 (/root/linux/tools/perf/perf) >>> perf 136408 340637.910849: 1377000 cycles: 11717ae brstack+0x86 (/root/linux/tools/perf/perf) >>> perf 136408 340637.910850: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf) >>> 
perf 136408 340637.910850: 1377000 cycles: 11716ee brstack_bench+0xae (/root/linux/tools/perf/perf) >>> perf 136408 340637.910851: 1377000 cycles: 11716ee brstack_bench+0xae (/root/linux/tools/perf/perf) >>> perf 136408 340637.910851: 1377000 cycles: 117159e brstack_bar+0x6 (/root/linux/tools/perf/perf) >>> perf 136408 340637.910852: 1377000 cycles: 1171598 brstack_bar+0x0 (/root/linux/tools/perf/perf) >>> perf 136408 340637.910853: 1377000 cycles: 117179e brstack+0x76 (/root/linux/tools/perf/perf) >>> perf 136408 340637.910853: 1377000 cycles: 1171606 brstack_foo+0x1e (/root/linux/tools/perf/perf) >>> perf 136408 340637.910854: 1377000 cycles: 11716d4 brstack_bench+0x94 (/root/linux/tools/perf/perf) >>> perf 136408 340637.910855: 1377000 cycles: 1171612 brstack_foo+0x2a (/root/linux/tools/perf/perf) >>> perf 136408 340637.910855: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf) >>> perf 136408 340637.910856: 1377000 cycles: 1171598 brstack_bar+0x0 (/root/linux/tools/perf/perf) >>> perf 136408 340637.910856: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf) >>> perf 136408 340637.910857: 1377000 cycles: 1171632 brstack_foo+0x4a (/root/linux/tools/perf/perf) >>> .... many more lines with identical cycles value. >>> >>> I have contacted our hardware/firmware team, but have not gotten a response back. >>> I still think this has to do with s390 LPAR running under hyperviser control and I do not know what >>> happens when the hipervisor kicks in. >>> >>> I agree with James Clark that this should be handled transperently by the hipervisor, that means >>> stopping the LPAR should stop the CPU measurement unit, before giving control to a different lpar. >>> >>> But what happens when the hipervisor just kicks in and returns to the same LPAR again? Or does >>> some admin work on behalf of this LPAR. As long as I can not answer this question, I would like >>> to keep some ratio to handle run-away values. 
>>> >>> As said before, this happens in roughly 50% of the runs... >>> >>> Here is a run where the test succeeds without a run-away value: >>> >>> # ./perf record -e "{cycles,cycles}:Su" -- perf test -w brstack >>> [ perf record: Woken up 1 times to write data ] >>> [ perf record: Captured and wrote 0.015 MB perf.data (70 samples) ] >>> # perf script | grep brstack >>> perf 136455 341212.430466: 1377000 cycles: 117159e brstack_bar+0x6 (/root/linux/tools/perf/perf) >>> perf 136455 341212.430467: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf) >>> perf 136455 341212.430468: 1377000 cycles: 1171612 brstack_foo+0x2a (/root/linux/tools/perf/perf) >>> perf 136455 341212.430468: 1377000 cycles: 1171656 brstack_bench+0x16 (/root/linux/tools/perf/perf) > > I'm a bit confused how the instruction pointers and timestamps are different. Shouldn't the counters be part of a single sample? > Two diffenrent runs on the same machine. The different addresses are most likely from different load addresses of the libraries and executables. > Which kernel version is this exactly? The kernel was built last night from our build machine. It is # uname -a Linux b83lp69.lnxne.boe 6.18.0-20251027.rc3.git2.fd57572253bc.63.fc42.s390x+git #1 SMP Mon Oct 27 20:06:12 CET 2025 s390x GNU/Linux So latest kernel code from upstream > > Can you skip the grep, we only care about the samples and not what process it happened to be in so that might be hiding something. And can you share the raw dump of a sample (perf report -D). One sample that has the matching counts and one that doesn't. > > Mine look like this, although I can't share one that doesn't match because I can't reproduce it: > > 0 1381669508860 0x24b20 [0x70]: PERF_RECORD_SAMPLE(IP, 0x2): 1136/1136: 0xaaaac8f51588 period: 414710 addr: 0 > ... sample_read: > .... group nr 2 > ..... id 0000000000000336, value 00000000000f38f0, lost 0 > ..... id 0000000000000337, value 00000000000f38f0, lost 0 > ... 
thread: stress:1136 > ...... dso: /usr/bin/stress > ... thread: stress:1136 > ...... dso: /usr/bin/stress > > When I skip the grep it actually gets worse, there are more run-away values: # perf record -e "{cycles,cycles}:Su" -- perf test -w brstack [ perf record: Woken up 2 times to write data ] [ perf record: Captured and wrote 0.012 MB perf.data (50 samples) ] # perf script | head -20 perf 919810 6726.456179: 2754000 cycles: 3ff95608ec8 _dl_map_object_from_fd+0xb18 (/usr/lib/ld64.so.1) perf 919810 6726.456179: 58638457 cycles: 3ff95608ec8 _dl_map_object_from_fd+0xb18 (/usr/lib/ld64.so.1) perf 919810 6726.456182: 1377000 cycles: 3ff9560a696 check_match+0x76 (/usr/lib/ld64.so.1) perf 919810 6726.456182: 1377000 cycles: 3ff9560fa6a _dl_relocate_object_no_relro+0x5fa (/usr/lib/ld64.so.1) perf 919810 6726.456182: 1377000 cycles: 3ff9560ac04 do_lookup_x+0x404 (/usr/lib/ld64.so.1) perf 919810 6726.456183: 1377000 cycles: 3ff9560f9fa _dl_relocate_object_no_relro+0x58a (/usr/lib/ld64.so.1) perf 919810 6726.456183: 4131000 cycles: 3ff9560f970 _dl_relocate_object_no_relro+0x500 (/usr/lib/ld64.so.1) perf 919810 6726.456183: 2754000 cycles: 3ff9560b48c _dl_lookup_symbol_x+0x5c (/usr/lib/ld64.so.1) perf 919810 6726.456183: 1377000 cycles: 3ff9560ac1c do_lookup_x+0x41c (/usr/lib/ld64.so.1) perf 919810 6726.456183: 1377000 cycles: 3ff9560b4b6 _dl_lookup_symbol_x+0x86 (/usr/lib/ld64.so.1) perf 919810 6726.456184: 1377000 cycles: 3ff9560abac do_lookup_x+0x3ac (/usr/lib/ld64.so.1) perf 919810 6726.456184: 1377000 cycles: 3ff9560b4b6 _dl_lookup_symbol_x+0x86 (/usr/lib/ld64.so.1) perf 919810 6726.456184: 1377000 cycles: 3ff9560a706 check_match+0xe6 (/usr/lib/ld64.so.1) perf 919810 6726.456184: 2754000 cycles: 3ff9560f970 _dl_relocate_object_no_relro+0x500 (/usr/lib/ld64.so.1) perf 919810 6726.456185: 8262000 cycles: 3ff94b28520 mi_option_init+0x80 (/usr/lib64/libpython3.13.so.1.0) perf 919810 6726.456185: 1377000 cycles: 2aa015527f4 brstack_bench+0x94 (/usr/bin/perf) perf 919810 
6726.456185: 1377000 cycles: 2aa01552804 brstack_bench+0xa4 (/usr/bin/perf) perf 919810 6726.456185: 1377000 cycles: 2aa015526ec brstack_bar+0x34 (/usr/bin/perf) perf 919810 6726.456185: 1377000 cycles: 2aa01552808 brstack_bench+0xa8 (/usr/bin/perf) perf 919810 6726.456186: 1377000 cycles: 2aa01552760 brstack_bench+0x0 (/usr/bin/perf) # And here is the output of the first entry (_dl_map_object), the values of both counters are different: 6726456179732 0x1b88 [0x68]: PERF_RECORD_SAMPLE(IP, 0x2): 919810/919810: 0x3ff95608ec8 period: 1377000 addr: 0 ... sample_read: .... group nr 2 ..... id 0000000000001fbe, value 00000000002a05d0, lost 0 ..... id 0000000000001fde, value 00000000037ec079, lost 0 ... thread: perf:919810 ...... dso: /usr/lib/ld64.so.1 ... thread: perf:919810 ...... dso: /usr/lib/ld64.so.1 In fact there are no entries that are identical. The counters always differ. The counter with id 1fde has 2 different values: ❯ perf report -D|grep 0000000000001fde ..... id 0000000000001fde, value 00000000037ec079, lost 0 ..... ..... id 0000000000001fde, value 00000000037ec079, lost 0 ..... id 0000000000001fde, value 00000000049dc845, lost 0 The counter with id 1fbe always has different values; its increment is (most of the time) 1377000, or sometimes a multiple thereof: ❯ perf report -D|grep 0000000000001fbe ..... id 0000000000001fbe, value 00000000002a05d0, lost 0 ..... id 0000000000001fbe, value 00000000003f08b8, lost 0 ..... id 0000000000001fbe, value 0000000000540ba0, lost 0 ..... id 0000000000001fbe, value 0000000000690e88, lost 0 ..... id 0000000000001fbe, value 00000000007e1170, lost 0 ..... id 0000000000001fbe, value 0000000000bd1a28, lost 0 ..... id 0000000000001fbe, value 0000000000e71ff8, lost 0 ..... id 0000000000001fbe, value 0000000000fc22e0, lost 0 ..... id 0000000000001fbe, value 00000000011125c8, lost 0 ..... id 0000000000001fbe, value 00000000012628b0, lost 0 ..... id 0000000000001fbe, value 00000000013b2b98, lost 0 ..... 
id 0000000000001fbe, value 0000000001502e80, lost 0 ..... id 0000000000001fbe, value 00000000017a3450, lost 0 ..... id 0000000000001fbe, value 0000000001f845c0, lost 0 ..... id 0000000000001fbe, value 00000000020d48a8, lost 0 ..... id 0000000000001fbe, value 0000000002224b90, lost 0 ..... id 0000000000001fbe, value 0000000002374e78, lost 0 ..... id 0000000000001fbe, value 00000000024c5160, lost 0 So it looks like there is some issue with this test. Thanks for pointing this out. I will look into this. -- Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany -- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Wolfgang Wendt Geschäftsführung: David Faller Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ^ permalink raw reply [flat|nested] 15+ messages in thread
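[Editorial note] The per-id increments listed above can be pulled out mechanically from the perf report -D output. A minimal sketch, assuming a bash-style shell and that the dumped value fields are hexadecimal (which matches the numbers above: 0x3f08b8 minus 0x2a05d0 is exactly 1377000, one leader period); `deltas_for_id` is a hypothetical helper, not an existing perf option:

```shell
# Sketch: print successive deltas for one counter id from "perf report -D"
# style lines of the form "..... id <id>, value <hex>, lost 0".
deltas_for_id() {
  prev=""
  grep "id $1," | while read -r _ _ _ _ val _; do
    val=${val%,}               # strip the trailing comma after the value
    v=$((16#$val))             # the dumped values are hexadecimal
    [ -n "$prev" ] && echo $((v - prev))
    prev=$v
  done
}

# The two 1fbe values quoted above differ by exactly one leader period:
printf '%s\n' \
  "..... id 0000000000001fbe, value 00000000002a05d0, lost 0" \
  "..... id 0000000000001fbe, value 00000000003f08b8, lost 0" |
  deltas_for_id 0000000000001fbe    # prints 1377000
```

Run over the whole dump, a filter like this would show the leader (1fbe) stepping by multiples of its period while the run-away member (1fde) shows one huge delta.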
* Re: [PATCH] Revert "perf test: Allow tolerance for leader sampling test" 2025-10-28 12:55 ` Thomas Richter @ 2025-10-28 15:23 ` James Clark 2025-10-29 7:37 ` Thomas Richter 2025-10-30 13:52 ` Anubhav Shelat 1 sibling, 1 reply; 15+ messages in thread From: James Clark @ 2025-10-28 15:23 UTC (permalink / raw) To: Thomas Richter, Anubhav Shelat Cc: mpetlan, acme, namhyung, irogers, linux-perf-users, peterz, mingo, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, dapeng1.mi On 28/10/2025 12:55 pm, Thomas Richter wrote: > On 10/28/25 12:30, James Clark wrote: >> >> >> On 24/10/2025 6:21 pm, Anubhav Shelat wrote: >>> The issue on arm is similar. Failing about half the time, with 1 >>> failing case. So maybe the same issue on arm. >>> >>> Anubhav >>> >> >> You mentioned on the other thread that it's failing "differently", can you expand on that? I'm wondering why you sent the revert patch then? >> >> As I mentioned before I'm not seeing any issues. Can you share the kernel version that you tested on and your kernel config? And can you share the same outputs that I asked Thomas for below please. >> >>> On Fri, Oct 24, 2025 at 9:40 AM Thomas Richter <tmricht@linux.ibm.com> wrote: >>>> >>>> On 10/23/25 15:24, Anubhav Shelat wrote: >>>>> This reverts commit 1c5721ca89a1c8ae71082d3a102b39fd1ec0a205. >>>>> >>>>> The throttling bug has been fixed in 9734e25fbf5a perf: Fix the throttle >>>>> logic for a group. So this commit can be reverted. 
>>>>> >>>>> Signed-off-by: Anubhav Shelat <ashelat@redhat.com> >>>>> --- >>>>> tools/perf/tests/shell/record.sh | 33 ++++++-------------------------- >>>>> 1 file changed, 6 insertions(+), 27 deletions(-) >>>>> >>>>> diff --git a/tools/perf/tests/shell/record.sh b/tools/perf/tests/shell/record.sh >>>>> index 0f5841c479e7..13e0d6ef66c9 100755 >>>>> --- a/tools/perf/tests/shell/record.sh >>>>> +++ b/tools/perf/tests/shell/record.sh >>>>> @@ -267,43 +267,22 @@ test_leader_sampling() { >>>>> err=1 >>>>> return >>>>> fi >>>>> - perf script -i "${perfdata}" | grep brstack > $script_output >>>>> - # Check if the two instruction counts are equal in each record. >>>>> - # However, the throttling code doesn't consider event grouping. During throttling, only the >>>>> - # leader is stopped, causing the slave's counts significantly higher. To temporarily solve this, >>>>> - # let's set the tolerance rate to 80%. >>>>> - # TODO: Revert the code for tolerance once the throttling mechanism is fixed. >>>>> index=0 >>>>> - valid_counts=0 >>>>> - invalid_counts=0 >>>>> - tolerance_rate=0.8 >>>>> + perf script -i "${perfdata}" | grep brstack > "${script_output}" >>>>> while IFS= read -r line >>>>> do >>>>> + # Check if the two instruction counts are equal in each record >>>>> cycles=$(echo $line | awk '{for(i=1;i<=NF;i++) if($i=="cycles:") print $(i-1)}') >>>>> if [ $(($index%2)) -ne 0 ] && [ ${cycles}x != ${prev_cycles}x ] >>>>> then >>>>> - invalid_counts=$(($invalid_counts+1)) >>>>> - else >>>>> - valid_counts=$(($valid_counts+1)) >>>>> + echo "Leader sampling [Failed inconsistent cycles count]" >>>>> + err=1 >>>>> + return >>>>> fi >>>>> index=$(($index+1)) >>>>> prev_cycles=$cycles >>>>> done < "${script_output}" >>>>> - total_counts=$(bc <<< "$invalid_counts+$valid_counts") >>>>> - if (( $(bc <<< "$total_counts <= 0") )) >>>>> - then >>>>> - echo "Leader sampling [No sample generated]" >>>>> - err=1 >>>>> - return >>>>> - fi >>>>> - isok=$(bc <<< "scale=2; if 
(($invalid_counts/$total_counts) < (1-$tolerance_rate)) { 0 } else { 1 };") >>>>> - if [ $isok -eq 1 ] >>>>> - then >>>>> - echo "Leader sampling [Failed inconsistent cycles count]" >>>>> - err=1 >>>>> - else >>>>> - echo "Basic leader sampling test [Success]" >>>>> - fi >>>>> + echo "Basic leader sampling test [Success]" >>>>> } >>>>> >>>>> test_topdown_leader_sampling() { >>>> >>>> I disagree here. Reverting this patch cause the test case to fail very often on s390. >>>> The test fails about every 2nd run, because the is one run-away value out of many. >>>> Here is an example: >> >> I suppose that depends on what the reason for the failure is. I don't think we've gotten to the bottom of that yet. It's ok to have a test failure if the actual behaviour doesn't match the intented behaviour. >> >> At the moment it looks like we're trying to hide some defect with a tolerance value. This makes the test less useful, and it also wastes developer time when the tolerance value will inevitably be increased and increased with more and more investigations until it tests nothing. Not having any tolerance to begin with will make this less likely to happen. 
>> >>>> >>>> # ./perf record -e "{cycles,cycles}:Su" -- perf test -w brstack >>>> [ perf record: Woken up 2 times to write data ] >>>> [ perf record: Captured and wrote 0.015 MB perf.data (74 samples) ] >>>> [root@b83lp65 perf]# perf script | grep brstack >>>> perf 136408 340637.903395: 1377000 cycles: 1171606 brstack_foo+0x1e (/root/linux/tools/perf/perf) >>>> perf 136408 340637.903396: 1377000 cycles: 1171664 brstack_bench+0x24 (/root/linux/tools/perf/perf) >>>> perf 136408 340637.903396: 1377000 cycles: 11716d4 brstack_bench+0x94 (/root/linux/tools/perf/perf) >>>> perf 136408 340637.903397: 1377000 cycles: 11716d4 brstack_bench+0x94 (/root/linux/tools/perf/perf) >>>> perf 136408 340637.903398: 1377000 cycles: 11716e8 brstack_bench+0xa8 (/root/linux/tools/perf/perf) >>>> perf 136408 340637.903398: 1377000 cycles: 1171606 brstack_foo+0x1e (/root/linux/tools/perf/perf) >>>> perf 136408 340637.903399: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf) >>>> perf 136408 340637.910844: 1377000 cycles: 11716e4 brstack_bench+0xa4 (/root/linux/tools/perf/perf) >>>> perf 136408 340637.910844: 39843371 cycles: 11716e4 brstack_bench+0xa4 (/root/linux/tools/perf/perf) >>>> perf 136408 340637.910845: 1377000 cycles: 1171632 brstack_foo+0x4a (/root/linux/tools/perf/perf) >>>> perf 136408 340637.910846: 1377000 cycles: 1171692 brstack_bench+0x52 (/root/linux/tools/perf/perf) >>>> perf 136408 340637.910847: 1377000 cycles: 11716ee brstack_bench+0xae (/root/linux/tools/perf/perf) >>>> perf 136408 340637.910847: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf) >>>> perf 136408 340637.910848: 1377000 cycles: 1171598 brstack_bar+0x0 (/root/linux/tools/perf/perf) >>>> perf 136408 340637.910848: 1377000 cycles: 11715e8 brstack_foo+0x0 (/root/linux/tools/perf/perf) >>>> perf 136408 340637.910849: 1377000 cycles: 11717ae brstack+0x86 (/root/linux/tools/perf/perf) >>>> perf 136408 340637.910850: 1377000 cycles: 11715cc brstack_bar+0x34 
(/root/linux/tools/perf/perf) >>>> perf 136408 340637.910850: 1377000 cycles: 11716ee brstack_bench+0xae (/root/linux/tools/perf/perf) >>>> perf 136408 340637.910851: 1377000 cycles: 11716ee brstack_bench+0xae (/root/linux/tools/perf/perf) >>>> perf 136408 340637.910851: 1377000 cycles: 117159e brstack_bar+0x6 (/root/linux/tools/perf/perf) >>>> perf 136408 340637.910852: 1377000 cycles: 1171598 brstack_bar+0x0 (/root/linux/tools/perf/perf) >>>> perf 136408 340637.910853: 1377000 cycles: 117179e brstack+0x76 (/root/linux/tools/perf/perf) >>>> perf 136408 340637.910853: 1377000 cycles: 1171606 brstack_foo+0x1e (/root/linux/tools/perf/perf) >>>> perf 136408 340637.910854: 1377000 cycles: 11716d4 brstack_bench+0x94 (/root/linux/tools/perf/perf) >>>> perf 136408 340637.910855: 1377000 cycles: 1171612 brstack_foo+0x2a (/root/linux/tools/perf/perf) >>>> perf 136408 340637.910855: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf) >>>> perf 136408 340637.910856: 1377000 cycles: 1171598 brstack_bar+0x0 (/root/linux/tools/perf/perf) >>>> perf 136408 340637.910856: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf) >>>> perf 136408 340637.910857: 1377000 cycles: 1171632 brstack_foo+0x4a (/root/linux/tools/perf/perf) >>>> .... many more lines with identical cycles value. >>>> >>>> I have contacted our hardware/firmware team, but have not gotten a response back. >>>> I still think this has to do with s390 LPAR running under hyperviser control and I do not know what >>>> happens when the hipervisor kicks in. >>>> >>>> I agree with James Clark that this should be handled transperently by the hipervisor, that means >>>> stopping the LPAR should stop the CPU measurement unit, before giving control to a different lpar. >>>> >>>> But what happens when the hipervisor just kicks in and returns to the same LPAR again? Or does >>>> some admin work on behalf of this LPAR. 
As long as I can not answer this question, I would like >>>> to keep some ratio to handle run-away values. >>>> >>>> As said before, this happens in roughly 50% of the runs... >>>> >>>> Here is a run where the test succeeds without a run-away value: >>>> >>>> # ./perf record -e "{cycles,cycles}:Su" -- perf test -w brstack >>>> [ perf record: Woken up 1 times to write data ] >>>> [ perf record: Captured and wrote 0.015 MB perf.data (70 samples) ] >>>> # perf script | grep brstack >>>> perf 136455 341212.430466: 1377000 cycles: 117159e brstack_bar+0x6 (/root/linux/tools/perf/perf) >>>> perf 136455 341212.430467: 1377000 cycles: 11715cc brstack_bar+0x34 (/root/linux/tools/perf/perf) >>>> perf 136455 341212.430468: 1377000 cycles: 1171612 brstack_foo+0x2a (/root/linux/tools/perf/perf) >>>> perf 136455 341212.430468: 1377000 cycles: 1171656 brstack_bench+0x16 (/root/linux/tools/perf/perf) >> >> I'm a bit confused how the instruction pointers and timestamps are different. Shouldn't the counters be part of a single sample? >> > > Two diffenrent runs on the same machine. The different addresses are most likely from different load addresses of the > libraries and executables. > >> Which kernel version is this exactly? > The kernel was built last night from our build machine. It is > # uname -a > Linux b83lp69.lnxne.boe 6.18.0-20251027.rc3.git2.fd57572253bc.63.fc42.s390x+git #1 SMP Mon Oct 27 20:06:12 CET 2025 s390x GNU/Linux > > So latest kernel code from upstream > >> >> Can you skip the grep, we only care about the samples and not what process it happened to be in so that might be hiding something. And can you share the raw dump of a sample (perf report -D). One sample that has the matching counts and one that doesn't. >> >> Mine look like this, although I can't share one that doesn't match because I can't reproduce it: >> >> 0 1381669508860 0x24b20 [0x70]: PERF_RECORD_SAMPLE(IP, 0x2): 1136/1136: 0xaaaac8f51588 period: 414710 addr: 0 >> ... sample_read: >> .... 
group nr 2 >> ..... id 0000000000000336, value 00000000000f38f0, lost 0 >> ..... id 0000000000000337, value 00000000000f38f0, lost 0 >> ... thread: stress:1136 >> ...... dso: /usr/bin/stress >> ... thread: stress:1136 >> ...... dso: /usr/bin/stress >> >> > When I skip the grep it actually gets worse, there re more run away values: > # perf record -e "{cycles,cycles}:Su" -- perf test -w brstack > [ perf record: Woken up 2 times to write data ] > [ perf record: Captured and wrote 0.012 MB perf.data (50 samples) ] > # perf script | head -20 > perf 919810 6726.456179: 2754000 cycles: 3ff95608ec8 _dl_map_object_from_fd+0xb18 (/usr/lib/ld64.so.1) > perf 919810 6726.456179: 58638457 cycles: 3ff95608ec8 _dl_map_object_from_fd+0xb18 (/usr/lib/ld64.so.1) > perf 919810 6726.456182: 1377000 cycles: 3ff9560a696 check_match+0x76 (/usr/lib/ld64.so.1) > perf 919810 6726.456182: 1377000 cycles: 3ff9560fa6a _dl_relocate_object_no_relro+0x5fa (/usr/lib/ld64.so.1) Can you share the raw output for the second sample as well? Or even the whole file would be better. It's the addresses from this sample that are confusing. 0x3ff95608ec8 is the same for both counters on the first sample (correctly), but the second sample has 0x3ff9560a696 and 0x3ff9560fa6a even though the cycles counts are the same. 
> perf 919810 6726.456182: 1377000 cycles: 3ff9560ac04 do_lookup_x+0x404 (/usr/lib/ld64.so.1) > perf 919810 6726.456183: 1377000 cycles: 3ff9560f9fa _dl_relocate_object_no_relro+0x58a (/usr/lib/ld64.so.1) > perf 919810 6726.456183: 4131000 cycles: 3ff9560f970 _dl_relocate_object_no_relro+0x500 (/usr/lib/ld64.so.1) > perf 919810 6726.456183: 2754000 cycles: 3ff9560b48c _dl_lookup_symbol_x+0x5c (/usr/lib/ld64.so.1) > perf 919810 6726.456183: 1377000 cycles: 3ff9560ac1c do_lookup_x+0x41c (/usr/lib/ld64.so.1) > perf 919810 6726.456183: 1377000 cycles: 3ff9560b4b6 _dl_lookup_symbol_x+0x86 (/usr/lib/ld64.so.1) > perf 919810 6726.456184: 1377000 cycles: 3ff9560abac do_lookup_x+0x3ac (/usr/lib/ld64.so.1) > perf 919810 6726.456184: 1377000 cycles: 3ff9560b4b6 _dl_lookup_symbol_x+0x86 (/usr/lib/ld64.so.1) > perf 919810 6726.456184: 1377000 cycles: 3ff9560a706 check_match+0xe6 (/usr/lib/ld64.so.1) > perf 919810 6726.456184: 2754000 cycles: 3ff9560f970 _dl_relocate_object_no_relro+0x500 (/usr/lib/ld64.so.1) > perf 919810 6726.456185: 8262000 cycles: 3ff94b28520 mi_option_init+0x80 (/usr/lib64/libpython3.13.so.1.0) > perf 919810 6726.456185: 1377000 cycles: 2aa015527f4 brstack_bench+0x94 (/usr/bin/perf) > perf 919810 6726.456185: 1377000 cycles: 2aa01552804 brstack_bench+0xa4 (/usr/bin/perf) > perf 919810 6726.456185: 1377000 cycles: 2aa015526ec brstack_bar+0x34 (/usr/bin/perf) > perf 919810 6726.456185: 1377000 cycles: 2aa01552808 brstack_bench+0xa8 (/usr/bin/perf) > perf 919810 6726.456186: 1377000 cycles: 2aa01552760 brstack_bench+0x0 (/usr/bin/perf) > > # > > And here is the output of the first entry (_dl_map_object), the values of both counters are different: > 6726456179732 0x1b88 [0x68]: PERF_RECORD_SAMPLE(IP, 0x2): 919810/919810: 0x3ff95608ec8 period: 1377000 addr: 0 > ... sample_read: > .... group nr 2 > ..... id 0000000000001fbe, value 00000000002a05d0, lost 0 > ..... id 0000000000001fde, value 00000000037ec079, lost 0 > ... thread: perf:919810 > ...... 
dso: /usr/lib/ld64.so.1 > ... thread: perf:919810 > ...... dso: /usr/lib/ld64.so.1 > > > In fact there are no entries which are identical. The counters always differ. > The counter with id 1fde has 2 different values: > I suppose it's actually the delta between them that's important. Considering your very first sample has different counts maybe the second counter didn't start at zero. Then whenever you get another non-matching value one of the counters had wrapped and the other hasn't yet? If you send the whole file I can look in more detail. > ❯ perf report -D|grep 0000000000001fde > ..... id 0000000000001fde, value 00000000037ec079, lost 0 > ..... > ..... id 0000000000001fde, value 00000000037ec079, lost 0 > ..... id 0000000000001fde, value 00000000049dc845, lost 0 > > The counter with id 1fbe always has different values, its increment is > (most of the time) 1377000, or sometimes a multiple thereof: > > ❯ perf report -D|grep 0000000000001fbe > ..... id 0000000000001fbe, value 00000000002a05d0, lost 0 > ..... id 0000000000001fbe, value 00000000003f08b8, lost 0 > ..... id 0000000000001fbe, value 0000000000540ba0, lost 0 > ..... id 0000000000001fbe, value 0000000000690e88, lost 0 > ..... id 0000000000001fbe, value 00000000007e1170, lost 0 > ..... id 0000000000001fbe, value 0000000000bd1a28, lost 0 > ..... id 0000000000001fbe, value 0000000000e71ff8, lost 0 > ..... id 0000000000001fbe, value 0000000000fc22e0, lost 0 > ..... id 0000000000001fbe, value 00000000011125c8, lost 0 > ..... id 0000000000001fbe, value 00000000012628b0, lost 0 > ..... id 0000000000001fbe, value 00000000013b2b98, lost 0 > ..... id 0000000000001fbe, value 0000000001502e80, lost 0 > ..... id 0000000000001fbe, value 00000000017a3450, lost 0 > ..... id 0000000000001fbe, value 0000000001f845c0, lost 0 > ..... id 0000000000001fbe, value 00000000020d48a8, lost 0 > ..... id 0000000000001fbe, value 0000000002224b90, lost 0 > ..... id 0000000000001fbe, value 0000000002374e78, lost 0 > ..... 
id 0000000000001fbe, value 00000000024c5160, lost 0 > > So it looks like there is some issue with this test. Thanks for pointing this out. > I will look into this. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] Revert "perf test: Allow tolerance for leader sampling test" 2025-10-28 15:23 ` James Clark @ 2025-10-29 7:37 ` Thomas Richter 2025-10-29 9:25 ` James Clark 0 siblings, 1 reply; 15+ messages in thread From: Thomas Richter @ 2025-10-29 7:37 UTC (permalink / raw) To: James Clark, Anubhav Shelat Cc: mpetlan, acme, namhyung, irogers, linux-perf-users, peterz, mingo, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, dapeng1.mi [-- Attachment #1: Type: text/plain, Size: 6486 bytes --] On 10/28/25 16:23, James Clark wrote: > > > On 28/10/2025 12:55 pm, Thomas Richter wrote: ..... >> When I skip the grep it actually gets worse, there re more run away values: >> # perf record -e "{cycles,cycles}:Su" -- perf test -w brstack >> [ perf record: Woken up 2 times to write data ] >> [ perf record: Captured and wrote 0.012 MB perf.data (50 samples) ] >> # perf script | head -20 >> perf 919810 6726.456179: 2754000 cycles: 3ff95608ec8 _dl_map_object_from_fd+0xb18 (/usr/lib/ld64.so.1) >> perf 919810 6726.456179: 58638457 cycles: 3ff95608ec8 _dl_map_object_from_fd+0xb18 (/usr/lib/ld64.so.1) >> perf 919810 6726.456182: 1377000 cycles: 3ff9560a696 check_match+0x76 (/usr/lib/ld64.so.1) >> perf 919810 6726.456182: 1377000 cycles: 3ff9560fa6a _dl_relocate_object_no_relro+0x5fa (/usr/lib/ld64.so.1) > > Can you share the raw output for the second sample as well? Or even the whole file would be better. Ok I will append a perf.data from today and hopefully it will be delivered to you: See attachment perf.data.tmrs390 (binary file, big endian from s390) > > It's the addresses from this sample that are confusing. 0x3ff95608ec8 is the same for both counters on the first sample (correctly), but the second sample has 0x3ff9560a696 and 0x3ff9560fa6a even though the cycles counts are the same. > Command ./perf record -r 99 -e "{cycles,cycles}:Su" -- ./perf test -w brstack is testing leadership group sampling in tests/shell/record.sh and fails most of the time on s390. 
The command opens event cycles (as group leader) for sampling and the s390 sampling facility is started with default frequency of 4000. This can be seen in the debug output: perf record opening and mmapping events Opening: cycles ------------------------------------------------------------ perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 config 0 (PERF_COUNT_HW_CPU_CYCLES) { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|READ|ID|PERIOD read_format ID|GROUP|LOST disabled 1 inherit 1 exclude_kernel 1 exclude_hv 1 mmap 1 comm 1 freq 1 enable_on_exec 1 task 1 sample_id_all 1 mmap2 1 comm_exec 1 ksymbol 1 bpf_event 1 build_id 1 .... Next event cycles is opened in the s390 counting facility: Opening: cycles perf_event_attr: type 0 (PERF_TYPE_HARDWARE) size 136 config 0 (PERF_COUNT_HW_CPU_CYCLES) sample_type IP|TID|TIME|READ|ID|PERIOD read_format ID|GROUP|LOST inherit 1 exclude_kernel 1 exclude_hv 1 sample_id_all 1 So now there are 2 hardware events which are mapped on s390 to 1. event handled by CPU Measurement sampling facility, hardware writes 32 byte large samples to buffers. The frequency of 4000 Hz translates to a sample every 1300000 instructions. Interrupt driven. 2. event handled by CPU Measurement counting facility, hardware runs in the background and increments counters accordingly. All available counters (about 400) are running in the background and read via assembler instruction until stopped. No interrupts. If I understand this setup correctly, the first event is the group leader and either both events run and are active or none of them. That is the reason why both values should be identical. Is this true? Now given two independent CPU measurement units on s390, one running in the background incrementing counters, the other interrupt driven reading samples, there is always room for both counters to differ. The question is how much and how often. 
When I look at the debug output of the perf.data file, I get this: 55805554120788 0x22a8 [0x68]: PERF_RECORD_SAMPLE(IP, 0x2): 14135/14135: 0x3ff9ae90340 period: 1300000 addr: 0 ... sample_read: .... group nr 2 ..... id 00000000000000b4, value 000000000115b5c0, lost 0 ..... id 00000000000000bc, value 000000000195ac03, lost 0 ... thread: perf:14135 ...... dso: /usr/lib/ld64.so.1 The first value is the count from the sampling event, it gets incremented with 4000 Hz frequency: # perf report -D -i/tmp/perf.data.tmrs390|grep 00000000000000b4,|head -10 ..... id 00000000000000b4, value 000000000101dfa0, lost 0 ..... id 00000000000000b4, value 000000000115b5c0, lost 0 ..... id 00000000000000b4, value 00000000013d6200, lost 0 ..... id 00000000000000b4, value 0000000001513820, lost 0 ..... id 00000000000000b4, value 0000000001650e40, lost 0 ..... id 00000000000000b4, value 00000000018cba80, lost 0 ..... id 00000000000000b4, value 0000000001a090a0, lost 0 ..... id 00000000000000b4, value 0000000001b466c0, lost 0 ..... id 00000000000000b4, value 0000000001c83ce0, lost 0 ..... id 00000000000000b4, value 0000000001dc1300, lost 0 value 115b5c0 - 101dfa0 = 13d620 --> 1300000 period time. So that value always increments by period time. The other counter id is: # perf report -D -i/tmp/perf.data.tmrs390|grep 00000000000000bc,| sort | uniq -d ..... id 00000000000000bc, value 000000000195ac03, lost 0 ..... id 00000000000000bc, value 0000000002fd8b45, lost 0 ..... id 00000000000000bc, value 0000000005f0b1ce, lost 0 # It reads out the value of counter 0 (cycles) 85 times, but has only 3 different values. Anyway, what does perf script print out? The value of the samples frequency? Where does perf record read out the value of the counter event? Any ideas where to start debugging? Thanks a lot. 
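The delta computation above ("value 115b5c0 - 101dfa0 = 13d620") can be scripted rather than done by hand. A minimal sketch, assuming bash and the `..... id <id>, value <hex>, lost 0` line format of `perf report -D` shown above; `print_deltas` is a hypothetical helper name, not part of perf:

```shell
# Hypothetical helper: read `perf report -D` output on stdin, keep only
# the lines for one event id, and print the delta between consecutive
# counter values (bash, for the $((16#...)) hex conversion).
print_deltas() {
	prev=""
	grep "id $1," | while read -r _ _ _ _ value _
	do
		v=$((16#${value%,}))                 # strip trailing comma, hex -> decimal
		[ -n "$prev" ] && echo $((v - prev)) # delta to the previous sample
		prev=$v
	done
}
```

For the sampling event id above this prints a constant 1300000 (or multiples of it) per record; for the counting event most deltas come out as 0, matching the observation that only 3 distinct values appear across 85 reads.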
-- Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany -- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Wolfgang Wendt Geschäftsführung: David Faller Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 [-- Attachment #2: perf.data.tmrs390 --] [-- Type: application/octet-stream, Size: 25888 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] Revert "perf test: Allow tolerance for leader sampling test" 2025-10-29 7:37 ` Thomas Richter @ 2025-10-29 9:25 ` James Clark 2025-11-11 11:22 ` Thomas Richter 0 siblings, 1 reply; 15+ messages in thread From: James Clark @ 2025-10-29 9:25 UTC (permalink / raw) To: Thomas Richter, Anubhav Shelat Cc: mpetlan, acme, namhyung, irogers, linux-perf-users, peterz, mingo, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, dapeng1.mi On 29/10/2025 7:37 am, Thomas Richter wrote: > On 10/28/25 16:23, James Clark wrote: >> >> >> On 28/10/2025 12:55 pm, Thomas Richter wrote: > ..... > >>> When I skip the grep it actually gets worse, there re more run away values: >>> # perf record -e "{cycles,cycles}:Su" -- perf test -w brstack >>> [ perf record: Woken up 2 times to write data ] >>> [ perf record: Captured and wrote 0.012 MB perf.data (50 samples) ] >>> # perf script | head -20 >>> perf 919810 6726.456179: 2754000 cycles: 3ff95608ec8 _dl_map_object_from_fd+0xb18 (/usr/lib/ld64.so.1) >>> perf 919810 6726.456179: 58638457 cycles: 3ff95608ec8 _dl_map_object_from_fd+0xb18 (/usr/lib/ld64.so.1) >>> perf 919810 6726.456182: 1377000 cycles: 3ff9560a696 check_match+0x76 (/usr/lib/ld64.so.1) >>> perf 919810 6726.456182: 1377000 cycles: 3ff9560fa6a _dl_relocate_object_no_relro+0x5fa (/usr/lib/ld64.so.1) >> >> Can you share the raw output for the second sample as well? Or even the whole file would be better. > > Ok I will append a perf.data from today and hopefully it will be delivered to you: > See attachment perf.data.tmrs390 (binary file, big endian from s390) > >> >> It's the addresses from this sample that are confusing. 0x3ff95608ec8 is the same for both counters on the first sample (correctly), but the second sample has 0x3ff9560a696 and 0x3ff9560fa6a even though the cycles counts are the same. 
>> > > Command > ./perf record -r 99 -e "{cycles,cycles}:Su" -- ./perf test -w brstack > is testing leadership group sampling in tests/shell/record.sh and > fails most of the time on s390. > > The command opens event cycles (as group leader) for sampling and the s390 > sampling facility is started with default frequency of 4000. > This can be seen in the debug output: > > perf record opening and mmapping events > Opening: cycles > ------------------------------------------------------------ > perf_event_attr: > type 0 (PERF_TYPE_HARDWARE) > size 136 > config 0 (PERF_COUNT_HW_CPU_CYCLES) > { sample_period, sample_freq } 4000 > sample_type IP|TID|TIME|READ|ID|PERIOD > read_format ID|GROUP|LOST > disabled 1 > inherit 1 > exclude_kernel 1 > exclude_hv 1 > mmap 1 > comm 1 > freq 1 > enable_on_exec 1 > task 1 > sample_id_all 1 > mmap2 1 > comm_exec 1 > ksymbol 1 > bpf_event 1 > build_id 1 > .... > Next event cycles is opened in the s390 counting facility: > Opening: cycles > perf_event_attr: > type 0 (PERF_TYPE_HARDWARE) > size 136 > config 0 (PERF_COUNT_HW_CPU_CYCLES) > sample_type IP|TID|TIME|READ|ID|PERIOD > read_format ID|GROUP|LOST > inherit 1 > exclude_kernel 1 > exclude_hv 1 > sample_id_all 1 > > So now there are 2 hardware events when are mapped on s390 to > 1. event handled by CPU Measurement sampling facility, hardware writes 32 byte > large samples to buffers. The frequency of 4000 Hz translates > to a sample every 1300000 instructions. Interrupt driven. > 2. event handled by CPU Measurement counting facilitly, hardware > runs in the background and increments counters accordingly. > All available counters (about 400) are running in the background > and read via assembler instruction until stopped. No interrupts. > > If I understand this setup correctly, the first event is the group > leader and either both events run and are active or non of them. > That is the reason why both values should be identitical > Is this true? 
> > Now given two independent CPU measurement units on s390, one running > in the background incrementing counters, the other interrupt driven > reading samples, there is always room the both counters to differ. > The question is how much and how often. > > When I look at the debug output of the perf.data file; I get this: > > 55805554120788 0x22a8 [0x68]: PERF_RECORD_SAMPLE(IP, 0x2): > 14135/14135: 0x3ff9ae90340 period: 1300000 addr: 0 > ... sample_read: > .... group nr 2 > ..... id 00000000000000b4, value 000000000115b5c0, lost 0 > ..... id 00000000000000bc, value 000000000195ac03, lost 0 > ... thread: perf:14135 > ...... dso: /usr/lib/ld64.so.1 > > The first value is the count from the sampling event, it gets > incremented with 4000 Hz frequency: > > # perf report -D -i/tmp/perf.data.tmrs390|grep 00000000000000b4,|head -10 > ..... id 00000000000000b4, value 000000000101dfa0, lost 0 > ..... id 00000000000000b4, value 000000000115b5c0, lost 0 > ..... id 00000000000000b4, value 00000000013d6200, lost 0 > ..... id 00000000000000b4, value 0000000001513820, lost 0 > ..... id 00000000000000b4, value 0000000001650e40, lost 0 > ..... id 00000000000000b4, value 00000000018cba80, lost 0 > ..... id 00000000000000b4, value 0000000001a090a0, lost 0 > ..... id 00000000000000b4, value 0000000001b466c0, lost 0 > ..... id 00000000000000b4, value 0000000001c83ce0, lost 0 > ..... id 00000000000000b4, value 0000000001dc1300, lost 0 > > value 115b5c0 - 101dfa0 = 13d620 --> 1300000 period time. > So that value always increments by period time. > > The other counter id is: > # perf report -D -i/tmp/perf.data.tmrs390|grep 00000000000000bc,| sort | uniq -d > ..... id 00000000000000bc, value 000000000195ac03, lost 0 > ..... id 00000000000000bc, value 0000000002fd8b45, lost 0 > ..... id 00000000000000bc, value 0000000005f0b1ce, lost 0 > # > It reads out the value of counter 0 (cycles) 85 times, but has only 3 different > values. > > Anyway what does perf script print out? 
The value of the samples frequency? > Where does perf record read out the value of the counter event? > Any ideas where to start debugging? > > Thanks a lot. > I think your PMU is just broken when setup for sampling events. The second counter is not being reset to zero to begin with, and then it doesn't increment after the first sample. I don't think your issues are to do with virtualisation. The reason I was confused about the addresses in the Perf script output is because Perf discards samples where the counter didn't increment: /* * There's no reason to deliver sample * for zero period, bail out. */ if (!sample->period) return 0; I'm not really sure about the logic for this because it breaks the assumption that the Perf script output has one line for each counter. And counters not increasing is information in itself which should be displayed. If I comment out this line, then the output looks like I would expect and plainly shows your issue: perf 55805.554110: 15600000 cycles: 3ff9ae95194 (lib/ld64.so.1) perf 55805.554110: 26586115 cycles: 3ff9ae95194 (lib/ld64.so.1) perf 55805.554117: 1300000 cycles: 3ff9ae8bb3e (lib/ld64.so.1) perf 55805.554117: 0 cycles: 3ff9ae8bb3e (lib/ld64.so.1) perf 55805.554120: 1300000 cycles: 3ff9ae90340 (lib/ld64.so.1) perf 55805.554120: 0 cycles: 3ff9ae90340 (lib/ld64.so.1) Now we always have pairs of samples, and you can see that your second cycle counter isn't doing anything. ^ permalink raw reply [flat|nested] 15+ messages in thread
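The pairing check described above (with the zero-period skip commented out, every leader line is followed by its sibling line) can be sketched as a small filter. This is a hypothetical sketch, not the record.sh code itself, though the awk field scan mirrors the one the test uses to find the value before the "cycles:" token:

```shell
# Hypothetical filter: read `perf script` lines on stdin and flag every
# even-numbered (sibling) line whose cycles count differs from the
# odd-numbered (leader) line directly before it.
check_pairs() {
	awk '{
		for (i = 1; i <= NF; i++)
			if ($i == "cycles:")
				count = $(i - 1)
		if (NR % 2 == 0 && count != prev)
			printf "mismatch at record %d: %s vs %s\n", NR / 2, prev, count
		prev = count
	}'
}
```

On the s390 output above, every pair would be flagged: the leader line always shows the period, while the sibling line shows a counter delta that never matches it.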
* Re: [PATCH] Revert "perf test: Allow tolerance for leader sampling test" 2025-10-29 9:25 ` James Clark @ 2025-11-11 11:22 ` Thomas Richter 2025-11-11 14:03 ` James Clark 0 siblings, 1 reply; 15+ messages in thread From: Thomas Richter @ 2025-11-11 11:22 UTC (permalink / raw) To: James Clark, Anubhav Shelat Cc: mpetlan, acme, namhyung, irogers, linux-perf-users, peterz, mingo, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, dapeng1.mi On 10/29/25 10:25, James Clark wrote: > > > On 29/10/2025 7:37 am, Thomas Richter wrote: >> On 10/28/25 16:23, James Clark wrote: >>> >>> >>> On 28/10/2025 12:55 pm, Thomas Richter wrote: >> ..... >> >>>> When I skip the grep it actually gets worse, there re more run away values: >>>> # perf record -e "{cycles,cycles}:Su" -- perf test -w brstack >>>> [ perf record: Woken up 2 times to write data ] >>>> [ perf record: Captured and wrote 0.012 MB perf.data (50 samples) ] >>>> # perf script | head -20 >>>> perf 919810 6726.456179: 2754000 cycles: 3ff95608ec8 _dl_map_object_from_fd+0xb18 (/usr/lib/ld64.so.1) >>>> perf 919810 6726.456179: 58638457 cycles: 3ff95608ec8 _dl_map_object_from_fd+0xb18 (/usr/lib/ld64.so.1) >>>> perf 919810 6726.456182: 1377000 cycles: 3ff9560a696 check_match+0x76 (/usr/lib/ld64.so.1) >>>> perf 919810 6726.456182: 1377000 cycles: 3ff9560fa6a _dl_relocate_object_no_relro+0x5fa (/usr/lib/ld64.so.1) >>> >>> Can you share the raw output for the second sample as well? Or even the whole file would be better. >> >> Ok I will append a perf.data from today and hopefully it will be delivered to you: >> See attachment perf.data.tmrs390 (binary file, big endian from s390) >> >>> >>> It's the addresses from this sample that are confusing. 0x3ff95608ec8 is the same for both counters on the first sample (correctly), but the second sample has 0x3ff9560a696 and 0x3ff9560fa6a even though the cycles counts are the same. 
>>> >> >> Command >> ./perf record -r 99 -e "{cycles,cycles}:Su" -- ./perf test -w brstack >> is testing leadership group sampling in tests/shell/record.sh and >> fails most of the time on s390. >> >> The command opens event cycles (as group leader) for sampling and the s390 >> sampling facility is started with default frequency of 4000. >> This can be seen in the debug output: >> >> perf record opening and mmapping events >> Opening: cycles >> ------------------------------------------------------------ >> perf_event_attr: >> type 0 (PERF_TYPE_HARDWARE) >> size 136 >> config 0 (PERF_COUNT_HW_CPU_CYCLES) >> { sample_period, sample_freq } 4000 >> sample_type IP|TID|TIME|READ|ID|PERIOD >> read_format ID|GROUP|LOST >> disabled 1 >> inherit 1 >> exclude_kernel 1 >> exclude_hv 1 >> mmap 1 >> comm 1 >> freq 1 >> enable_on_exec 1 >> task 1 >> sample_id_all 1 >> mmap2 1 >> comm_exec 1 >> ksymbol 1 >> bpf_event 1 >> build_id 1 >> .... >> Next event cycles is opened in the s390 counting facility: >> Opening: cycles >> perf_event_attr: >> type 0 (PERF_TYPE_HARDWARE) >> size 136 >> config 0 (PERF_COUNT_HW_CPU_CYCLES) >> sample_type IP|TID|TIME|READ|ID|PERIOD >> read_format ID|GROUP|LOST >> inherit 1 >> exclude_kernel 1 >> exclude_hv 1 >> sample_id_all 1 >> >> So now there are 2 hardware events when are mapped on s390 to >> 1. event handled by CPU Measurement sampling facility, hardware writes 32 byte >> large samples to buffers. The frequency of 4000 Hz translates >> to a sample every 1300000 instructions. Interrupt driven. >> 2. event handled by CPU Measurement counting facilitly, hardware >> runs in the background and increments counters accordingly. >> All available counters (about 400) are running in the background >> and read via assembler instruction until stopped. No interrupts. >> >> If I understand this setup correctly, the first event is the group >> leader and either both events run and are active or non of them. 
>> That is the reason why both values should be identitical >> Is this true? >> >> Now given two independent CPU measurement units on s390, one running >> in the background incrementing counters, the other interrupt driven >> reading samples, there is always room the both counters to differ. >> The question is how much and how often. >> >> When I look at the debug output of the perf.data file; I get this: >> >> 55805554120788 0x22a8 [0x68]: PERF_RECORD_SAMPLE(IP, 0x2): >> 14135/14135: 0x3ff9ae90340 period: 1300000 addr: 0 >> ... sample_read: >> .... group nr 2 >> ..... id 00000000000000b4, value 000000000115b5c0, lost 0 >> ..... id 00000000000000bc, value 000000000195ac03, lost 0 >> ... thread: perf:14135 >> ...... dso: /usr/lib/ld64.so.1 >> >> The first value is the count from the sampling event, it gets >> incremented with 4000 Hz frequency: >> >> # perf report -D -i/tmp/perf.data.tmrs390|grep 00000000000000b4,|head -10 >> ..... id 00000000000000b4, value 000000000101dfa0, lost 0 >> ..... id 00000000000000b4, value 000000000115b5c0, lost 0 >> ..... id 00000000000000b4, value 00000000013d6200, lost 0 >> ..... id 00000000000000b4, value 0000000001513820, lost 0 >> ..... id 00000000000000b4, value 0000000001650e40, lost 0 >> ..... id 00000000000000b4, value 00000000018cba80, lost 0 >> ..... id 00000000000000b4, value 0000000001a090a0, lost 0 >> ..... id 00000000000000b4, value 0000000001b466c0, lost 0 >> ..... id 00000000000000b4, value 0000000001c83ce0, lost 0 >> ..... id 00000000000000b4, value 0000000001dc1300, lost 0 >> >> value 115b5c0 - 101dfa0 = 13d620 --> 1300000 period time. >> So that value always increments by period time. >> >> The other counter id is: >> # perf report -D -i/tmp/perf.data.tmrs390|grep 00000000000000bc,| sort | uniq -d >> ..... id 00000000000000bc, value 000000000195ac03, lost 0 >> ..... id 00000000000000bc, value 0000000002fd8b45, lost 0 >> ..... 
id 00000000000000bc, value 0000000005f0b1ce, lost 0 >> # >> It reads out the value of counter 0 (cycles) 85 times, but has only 3 different >> values. >> >> Anyway what does perf script print out? The value of the samples frequency? >> Where does perf record read out the value of the counter event? >> Any ideas where to start debugging? >> >> Thanks a lot. >> > > I think your PMU is just broken when setup for sampling events. The second counter is not being reset to zero to begin with, and then it doesn't increment after the first sample. I don't think your issues are to do with virtualisation. > > The reason I was confused about the addresses in the Perf script output is because Perf discards samples where the counter didn't increment: > > /* > * There's no reason to deliver sample > * for zero period, bail out. > */ > if (!sample->period) > return 0; > > I'm not really sure about the logic for this because it breaks the assumption that the Perf script output has one line for each counter. And counters not increasing is information in itself which should be displayed. > > If I comment out this line, then the output looks like I would expect and plainly shows your issue: > > perf 55805.554110: 15600000 cycles: 3ff9ae95194 (lib/ld64.so.1) > perf 55805.554110: 26586115 cycles: 3ff9ae95194 (lib/ld64.so.1) > perf 55805.554117: 1300000 cycles: 3ff9ae8bb3e (lib/ld64.so.1) > perf 55805.554117: 0 cycles: 3ff9ae8bb3e (lib/ld64.so.1) > perf 55805.554120: 1300000 cycles: 3ff9ae90340 (lib/ld64.so.1) > perf 55805.554120: 0 cycles: 3ff9ae90340 (lib/ld64.so.1) > > Now we always have pairs of samples, and you can see that your second cycle counter isn't doing anything. > > James, the counter above always being 0 comes from the fact that the group uses the modifier :u for user space; the cycles events are invoked without the :u modifier. On s390 you need to use -e "{cycles,cycles}:S". I disagree with your assessment completely. 
I have spent quite some time debugging this and come to a different finding. I executed this command on the latest kernel: # uname -a Linux b83lp65.lnxne.boe 6.18.0-rc4d-perf+ #84 SMP Fri Nov 7 09:24:44 CET 2025 s390x GNU/Linux # /root/linux/tools/perf/perf record -e "{cycles,cycles}:S" \ -- taskset -c 0 /root/linux/tools/perf/perf test -w brstack 5000000 2>/dev/null I also had your suggestion patched into the perf script code: # git diff diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c index 09af486c83e4..212bca671a49 100644 --- a/tools/perf/util/session.c +++ b/tools/perf/util/session.c @@ -1212,9 +1212,9 @@ static int deliver_sample_value(struct evlist *evlist, /* * There's no reason to deliver sample * for zero period, bail out. - */ if (!sample->period) return 0; + */ evsel = container_of(sid->evsel, struct evsel, core); return tool->sample(tool, event, sample, evsel, machine); # The output of perf script is now (always in pairs of 2 lines): taskset 7005 332.357459: 1377000 cycles: 3ff9db4611a find_module_idx+0x8a (/usr/lib64/libc.so.6) taskset 7005 332.357459: 1948445 cycles: 3ff9db4611a find_module_idx+0x8a (/usr/lib64/libc.so.6) perf 7005 332.368619: 1377000 cycles: 3ffe03a4fa6 xas_reload+0x36 ([kernel.kallsyms]) perf 7005 332.368619: 61344023 cycles: 3ffe03a4fa6 xas_reload+0x36 ([kernel.kallsyms]) perf 7005 332.368624: 1377000 cycles: 3ffe0b53624 mas_empty_area_rev+0x3c4 ([kernel.kallsyms]) perf 7005 332.368624: 25448 cycles: 3ffe0b53624 mas_empty_area_rev+0x3c4 ([kernel.kallsyms]) perf 7005 332.368626: 1377000 cycles: 3ffadf81606 _dl_catch_exception+0xb6 (/usr/lib/ld64.so.1) perf 7005 332.368626: 12182 cycles: 3ffadf81606 _dl_catch_exception+0xb6 (/usr/lib/ld64.so.1) perf 7005 332.368628: 1377000 cycles: 3ffadf8c9b2 _dl_add_to_namespace_list+0x42 (/usr/lib/ld64.so.1) perf 7005 332.368628: 11392 cycles: 3ffadf8c9b2 _dl_add_to_namespace_list+0x42 (/usr/lib/ld64.so.1) perf 7005 332.368630: 1377000 cycles: 3ffe0b4f800 mas_prev+0x0 
([kernel.kallsyms]) perf 7005 332.368630: 11476 cycles: 3ffe0b4f800 mas_prev+0x0 ([kernel.kallsyms]) Now to the debugging and investigation: 1. With command perf record -e '{cycles,cycles}:S' -- .... the first cycles event starts sampling. On s390 this sets up sampling with a frequency of 4000 Hz. This translates to a hardware sample rate of one sample every 1377000 instructions to meet a frequency of 4000 Hz. 2. With the first event cycles now sampling into a hardware buffer, an interrupt is triggered each time a sampling buffer gets full. The interrupt handler is then invoked and debug output shows the processing of samples. The size of one hardware sample is 32 bytes. With an interrupt triggered when the hardware buffer page of 4KB gets full, the interrupt handler processes 128 samples. (This is taken from s390 specific fast debug data gathering) 2025-11-07 14:35:51.977248 000003ffe013cbfa perf_event_count_update event->count 0x0 count 0x1502e8 2025-11-07 14:35:51.977248 000003ffe013cbfa perf_event_count_update event->count 0x1502e8 count 0x1502e8 2025-11-07 14:35:51.977248 000003ffe013cbfa perf_event_count_update event->count 0x2a05d0 count 0x1502e8 2025-11-07 14:35:51.977252 000003ffe013cbfa perf_event_count_update event->count 0x3f08b8 count 0x1502e8 2025-11-07 14:35:51.977252 000003ffe013cbfa perf_event_count_update event->count 0x540ba0 count 0x1502e8 2025-11-07 14:35:51.977253 000003ffe013cbfa perf_event_count_update event->count 0x690e88 count 0x1502e8 2025-11-07 14:35:51.977254 000003ffe013cbfa perf_event_count_update event->count 0x7e1170 count 0x1502e8 2025-11-07 14:35:51.977254 000003ffe013cbfa perf_event_count_update event->count 0x931458 count 0x1502e8 2025-11-07 14:35:51.977254 000003ffe013cbfa perf_event_count_update event->count 0xa81740 count 0x1502e8 3. The value is constantly increasing by the number of instructions executed to generate a sample entry. This is the first line of the pairs of lines. 
count 0x1502e8 --> 1377000 # perf script | grep 1377000 | wc -l 214 # perf script | wc -l 428 # That is 428 lines in total, and half of the lines contain value 1377000. 4. The second event cycles is opened against the counting PMU, which is an independent PMU and is not interrupt driven. Once enabled it runs in the background and keeps running, incrementing silently about 400+ counters. The counter values are read via assembly instructions. This second counter PMU's read callback function is called when the interrupt handler of the sampling facility processes each sample. The function call sequence is: perf_event_overflow() +--> __perf_event_overflow() +--> __perf_event_output() +--> perf_output_sample() +--> perf_output_read() +--> perf_output_read_group() for_each_sibling_event(sub, leader) { values[n++] = perf_event_count(sub, self); printk("%s sub %p values %#lx\n", __func__, sub, values[n-1]); } The last function perf_event_count() is invoked on the second event cycles *on* the counting PMU. An added printk statement shows the following lines in the dmesg output: # dmesg|grep perf_output_read_group |head -10 [ 332.368620] perf_output_read_group sub 00000000d80b7c1f values 0x3a80917 (1) [ 332.368624] perf_output_read_group sub 00000000d80b7c1f values 0x3a86c7f (2) [ 332.368627] perf_output_read_group sub 00000000d80b7c1f values 0x3a89c15 (3) [ 332.368629] perf_output_read_group sub 00000000d80b7c1f values 0x3a8c895 (4) [ 332.368631] perf_output_read_group sub 00000000d80b7c1f values 0x3a8f569 (5) [ 332.368633] perf_output_read_group sub 00000000d80b7c1f values 0x3a9204b [ 332.368635] perf_output_read_group sub 00000000d80b7c1f values 0x3a94790 [ 332.368637] perf_output_read_group sub 00000000d80b7c1f values 0x3a9704b [ 332.368638] perf_output_read_group sub 00000000d80b7c1f values 0x3a99888 # This correlates with the output of # perf report -D | grep 'id 00000000000000'|head -10 ..... id 0000000000000006, value 00000000001502e8, lost 0 ..... 
id 000000000000000e, value 0000000003a80917, lost 0 --> line (1) above ..... id 0000000000000006, value 00000000002a05d0, lost 0 ..... id 000000000000000e, value 0000000003a86c7f, lost 0 --> line (2) above ..... id 0000000000000006, value 00000000003f08b8, lost 0 ..... id 000000000000000e, value 0000000003a89c15, lost 0 --> line (3) above ..... id 0000000000000006, value 0000000000540ba0, lost 0 ..... id 000000000000000e, value 0000000003a8c895, lost 0 --> line (4) above ..... id 0000000000000006, value 0000000000690e88, lost 0 ..... id 000000000000000e, value 0000000003a8f569, lost 0 --> line (5) above Summary: - The above command starts the CPU sampling facility, which runs interrupt-driven when a 4KB page is full. An interrupt processes the 128 samples and eventually calls perf_output_read_group() for each sample to save it in the event's ring buffer. - At that time the CPU counting facility is invoked to read the value of the event cycles. This value is saved as the second value in the sample_read structure. - The first and odd lines in the perf script output display the period value between 2 samples created by hardware. It is the number of instructions executed before the hardware writes a sample. - The second and even lines in the perf script output display the number of CPU cycles needed to process each sample and save it in the event's ring buffer. These 2 different values can never be identical on s390. In fact I think this test case is not valid for s390. Or we have to compare the odd-numbered lines to each other (1 with 3, 3 with 5 and so on) and the even lines (2 with 4, 4 with 6 and so on) to each other. Any ideas or comments? Thanks a lot. 
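The comparison proposed in the summary (odd lines against odd, even against even) could be sketched like this. `check_streams` is a hypothetical sketch for discussion, not a proposed patch to record.sh, and it reuses the same "field before cycles:" extraction the test already does:

```shell
# Hypothetical sketch: split `perf script` output into two streams by
# line parity and report whenever a count changes within its own
# stream, instead of comparing leader (odd) against sibling (even).
check_streams() {
	awk '{
		for (i = 1; i <= NF; i++)
			if ($i == "cycles:")
				count = $(i - 1)
		s = NR % 2   # 1 = odd lines (sampling event), 0 = even lines (counting event)
		if (s in prev && count != prev[s])
			printf "stream %d changed: %s -> %s\n", s, prev[s], count
		prev[s] = count
	}'
}
```

On the s390 output above, stream 1 would stay silent (every odd line is the fixed period), while stream 0 would report the varying per-sample cycle costs; whether "changed" should count as a failure is exactly the open question.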
-- Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany -- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Wolfgang Wendt Geschäftsführung: David Faller Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] Revert "perf test: Allow tolerance for leader sampling test" 2025-11-11 11:22 ` Thomas Richter @ 2025-11-11 14:03 ` James Clark 2025-11-12 10:47 ` Thomas Richter 0 siblings, 1 reply; 15+ messages in thread From: James Clark @ 2025-11-11 14:03 UTC (permalink / raw) To: Thomas Richter, Anubhav Shelat Cc: mpetlan, acme, namhyung, irogers, linux-perf-users, peterz, mingo, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, dapeng1.mi On 11/11/2025 11:22 am, Thomas Richter wrote: > On 10/29/25 10:25, James Clark wrote: >> >> >> On 29/10/2025 7:37 am, Thomas Richter wrote: >>> On 10/28/25 16:23, James Clark wrote: >>>> >>>> >>>> On 28/10/2025 12:55 pm, Thomas Richter wrote: >>> ..... >>> >>>>> When I skip the grep it actually gets worse, there re more run away values: >>>>> # perf record -e "{cycles,cycles}:Su" -- perf test -w brstack >>>>> [ perf record: Woken up 2 times to write data ] >>>>> [ perf record: Captured and wrote 0.012 MB perf.data (50 samples) ] >>>>> # perf script | head -20 >>>>> perf 919810 6726.456179: 2754000 cycles: 3ff95608ec8 _dl_map_object_from_fd+0xb18 (/usr/lib/ld64.so.1) >>>>> perf 919810 6726.456179: 58638457 cycles: 3ff95608ec8 _dl_map_object_from_fd+0xb18 (/usr/lib/ld64.so.1) >>>>> perf 919810 6726.456182: 1377000 cycles: 3ff9560a696 check_match+0x76 (/usr/lib/ld64.so.1) >>>>> perf 919810 6726.456182: 1377000 cycles: 3ff9560fa6a _dl_relocate_object_no_relro+0x5fa (/usr/lib/ld64.so.1) >>>> >>>> Can you share the raw output for the second sample as well? Or even the whole file would be better. >>> >>> Ok I will append a perf.data from today and hopefully it will be delivered to you: >>> See attachment perf.data.tmrs390 (binary file, big endian from s390) >>> >>>> >>>> It's the addresses from this sample that are confusing. 0x3ff95608ec8 is the same for both counters on the first sample (correctly), but the second sample has 0x3ff9560a696 and 0x3ff9560fa6a even though the cycles counts are the same. 
>>>>
>>>
>>> Command
>>> ./perf record -r 99 -e "{cycles,cycles}:Su" -- ./perf test -w brstack
>>> is testing leadership group sampling in tests/shell/record.sh and
>>> fails most of the time on s390.
>>>
>>> The command opens event cycles (as group leader) for sampling and the s390
>>> sampling facility is started with default frequency of 4000.
>>> This can be seen in the debug output:
>>>
>>> perf record opening and mmapping events
>>> Opening: cycles
>>> ------------------------------------------------------------
>>> perf_event_attr:
>>>   type                             0 (PERF_TYPE_HARDWARE)
>>>   size                             136
>>>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>>>   { sample_period, sample_freq }   4000
>>>   sample_type                      IP|TID|TIME|READ|ID|PERIOD
>>>   read_format                      ID|GROUP|LOST
>>>   disabled                         1
>>>   inherit                          1
>>>   exclude_kernel                   1
>>>   exclude_hv                       1
>>>   mmap                             1
>>>   comm                             1
>>>   freq                             1
>>>   enable_on_exec                   1
>>>   task                             1
>>>   sample_id_all                    1
>>>   mmap2                            1
>>>   comm_exec                        1
>>>   ksymbol                          1
>>>   bpf_event                        1
>>>   build_id                         1
>>> ....
>>> Next event cycles is opened in the s390 counting facility:
>>> Opening: cycles
>>> perf_event_attr:
>>>   type                             0 (PERF_TYPE_HARDWARE)
>>>   size                             136
>>>   config                           0 (PERF_COUNT_HW_CPU_CYCLES)
>>>   sample_type                      IP|TID|TIME|READ|ID|PERIOD
>>>   read_format                      ID|GROUP|LOST
>>>   inherit                          1
>>>   exclude_kernel                   1
>>>   exclude_hv                       1
>>>   sample_id_all                    1
>>>
>>> So now there are 2 hardware events which are mapped on s390 to
>>> 1. event handled by the CPU Measurement sampling facility; hardware writes
>>>    32 byte large samples to buffers. The frequency of 4000 Hz translates
>>>    to a sample every 1300000 instructions. Interrupt driven.
>>> 2. event handled by the CPU Measurement counting facility; hardware
>>>    runs in the background and increments counters accordingly.
>>>    All available counters (about 400) are running in the background
>>>    and read via assembler instruction until stopped. No interrupts.
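The "sample every 1300000 instructions" figure above can be reproduced from
the machine speed; a sketch of the arithmetic (the cpu_speed of 5200 here is
an assumption inferred for this machine — the real value comes from
/proc/service_levels, e.g. 5508 on the z17 mentioned later in the thread):

```shell
# Sampling interval = CPU speed (cycles per second) / sampling frequency (Hz).
# cpu_speed in /proc/service_levels is given in cycles per microsecond.
cpu_speed=5200   # assumed here, since 1300000 * 4000 / 1000000 = 5200
freq=4000        # perf's default sampling frequency

interval=$(( cpu_speed * 1000000 / freq ))
echo "$interval cycles between two hardware samples"
```

The same formula with cpu_speed=5508 yields the 1377000 interval seen in the
later measurements.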
>>>
>>> If I understand this setup correctly, the first event is the group
>>> leader and either both events run and are active or none of them.
>>> That is the reason why both values should be identical.
>>> Is this true?
>>>
>>> Now given two independent CPU measurement units on s390, one running
>>> in the background incrementing counters, the other interrupt driven
>>> reading samples, there is always room for the two counters to differ.
>>> The question is how much and how often.
>>>
>>> When I look at the debug output of the perf.data file, I get this:
>>>
>>> 55805554120788 0x22a8 [0x68]: PERF_RECORD_SAMPLE(IP, 0x2): 14135/14135: 0x3ff9ae90340 period: 1300000 addr: 0
>>> ... sample_read:
>>> .... group nr 2
>>> ..... id 00000000000000b4, value 000000000115b5c0, lost 0
>>> ..... id 00000000000000bc, value 000000000195ac03, lost 0
>>> ... thread: perf:14135
>>> ...... dso: /usr/lib/ld64.so.1
>>>
>>> The first value is the count from the sampling event, it gets
>>> incremented with 4000 Hz frequency:
>>>
>>> # perf report -D -i/tmp/perf.data.tmrs390|grep 00000000000000b4,|head -10
>>> ..... id 00000000000000b4, value 000000000101dfa0, lost 0
>>> ..... id 00000000000000b4, value 000000000115b5c0, lost 0
>>> ..... id 00000000000000b4, value 00000000013d6200, lost 0
>>> ..... id 00000000000000b4, value 0000000001513820, lost 0
>>> ..... id 00000000000000b4, value 0000000001650e40, lost 0
>>> ..... id 00000000000000b4, value 00000000018cba80, lost 0
>>> ..... id 00000000000000b4, value 0000000001a090a0, lost 0
>>> ..... id 00000000000000b4, value 0000000001b466c0, lost 0
>>> ..... id 00000000000000b4, value 0000000001c83ce0, lost 0
>>> ..... id 00000000000000b4, value 0000000001dc1300, lost 0
>>>
>>> value 115b5c0 - 101dfa0 = 13d620 --> 1300000 period time.
>>> So that value always increments by period time.
>>>
>>> The other counter id is:
>>> # perf report -D -i/tmp/perf.data.tmrs390|grep 00000000000000bc,| sort | uniq -d
>>> .....
id 00000000000000bc, value 000000000195ac03, lost 0 >>> ..... id 00000000000000bc, value 0000000002fd8b45, lost 0 >>> ..... id 00000000000000bc, value 0000000005f0b1ce, lost 0 >>> # >>> It reads out the value of counter 0 (cycles) 85 times, but has only 3 different >>> values. >>> >>> Anyway what does perf script print out? The value of the samples frequency? >>> Where does perf record read out the value of the counter event? >>> Any ideas where to start debugging? >>> >>> Thanks a lot. >>> >> >> I think your PMU is just broken when setup for sampling events. The second counter is not being reset to zero to begin with, and then it doesn't increment after the first sample. I don't think your issues are to do with virtualisation. >> >> The reason I was confused about the addresses in the Perf script output is because Perf discards samples where the counter didn't increment: >> >> /* >> * There's no reason to deliver sample >> * for zero period, bail out. >> */ >> if (!sample->period) >> return 0; >> >> I'm not really sure about the logic for this because it breaks the assumption that the Perf script output has one line for each counter. And counters not increasing is information in itself which should be displayed. >> >> If I comment out this line, then the output looks like I would expect and plainly shows your issue: >> >> perf 55805.554110: 15600000 cycles: 3ff9ae95194 (lib/ld64.so.1) >> perf 55805.554110: 26586115 cycles: 3ff9ae95194 (lib/ld64.so.1) >> perf 55805.554117: 1300000 cycles: 3ff9ae8bb3e (lib/ld64.so.1) >> perf 55805.554117: 0 cycles: 3ff9ae8bb3e (lib/ld64.so.1) >> perf 55805.554120: 1300000 cycles: 3ff9ae90340 (lib/ld64.so.1) >> perf 55805.554120: 0 cycles: 3ff9ae90340 (lib/ld64.so.1) >> >> Now we always have pairs of samples, and you can see that your second cycle counter isn't doing anything. 
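One cheap way to surface that dropped-sample problem, sketched with simulated
input (a real check would feed in the grepped perf script lines): with leader
sampling, every record should contribute one leader and one sibling line, so
an odd line count means a record was silently discarded.

```shell
# Simulated perf script output: two complete leader/sibling pairs plus one
# leader whose zero-period sibling sample was dropped by the zero-period
# bail-out quoted above.
sample_lines='15600000 cycles:
26586115 cycles:
1300000 cycles:
1300000 cycles:
1300000 cycles:'

nlines=$(echo "$sample_lines" | grep -c 'cycles:')
if [ $(( nlines % 2 )) -ne 0 ]; then
        echo "unpaired samples: $nlines lines, a sibling record was dropped"
fi
```

Such a parity check would fail loudly instead of letting the pairwise
comparison in record.sh drift onto the wrong lines.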
>> >> > > James, > the counter above being always 0 come from the fact that the group uses modifier :u > for user space, the cycles event are invoked without :u modifier. > On s390 you need to use -e "{cycles,cycles}:S" I'm not following why :u should make a difference. I see from your explanation below that you have two different types of counter and the first event runs on the buffered one and the other on a simple PMU counter, but how does that translate to userspace filtering making a difference? > > I disagree with your assessment completely. I have spent quite some time > debugging this and come to a different finding. I executed this command > on the latest kernel: > > # uname -a > Linux b83lp65.lnxne.boe 6.18.0-rc4d-perf+ #84 SMP Fri Nov 7 09:24:44 CET 2025 s390x GNU/Linux > # /root/linux/tools/perf/perf record -e "{cycles,cycles}:S" \ > -- taskset -c 0 /root/linux/tools/perf/perf test -w brstack 5000000 2>/dev/null > > I also had your suggestion patched into the perf script code: > # git diff > diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c > index 09af486c83e4..212bca671a49 100644 > --- a/tools/perf/util/session.c > +++ b/tools/perf/util/session.c > @@ -1212,9 +1212,9 @@ static int deliver_sample_value(struct evlist *evlist, > /* > * There's no reason to deliver sample > * for zero period, bail out. 
> - */ > if (!sample->period) > return 0; > + */ > > evsel = container_of(sid->evsel, struct evsel, core); > return tool->sample(tool, event, sample, evsel, machine); > # > > The output of perf script is now (always in pairs of 2 lines): > > taskset 7005 332.357459: 1377000 cycles: 3ff9db4611a find_module_idx+0x8a (/usr/lib64/libc.so.6) > taskset 7005 332.357459: 1948445 cycles: 3ff9db4611a find_module_idx+0x8a (/usr/lib64/libc.so.6) > perf 7005 332.368619: 1377000 cycles: 3ffe03a4fa6 xas_reload+0x36 ([kernel.kallsyms]) > perf 7005 332.368619: 61344023 cycles: 3ffe03a4fa6 xas_reload+0x36 ([kernel.kallsyms]) > perf 7005 332.368624: 1377000 cycles: 3ffe0b53624 mas_empty_area_rev+0x3c4 ([kernel.kallsyms]) > perf 7005 332.368624: 25448 cycles: 3ffe0b53624 mas_empty_area_rev+0x3c4 ([kernel.kallsyms]) > perf 7005 332.368626: 1377000 cycles: 3ffadf81606 _dl_catch_exception+0xb6 (/usr/lib/ld64.so.1) > perf 7005 332.368626: 12182 cycles: 3ffadf81606 _dl_catch_exception+0xb6 (/usr/lib/ld64.so.1) > perf 7005 332.368628: 1377000 cycles: 3ffadf8c9b2 _dl_add_to_namespace_list+0x42 (/usr/lib/ld64.so.1) > perf 7005 332.368628: 11392 cycles: 3ffadf8c9b2 _dl_add_to_namespace_list+0x42 (/usr/lib/ld64.so.1) > perf 7005 332.368630: 1377000 cycles: 3ffe0b4f800 mas_prev+0x0 ([kernel.kallsyms]) > perf 7005 332.368630: 11476 cycles: 3ffe0b4f800 mas_prev+0x0 ([kernel.kallsyms]) > The above patch won't make a difference to this output because none of them are 0. You get exactly the same output with or without the patch. I only showed that this part of the code was hiding the original issue where most of the second counter values were 0. That seems to not be the case if you remove :u as you found, but isn't related to this part. I kind of see why :u results in 0 counts, because you're only counting the time between when the buffered counter fills up and interrupted and when you generated the other samples. But I don't get why that has to be the case. 
Couldn't you leave the other counters running and then they would count everything between samples? > > Now to the debugging and investigation: > 1. With command > perf record -e '{cycles,cycles}:S' -- .... > the first cycles event start sampling. > On s390 this sets up sampling with a frequency of 4000 Hz. > This translates to hardware sample rate of 1377000 instructions per > micro-second to meet a frequency of 4000 HZ. 1377000 instructions per sample? I'm assuming per micro-second is a mistake. As in, it sets the counter to roll over and take a sample every 1377000 cycles (instructions?), which ends up being 4000 samples per second. > > 2. With first event cycles now sampling into a hardware buffer, an interrupt > is triggered each time a sampling buffer gets full. The interrupt handler > is then invoked and debug output shows the processing of samples. > The size of one hardware sample is 32 bytes. With an interrupt triggered > when the hardware buffer page of 4KB gets full, the interrupt handler > processes 128 samples. 
> (This is taken from s390 specific fast debug data gathering) > 2025-11-07 14:35:51.977248 000003ffe013cbfa perf_event_count_update event->count 0x0 count 0x1502e8 > 2025-11-07 14:35:51.977248 000003ffe013cbfa perf_event_count_update event->count 0x1502e8 count 0x1502e8 > 2025-11-07 14:35:51.977248 000003ffe013cbfa perf_event_count_update event->count 0x2a05d0 count 0x1502e8 > 2025-11-07 14:35:51.977252 000003ffe013cbfa perf_event_count_update event->count 0x3f08b8 count 0x1502e8 > 2025-11-07 14:35:51.977252 000003ffe013cbfa perf_event_count_update event->count 0x540ba0 count 0x1502e8 > 2025-11-07 14:35:51.977253 000003ffe013cbfa perf_event_count_update event->count 0x690e88 count 0x1502e8 > 2025-11-07 14:35:51.977254 000003ffe013cbfa perf_event_count_update event->count 0x7e1170 count 0x1502e8 > 2025-11-07 14:35:51.977254 000003ffe013cbfa perf_event_count_update event->count 0x931458 count 0x1502e8 > 2025-11-07 14:35:51.977254 000003ffe013cbfa perf_event_count_update event->count 0xa81740 count 0x1502e8 > > 3. The value is constantly increasing by the number of instructions executed > to generate a sample entry. This is the first line of the pairs of lines. > count 0x1502e8 --> 1377000 > > # perf script | grep 1377000 | wc -l > 214 > # perf script | wc -l > 428 > # > That is 428 lines in total, and half of the lines contain value 1377000. > > 4. The second event cycles is opened against the counting PMU, which is an > independent PMU and is not interrupt driven. Once enabled it runs in the > backgroud and keeps running, incrementing silently about 400+ > counters. The counter values are read via assembly instructions. > > This second counter PMU's read call back function is called when the > interrupt handler of the sampling facility processes each sample. 
The > function call sequence is: > > perf_event_overflow() > +--> __perf_event_overflow() > +--> __perf_event_output() > +--> perf_output_sample() > +--> perf_output_read() > +--> perf_output_read_group() > for_each_sibling_event(sub, leader) { > values[n++] = perf_event_count(sub, self); > printk("%s sub %p values %#lx\n", __func__, sub, values[n-1]); > } > > The last function perf_event_count() is invoked on the second event cylces > *on* the counting PMU. An added printk statement shows the following lines > in the dmesg output: > > # dmesg|grep perf_output_read_group |head -10 > [ 332.368620] perf_output_read_group sub 00000000d80b7c1f values 0x3a80917 (1) > [ 332.368624] perf_output_read_group sub 00000000d80b7c1f values 0x3a86c7f (2) > [ 332.368627] perf_output_read_group sub 00000000d80b7c1f values 0x3a89c15 (3) > [ 332.368629] perf_output_read_group sub 00000000d80b7c1f values 0x3a8c895 (4) > [ 332.368631] perf_output_read_group sub 00000000d80b7c1f values 0x3a8f569 (5) > [ 332.368633] perf_output_read_group sub 00000000d80b7c1f values 0x3a9204b > [ 332.368635] perf_output_read_group sub 00000000d80b7c1f values 0x3a94790 > [ 332.368637] perf_output_read_group sub 00000000d80b7c1f values 0x3a9704b > [ 332.368638] perf_output_read_group sub 00000000d80b7c1f values 0x3a99888 > # > > This correlates with the output of > # perf report -D | grep 'id 00000000000000'|head -10 > ..... id 0000000000000006, value 00000000001502e8, lost 0 > ..... id 000000000000000e, value 0000000003a80917, lost 0 --> line (1) above > ..... id 0000000000000006, value 00000000002a05d0, lost 0 > ..... id 000000000000000e, value 0000000003a86c7f, lost 0 --> line (2) above > ..... id 0000000000000006, value 00000000003f08b8, lost 0 > ..... id 000000000000000e, value 0000000003a89c15, lost 0 --> line (3) above > ..... id 0000000000000006, value 0000000000540ba0, lost 0 > ..... id 000000000000000e, value 0000000003a8c895, lost 0 --> line (4) above > ..... 
id 0000000000000006, value 0000000000690e88, lost 0 > ..... id 000000000000000e, value 0000000003a8f569, lost 0 --> line (5) above > > Summary: > - Above command starts the CPU sampling facility, with runs interrupt > driven when a 4KB page is full. An interrupt processes the 128 samples > and calls eventually perf_output_read_group() for each sample to save it > in the event's ring buffer. > > - At that time the CPU counting facility is invoked to read the value of > the event cycles. This value is saved as the second value in the > sample_read structure. > > - The first and odd lines in the perf script output displays the period > value between 2 samples being created by hardware. It is the number > of instructions executes before the hardware writes a sample. > > - The second and even lines in the perf script output displays the number > of CPU cycles needed to process each sample and save it in the event's ring > buffer. Thanks for the explanation, I understand now. But this particular point stands out to me. When set up for PERF_SAMPLE_READ, the values of the counters returned shouldn't be the number of cycles required to process each sample. They should be whatever value that the counters were when the group leader took its sample. The counts should include everything from between when the last sample was taken to now. It sounds like you can't honor that with your buffered cycles counter being the group leader anyway, so you shouldn't allow sample read events to be opened on it. If the hardware counter is just doing its own thing writing its own samples into a buffer, without any interaction with the other counters then it's not really doing what the sample read feature intends. I think pretending that it can do it but returning confusing data is worse than just failing to open to begin with. 
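The correlation quoted above can also be checked numerically: the
counting-PMU values are cumulative reads, and the per-sample sibling counts
that perf script prints are the differences between consecutive reads. Using
two values copied from the dmesg lines quoted earlier in the thread:

```shell
# dmesg read (1) is 0x3a80917, read (2) is 0x3a86c7f. The first sibling value
# perf script printed is the raw first read; the next one is the delta
# between reads (2) and (1).
printf 'first read : %d\n' "$(( 0x3a80917 ))"              # 61344023
printf 'delta (2-1): %d\n' "$(( 0x3a86c7f - 0x3a80917 ))"  # 25448
```

These match the 61344023 and 25448 sibling lines in the perf script output
shown earlier for the same run.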
I know there's always going to be some slight differences in architectures and sometimes you have to fit things into the existing ABI, but this difference seems a bit too big to call it sample read at all. > > These 2 different values can never be identical on s390. > In fact I think this test case is not valid for s390. Or we have to > compare the odd numbered lines to each other (1 with 3, 3 with 5 and so on) > and the even lines (2 with 4, 4 with 6 and so on) to each other. > > Any ideas or comments? > > > Thanks a lot. I agree, it looks like the test in its current form isn't valid and shouldn't be run on s390. Does this buffered hardware counter only support the cycles event? Or does it support any event but there is only one instance of it? Because if it only supports the cycles event then we can just change the test to count branches which would force it to run on a simple counter instead. If it supports any event then I think the test (or sample read as a whole) can never work. You'd still probably want to update the PMU to reject sample read events that run on the buffered counter, whether it's only for cycles or not. That would prevent people from being confused outside of the test as well. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] Revert "perf test: Allow tolerance for leader sampling test" 2025-11-11 14:03 ` James Clark @ 2025-11-12 10:47 ` Thomas Richter 0 siblings, 0 replies; 15+ messages in thread From: Thomas Richter @ 2025-11-12 10:47 UTC (permalink / raw) To: James Clark, Anubhav Shelat Cc: mpetlan, acme, namhyung, irogers, linux-perf-users, peterz, mingo, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, dapeng1.mi On 11/11/25 15:03, James Clark wrote: ..... >> >> James, >> the counter above being always 0 come from the fact that the group uses modifier :u >> for user space, the cycles event are invoked without :u modifier. >> On s390 you need to use -e "{cycles,cycles}:S" > > I'm not following why :u should make a difference. I see from your explanation below that you have two different types of counter and the first event runs on the buffered one and the other on a simple PMU counter, but how does that translate to userspace filtering making a difference? > Well, s390 counters are # ls -l /sys/devices/cpum_cf/events/*CPU_CYCLES -r--r--r-- 1 root root 4096 Nov 12 11:10 /sys/devices/cpum_cf/events/CPU_CYCLES -r--r--r-- 1 root root 4096 Nov 12 11:10 /sys/devices/cpum_cf/events/PROBLEM_STATE_CPU_CYCLES # cat /sys/devices/cpum_cf/events/*CPU_CYCLES event=0x0000 event=0x0020 # The generic event cycles is translated to CPU_CYCLES and user space cycles are counted with a different counter named PROBLEM_STATE_CPU_CYCLES So 2 different counters, in 2 different counter sets, one started one not started. >> I disagree with your assessment completely. I have spent quite some time >> debugging this and come to a different finding. 
I executed this command >> on the latest kernel: >> >> # uname -a >> Linux b83lp65.lnxne.boe 6.18.0-rc4d-perf+ #84 SMP Fri Nov 7 09:24:44 CET 2025 s390x GNU/Linux >> # /root/linux/tools/perf/perf record -e "{cycles,cycles}:S" \ >> -- taskset -c 0 /root/linux/tools/perf/perf test -w brstack 5000000 2>/dev/null >> >> I also had your suggestion patched into the perf script code: >> # git diff >> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c >> index 09af486c83e4..212bca671a49 100644 >> --- a/tools/perf/util/session.c >> +++ b/tools/perf/util/session.c >> @@ -1212,9 +1212,9 @@ static int deliver_sample_value(struct evlist *evlist, >> /* >> * There's no reason to deliver sample >> * for zero period, bail out. >> - */ >> if (!sample->period) >> return 0; >> + */ >> >> evsel = container_of(sid->evsel, struct evsel, core); >> return tool->sample(tool, event, sample, evsel, machine); >> # >> >> The output of perf script is now (always in pairs of 2 lines): >> > taskset 7005 332.357459: 1377000 cycles: 3ff9db4611a find_module_idx+0x8a (/usr/lib64/libc.so.6) >> taskset 7005 332.357459: 1948445 cycles: 3ff9db4611a find_module_idx+0x8a (/usr/lib64/libc.so.6) >> perf 7005 332.368619: 1377000 cycles: 3ffe03a4fa6 xas_reload+0x36 ([kernel.kallsyms]) >> perf 7005 332.368619: 61344023 cycles: 3ffe03a4fa6 xas_reload+0x36 ([kernel.kallsyms]) >> perf 7005 332.368624: 1377000 cycles: 3ffe0b53624 mas_empty_area_rev+0x3c4 ([kernel.kallsyms]) >> perf 7005 332.368624: 25448 cycles: 3ffe0b53624 mas_empty_area_rev+0x3c4 ([kernel.kallsyms]) >> perf 7005 332.368626: 1377000 cycles: 3ffadf81606 _dl_catch_exception+0xb6 (/usr/lib/ld64.so.1) >> perf 7005 332.368626: 12182 cycles: 3ffadf81606 _dl_catch_exception+0xb6 (/usr/lib/ld64.so.1) >> perf 7005 332.368628: 1377000 cycles: 3ffadf8c9b2 _dl_add_to_namespace_list+0x42 (/usr/lib/ld64.so.1) >> perf 7005 332.368628: 11392 cycles: 3ffadf8c9b2 _dl_add_to_namespace_list+0x42 (/usr/lib/ld64.so.1) >> perf 7005 332.368630: 1377000 
cycles: 3ffe0b4f800 mas_prev+0x0 ([kernel.kallsyms])
>> perf 7005 332.368630: 11476 cycles: 3ffe0b4f800 mas_prev+0x0 ([kernel.kallsyms])
>>
>
> The above patch won't make a difference to this output because none of them are 0. You get exactly the same output with or without the patch.
>
> I only showed that this part of the code was hiding the original issue where most of the second counter values were 0. That seems to not be the case if you remove :u as you found, but isn't related to this part.
>

Agreed

> I kind of see why :u results in 0 counts, because you're only counting the time between when the buffered counter fills up and interrupted and when you generated the other samples. But I don't get why that has to be the case. Couldn't you leave the other counters running and then they would count everything between samples?
>
>>
>> Now to the debugging and investigation:
>> 1. With command
>>    perf record -e '{cycles,cycles}:S' -- ....
>>    the first cycles event start sampling.
>>    On s390 this sets up sampling with a frequency of 4000 Hz.
>>    This translates to hardware sample rate of 1377000 instructions per
>>    micro-second to meet a frequency of 4000 HZ.
>
> 1377000 instructions per sample? I'm assuming per micro-second is a mistake.

Here is the maths:
# cat /proc/service_levels
CPU-MF: Counter facility: version=3.8 authorization=002f
CPU-MF: Sampling facility: min_rate=22032 max_rate=180464112 cpu_speed=5508
CPU-MF: Sampling facility: mode=basic sample_size=32
CPU-MF: Sampling facility: mode=diagnostic sample_size=171
...
So the cpu_speed of 5508 (machine type z17) means 5508 CPU cycles per micro
second (Document SA23-2260-08, page 2-35):
5508 * 1000000 = 5508000000; 5508000000 / 4000 --> 1377000 CPU cycles between
hardware writes of a sample.

> As in, it sets the counter to roll over and take a sample every 1377000 cycles (instructions?), which ends up being 4000 samples per second.
>
>> 2.
With first event cycles now sampling into a hardware buffer, an interrupt >> is triggered each time a sampling buffer gets full. The interrupt handler >> is then invoked and debug output shows the processing of samples. >> The size of one hardware sample is 32 bytes. With an interrupt triggered >> when the hardware buffer page of 4KB gets full, the interrupt handler >> processes 128 samples. >> (This is taken from s390 specific fast debug data gathering) >> 2025-11-07 14:35:51.977248 000003ffe013cbfa perf_event_count_update event->count 0x0 count 0x1502e8 >> 2025-11-07 14:35:51.977248 000003ffe013cbfa perf_event_count_update event->count 0x1502e8 count 0x1502e8 >> 2025-11-07 14:35:51.977248 000003ffe013cbfa perf_event_count_update event->count 0x2a05d0 count 0x1502e8 >> 2025-11-07 14:35:51.977252 000003ffe013cbfa perf_event_count_update event->count 0x3f08b8 count 0x1502e8 >> 2025-11-07 14:35:51.977252 000003ffe013cbfa perf_event_count_update event->count 0x540ba0 count 0x1502e8 >> 2025-11-07 14:35:51.977253 000003ffe013cbfa perf_event_count_update event->count 0x690e88 count 0x1502e8 >> 2025-11-07 14:35:51.977254 000003ffe013cbfa perf_event_count_update event->count 0x7e1170 count 0x1502e8 >> 2025-11-07 14:35:51.977254 000003ffe013cbfa perf_event_count_update event->count 0x931458 count 0x1502e8 >> 2025-11-07 14:35:51.977254 000003ffe013cbfa perf_event_count_update event->count 0xa81740 count 0x1502e8 >> >> 3. The value is constantly increasing by the number of instructions executed >> to generate a sample entry. This is the first line of the pairs of lines. >> count 0x1502e8 --> 1377000 >> >> # perf script | grep 1377000 | wc -l >> 214 >> # perf script | wc -l >> 428 >> # >> That is 428 lines in total, and half of the lines contain value 1377000. >> >> 4. The second event cycles is opened against the counting PMU, which is an >> independent PMU and is not interrupt driven. 
Once enabled it runs in the >> backgroud and keeps running, incrementing silently about 400+ >> counters. The counter values are read via assembly instructions. >> >> This second counter PMU's read call back function is called when the >> interrupt handler of the sampling facility processes each sample. The >> function call sequence is: >> >> perf_event_overflow() >> +--> __perf_event_overflow() >> +--> __perf_event_output() >> +--> perf_output_sample() >> +--> perf_output_read() >> +--> perf_output_read_group() >> for_each_sibling_event(sub, leader) { >> values[n++] = perf_event_count(sub, self); >> printk("%s sub %p values %#lx\n", __func__, sub, values[n-1]); >> } >> >> The last function perf_event_count() is invoked on the second event cylces >> *on* the counting PMU. An added printk statement shows the following lines >> in the dmesg output: >> >> # dmesg|grep perf_output_read_group |head -10 >> [ 332.368620] perf_output_read_group sub 00000000d80b7c1f values 0x3a80917 (1) >> [ 332.368624] perf_output_read_group sub 00000000d80b7c1f values 0x3a86c7f (2) >> [ 332.368627] perf_output_read_group sub 00000000d80b7c1f values 0x3a89c15 (3) >> [ 332.368629] perf_output_read_group sub 00000000d80b7c1f values 0x3a8c895 (4) >> [ 332.368631] perf_output_read_group sub 00000000d80b7c1f values 0x3a8f569 (5) >> [ 332.368633] perf_output_read_group sub 00000000d80b7c1f values 0x3a9204b >> [ 332.368635] perf_output_read_group sub 00000000d80b7c1f values 0x3a94790 >> [ 332.368637] perf_output_read_group sub 00000000d80b7c1f values 0x3a9704b >> [ 332.368638] perf_output_read_group sub 00000000d80b7c1f values 0x3a99888 >> # >> >> This correlates with the output of >> # perf report -D | grep 'id 00000000000000'|head -10 >> ..... id 0000000000000006, value 00000000001502e8, lost 0 >> ..... id 000000000000000e, value 0000000003a80917, lost 0 --> line (1) above >> ..... id 0000000000000006, value 00000000002a05d0, lost 0 >> ..... 
id 000000000000000e, value 0000000003a86c7f, lost 0 --> line (2) above >> ..... id 0000000000000006, value 00000000003f08b8, lost 0 >> ..... id 000000000000000e, value 0000000003a89c15, lost 0 --> line (3) above >> ..... id 0000000000000006, value 0000000000540ba0, lost 0 >> ..... id 000000000000000e, value 0000000003a8c895, lost 0 --> line (4) above >> ..... id 0000000000000006, value 0000000000690e88, lost 0 >> ..... id 000000000000000e, value 0000000003a8f569, lost 0 --> line (5) above >> >> Summary: >> - Above command starts the CPU sampling facility, with runs interrupt >> driven when a 4KB page is full. An interrupt processes the 128 samples >> and calls eventually perf_output_read_group() for each sample to save it >> in the event's ring buffer. >> >> - At that time the CPU counting facility is invoked to read the value of >> the event cycles. This value is saved as the second value in the >> sample_read structure. >> >> - The first and odd lines in the perf script output displays the period >> value between 2 samples being created by hardware. It is the number >> of instructions executes before the hardware writes a sample. >> >> - The second and even lines in the perf script output displays the number >> of CPU cycles needed to process each sample and save it in the event's ring >> buffer. .... >> Thanks a lot. > > I agree, it looks like the test in its current form isn't valid and shouldn't be run on s390. > > Does this buffered hardware counter only support the cycles event? Or does it support any event but there is only one instance of it? > > Because if it only supports the cycles event then we can just change the test to count branches which would force it to run on a simple counter instead. If it supports any event then I think the test (or sample read as a whole) can never work. > > You'd still probably want to update the PMU to reject sample read events that run on the buffered counter, whether it's only for cycles or not. 
That would prevent people from being confused outside of the test as well. > We can disable this leader sampling test for s390, because we only have one event. Or I change the command line invocation to this: # git diff tests/shell/record.sh diff --git a/tools/perf/tests/shell/record.sh b/tools/perf/tests/shell/record.sh index 0f5841c479e7..4d2557b85606 100755 --- a/tools/perf/tests/shell/record.sh +++ b/tools/perf/tests/shell/record.sh @@ -260,7 +260,7 @@ test_uid() { test_leader_sampling() { echo "Basic leader sampling test" - if ! perf record -o "${perfdata}" -e "{cycles,cycles}:Su" -- \ + if ! perf record -o "${perfdata}" -e "{cpum_sf/SF_CYCLES_BASIC/,cycles}:S" -- \ perf test -w brstack 2> /dev/null then echo "Leader sampling [Failed record]" @@ -432,18 +432,18 @@ if [[ $default_fd_limit -lt $min_fd_limit ]]; then ulimit -Sn $min_fd_limit fi It installs the sampling using a different perf_event::attr::type and perf_event::attr::config combination for sampling. It now operates on two sampling event which run in sync. Leads to this result: # for i in $(seq 10); do ./perf test 124; done 124: perf record tests : Ok 124: perf record tests : Ok 124: perf record tests : Ok 124: perf record tests : Ok 124: perf record tests : Ok 124: perf record tests : Ok 124: perf record tests : Ok 124: perf record tests : Ok 124: perf record tests : Ok 124: perf record tests : Ok # Should I submit a patch for that? Thanks -- Thomas Richter, Dept 3303, IBM s390 Linux Development, Boeblingen, Germany -- IBM Deutschland Research & Development GmbH Vorsitzender des Aufsichtsrats: Wolfgang Wendt Geschäftsführung: David Faller Sitz der Gesellschaft: Böblingen / Registergericht: Amtsgericht Stuttgart, HRB 243294 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH] Revert "perf test: Allow tolerance for leader sampling test"
  2025-10-28 12:55   ` Thomas Richter
  2025-10-28 15:23     ` James Clark
@ 2025-10-30 13:52   ` Anubhav Shelat
  2025-10-30 14:19     ` James Clark
  1 sibling, 1 reply; 15+ messages in thread
From: Anubhav Shelat @ 2025-10-30 13:52 UTC (permalink / raw)
To: Thomas Richter
Cc: James Clark, mpetlan, acme, namhyung, irogers, linux-perf-users,
    peterz, mingo, mark.rutland, alexander.shishkin, jolsa,
    adrian.hunter, kan.liang, dapeng1.mi

> On 10/28/25 12:30, James Clark wrote:
> > You mentioned on the other thread that it's failing "differently", can
> > you expand on that? I'm wondering why you sent the revert patch then?

The issue was that for every (leader, sibling) event pair there was an
18 cycle difference between the sibling and the leader. Initially I
thought that this was fixed by the group throttle fixing patch
(9734e25fbf5a), but I just got a VM running 6.12.0-145.el10.aarch64,
which is up to date with v6.16, and it has that bug.

I want to revert the patch since the bug it exists for has been fixed,
and as you mentioned it could be silencing other bugs. Since I'm able
to reproduce Thomas's issue on an arm machine, it seems like this is
not a hardware issue inherent to s390x and there's no reason for this
patch to be in the kernel anymore.

> > As I mentioned before I'm not seeing any issues. Can you share the
> > kernel version that you tested on and your kernel config? And can you
> > share the same outputs that I asked Thomas for below please.
For the system experiencing the 18 cycle difference bug:
Linux ampere-mtsnow-altramax-03-vm-13.lab.eng.rdu2.redhat.com
6.12.0-145.el10.aarch64 #1 SMP PREEMPT_DYNAMIC Fri Oct 24 12:39:51 UTC
2025 aarch64 GNU/Linux

[root@ampere-mtsnow-altramax-03-vm-13 ~]# perf script | head -30
perf 91202 168884.094165:   146 cycles: ffff9e94ba40 _start+0x0 (/usr/lib/ld-linux-aarch64.so.1)
perf 91202 168884.094165:   164 cycles: ffff9e94ba40 _start+0x0 (/usr/lib/ld-linux-aarch64.so.1)
perf 91202 168884.094188:    74 cycles: ffffc6fe45649728 [unknown] ([unknown])
perf 91202 168884.094188:    74 cycles: ffffc6fe45649728 [unknown] ([unknown])
perf 91202 168884.094213:   134 cycles: ffff9e94ba40 _start+0x0 (/usr/lib/ld-linux-aarch64.so.1)
perf 91202 168884.094213:   152 cycles: ffff9e94ba40 _start+0x0 (/usr/lib/ld-linux-aarch64.so.1)
perf 91202 168884.094233:   136 cycles: ffff9e9473cc _dl_start+0x0 (/usr/lib/ld-linux-aarch64.so.1)
perf 91202 168884.094233:   154 cycles: ffff9e9473cc _dl_start+0x0 (/usr/lib/ld-linux-aarch64.so.1)
perf 91202 168884.094252:   293 cycles: ffffc6fe45648e18 [unknown] ([unknown])
perf 91202 168884.094252:   293 cycles: ffffc6fe45648e18 [unknown] ([unknown])
perf 91202 168884.094273:   275 cycles: ffff9e9473fc _dl_start+0x30 (/usr/lib/ld-linux-aarch64.so.1)
perf 91202 168884.094273:   293 cycles: ffff9e9473fc _dl_start+0x30 (/usr/lib/ld-linux-aarch64.so.1)
perf 91202 168884.094291:    56 cycles: ffffc6fe45648e18 [unknown] ([unknown])
perf 91202 168884.094291:    56 cycles: ffffc6fe45648e18 [unknown] ([unknown])
perf 91202 168884.094312:  1138 cycles: ffff9e947400 _dl_start+0x34 (/usr/lib/ld-linux-aarch64.so.1)
perf 91202 168884.094312:  1156 cycles: ffff9e947400 _dl_start+0x34 (/usr/lib/ld-linux-aarch64.so.1)
perf 91202 168884.094336:   414 cycles: ffff9e947454 _dl_start+0x88 (/usr/lib/ld-linux-aarch64.so.1)
perf 91202 168884.094336:   432 cycles: ffff9e947454 _dl_start+0x88 (/usr/lib/ld-linux-aarch64.so.1)
perf 91202 168884.094355:   692 cycles: ffff9e947444 _dl_start+0x78 (/usr/lib/ld-linux-aarch64.so.1)
perf 91202 168884.094355:   710 cycles: ffff9e947444 _dl_start+0x78 (/usr/lib/ld-linux-aarch64.so.1)
perf 91202 168884.094379:  1328 cycles: ffffc6fe45648e18 [unknown] ([unknown])
perf 91202 168884.094379:  1328 cycles: ffffc6fe45648e18 [unknown] ([unknown])
perf 91202 168884.094403:  3871 cycles: ffff9e941550 __GI___tunables_init+0xd0 (/usr/lib/ld-linux-aarch64.so.1)
perf 91202 168884.094403:  3889 cycles: ffff9e941550 __GI___tunables_init+0xd0 (/usr/lib/ld-linux-aarch64.so.1)
perf 91202 168884.094425:  7787 cycles: ffff9e94153c __GI___tunables_init+0xbc (/usr/lib/ld-linux-aarch64.so.1)
perf 91202 168884.094425:  7805 cycles: ffff9e94153c __GI___tunables_init+0xbc (/usr/lib/ld-linux-aarch64.so.1)
perf 91202 168884.094464: 17320 cycles: ffff9e94c794 __set_vma_name+0x54 (/usr/lib/ld-linux-aarch64.so.1)
perf 91202 168884.094464: 17338 cycles: ffff9e94c794 __set_vma_name+0x54 (/usr/lib/ld-linux-aarch64.so.1)
perf 91202 168884.094606: 38889 cycles: ffffc6fe45649bcc [unknown] ([unknown])
perf 91202 168884.094606: 38889 cycles: ffffc6fe45649bcc [unknown] ([unknown])

For the first mismatching sample:
[root@ampere-mtsnow-altramax-03-vm-13 ~]# perf report -D | grep ffff9e94ba40 -A 8
168884094165520 0x6a0 [0x68]: PERF_RECORD_SAMPLE(IP, 0x2): 91202/91202: 0xffff9e94ba40 period: 1 addr: 0
... sample_read:
.... group nr 2
..... id 000000000000002c, value 0000000000000092, lost 0
..... id 0000000000000034, value 00000000000000a4, lost 0
... thread: perf:91202
...... dso: /usr/lib/ld-linux-aarch64.so.1
... thread: perf:91202
...... dso: /usr/lib/ld-linux-aarch64.so.1
--
168884094213520 0x770 [0x68]: PERF_RECORD_SAMPLE(IP, 0x2): 91202/91202: 0xffff9e94ba40 period: 1 addr: 0
... sample_read:
.... group nr 2
..... id 000000000000002c, value 0000000000000162, lost 0
..... id 0000000000000034, value 0000000000000186, lost 0
... thread: perf:91202
...... dso: /usr/lib/ld-linux-aarch64.so.1
... thread: perf:91202
...... dso: /usr/lib/ld-linux-aarch64.so.1

For the first matching sample:
[root@ampere-mtsnow-altramax-03-vm-13 ~]# perf report -D | grep ffffc6fe45649728 -A 8
168884094188520 0x708 [0x68]: PERF_RECORD_SAMPLE(IP, 0x1): 91202/91202: 0xffffc6fe45649728 period: 1 addr: 0
... sample_read:
.... group nr 2
..... id 000000000000002c, value 00000000000000dc, lost 0
..... id 0000000000000034, value 00000000000000ee, lost 0
... thread: perf:91202
...... dso: <not found>
... thread: perf:91202
...... dso: <not found>

I also was able to get a bare metal arm machine with the same kernel
version: Linux ampere-mtsnow-altra-01.khw.eng.rdu2.dc.redhat.com
6.12.0-145.el10.aarch64 #1 SMP PREEMPT_DYNAMIC Fri Oct 24 12:39:51 UTC
2025 aarch64 GNU/Linux. On this machine I wasn't able to reproduce any
issue. I'm trying to get another vm to reproduce Thomas' issue again.

Anubhav
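[Editor's note] The constant offset described above is easy to see mechanically. A hypothetical one-off check — not part of the test suite, and `pair_delta` is an invented name — that prints the sibling-minus-leader count for each consecutive pair of `perf script` lines:

```shell
# Hypothetical helper: read 'perf script' output for a {cycles,cycles}:S
# session (two lines per record) and print sibling - leader for each pair.
# The field layout is assumed to match the "<count> cycles:" samples above.
pair_delta() {
	awk '{
		for (i = 1; i <= NF; i++)
			if ($i == "cycles:")
				count = $(i - 1)
		if (NR % 2 == 1)
			leader = count          # odd line: leader sample
		else
			print count - leader    # even line: sibling sample
	}'
}

# Usage:
#   perf script -i "${perfdata}" | grep "cycles:" | pair_delta | sort -n | uniq -c
# On a run showing the symptom above this would print mostly "18", with "0"
# for the pairs whose samples landed in [unknown] kernel context.
```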
* Re: [PATCH] Revert "perf test: Allow tolerance for leader sampling test" 2025-10-30 13:52 ` Anubhav Shelat @ 2025-10-30 14:19 ` James Clark 2025-10-30 15:22 ` Anubhav Shelat 0 siblings, 1 reply; 15+ messages in thread From: James Clark @ 2025-10-30 14:19 UTC (permalink / raw) To: Anubhav Shelat, Thomas Richter Cc: mpetlan, acme, namhyung, irogers, linux-perf-users, peterz, mingo, mark.rutland, alexander.shishkin, jolsa, adrian.hunter, kan.liang, dapeng1.mi On 30/10/2025 1:52 pm, Anubhav Shelat wrote: >> On 10/28/25 12:30, James Clark wrote: >>> You mentioned on the other thread that it's failing "differently", can you expand on that? I'm wondering why you sent the revert patch then? > The issue was that for every (leader, sibling) event pair there was an > 18 cycle difference between the sibling and the leader. Initially I Isn't that the same throttling bug? A difference in cycle counts. > thought that this was fixed by the group throttle fixing patch > (9734e25fbf5a), but I just got a VM running 6.12.0-145.el10.aarch64 > which is up to date with v6.16 and it has that bug. I'm not really sure what "v6.12 up to date with v6.16" means. > I want to revert the patch since the bug it exists for has been fixed, How do you know it's fixed if you are still seeing differences in cycle counts? That seems like not fixed to me. (Although I'm not able to reproduce it). > and as you mentioned it could be silencing other bugs. Since I'm able > to reprouce Thomas's issue on an arm machine it seems like this is not > a hardware issue inherent to s390x and there's no reason for this > patch to be in the kernel anymore. > >>> As I mentioned before I'm not seeing any issues. Can you share the kernel version that you tested on and your kernel config? And can you share the same outputs that I asked Thomas for below please. 
> For the system experiencing the 18 cycle difference bug: > Linux ampere-mtsnow-altramax-03-vm-13.lab.eng.rdu2.redhat.com > 6.12.0-145.el10.aarch64 #1 SMP PREEMPT_DYNAMIC Fri Oct 24 12:39:51 UTC > 2025 aarch64 GNU/Linux Right, but this is an old kernel. Does the issue reproduce for you on the latest kernel? Can we just ignore old kernels from this point on, I think that's what got confusing in the last thread. Let's simplify it and say it has to be tested on the latest kernel on bare metal. If it has to be in a VM because that's the only place it reproduces then both the host and guest have to be the same and latest version. Otherwise we're just putting effort into some bug that's already been fixed. > [root@ampere-mtsnow-altramax-03-vm-13 ~]# perf script | head -30 > perf 91202 168884.094165: 146 cycles: > ffff9e94ba40 _start+0x0 (/usr/lib/ld-linux-aarch64.so.1) > perf 91202 168884.094165: 164 cycles: > ffff9e94ba40 _start+0x0 (/usr/lib/ld-linux-aarch64.so.1) > perf 91202 168884.094188: 74 cycles: > ffffc6fe45649728 [unknown] ([unknown]) > perf 91202 168884.094188: 74 cycles: > ffffc6fe45649728 [unknown] ([unknown]) > perf 91202 168884.094213: 134 cycles: > ffff9e94ba40 _start+0x0 (/usr/lib/ld-linux-aarch64.so.1) > perf 91202 168884.094213: 152 cycles: > ffff9e94ba40 _start+0x0 (/usr/lib/ld-linux-aarch64.so.1) > perf 91202 168884.094233: 136 cycles: > ffff9e9473cc _dl_start+0x0 (/usr/lib/ld-linux-aarch64.so.1) > perf 91202 168884.094233: 154 cycles: > ffff9e9473cc _dl_start+0x0 (/usr/lib/ld-linux-aarch64.so.1) > perf 91202 168884.094252: 293 cycles: > ffffc6fe45648e18 [unknown] ([unknown]) > perf 91202 168884.094252: 293 cycles: > ffffc6fe45648e18 [unknown] ([unknown]) > perf 91202 168884.094273: 275 cycles: > ffff9e9473fc _dl_start+0x30 (/usr/lib/ld-linux-aarch64.so.1) > perf 91202 168884.094273: 293 cycles: > ffff9e9473fc _dl_start+0x30 (/usr/lib/ld-linux-aarch64.so.1) > perf 91202 168884.094291: 56 cycles: > ffffc6fe45648e18 [unknown] ([unknown]) > perf 91202 
168884.094291: 56 cycles: > ffffc6fe45648e18 [unknown] ([unknown]) > perf 91202 168884.094312: 1138 cycles: > ffff9e947400 _dl_start+0x34 (/usr/lib/ld-linux-aarch64.so.1) > perf 91202 168884.094312: 1156 cycles: > ffff9e947400 _dl_start+0x34 (/usr/lib/ld-linux-aarch64.so.1) > perf 91202 168884.094336: 414 cycles: > ffff9e947454 _dl_start+0x88 (/usr/lib/ld-linux-aarch64.so.1) > perf 91202 168884.094336: 432 cycles: > ffff9e947454 _dl_start+0x88 (/usr/lib/ld-linux-aarch64.so.1) > perf 91202 168884.094355: 692 cycles: > ffff9e947444 _dl_start+0x78 (/usr/lib/ld-linux-aarch64.so.1) > perf 91202 168884.094355: 710 cycles: > ffff9e947444 _dl_start+0x78 (/usr/lib/ld-linux-aarch64.so.1) > perf 91202 168884.094379: 1328 cycles: > ffffc6fe45648e18 [unknown] ([unknown]) > perf 91202 168884.094379: 1328 cycles: > ffffc6fe45648e18 [unknown] ([unknown]) > perf 91202 168884.094403: 3871 cycles: > ffff9e941550 __GI___tunables_init+0xd0 > (/usr/lib/ld-linux-aarch64.so.1) > perf 91202 168884.094403: 3889 cycles: > ffff9e941550 __GI___tunables_init+0xd0 > (/usr/lib/ld-linux-aarch64.so.1) > perf 91202 168884.094425: 7787 cycles: > ffff9e94153c __GI___tunables_init+0xbc > (/usr/lib/ld-linux-aarch64.so.1) > perf 91202 168884.094425: 7805 cycles: > ffff9e94153c __GI___tunables_init+0xbc > (/usr/lib/ld-linux-aarch64.so.1) > perf 91202 168884.094464: 17320 cycles: > ffff9e94c794 __set_vma_name+0x54 (/usr/lib/ld-linux-aarch64.so.1) > perf 91202 168884.094464: 17338 cycles: > ffff9e94c794 __set_vma_name+0x54 (/usr/lib/ld-linux-aarch64.so.1) > perf 91202 168884.094606: 38889 cycles: > ffffc6fe45649bcc [unknown] ([unknown]) > perf 91202 168884.094606: 38889 cycles: > ffffc6fe45649bcc [unknown] ([unknown]) > > For the first mismatching sample: > [root@ampere-mtsnow-altramax-03-vm-13 ~]# perf report -D | grep > ffff9e94ba40 -A 8 > 168884094165520 0x6a0 [0x68]: PERF_RECORD_SAMPLE(IP, 0x2): > 91202/91202: 0xffff9e94ba40 period: 1 addr: 0 > ... sample_read: > .... group nr 2 > ..... 
id 000000000000002c, value 0000000000000092, lost 0 > ..... id 0000000000000034, value 00000000000000a4, lost 0 > ... thread: perf:91202 > ...... dso: /usr/lib/ld-linux-aarch64.so.1 > ... thread: perf:91202 > ...... dso: /usr/lib/ld-linux-aarch64.so.1 > -- > 168884094213520 0x770 [0x68]: PERF_RECORD_SAMPLE(IP, 0x2): > 91202/91202: 0xffff9e94ba40 period: 1 addr: 0 > ... sample_read: > .... group nr 2 > ..... id 000000000000002c, value 0000000000000162, lost 0 > ..... id 0000000000000034, value 0000000000000186, lost 0 > ... thread: perf:91202 > ...... dso: /usr/lib/ld-linux-aarch64.so.1 > ... thread: perf:91202 > ...... dso: /usr/lib/ld-linux-aarch64.so.1 > > For the first matching sample: > [root@ampere-mtsnow-altramax-03-vm-13 ~]# perf report -D | grep > ffffc6fe45649728 -A 8 > 168884094188520 0x708 [0x68]: PERF_RECORD_SAMPLE(IP, 0x1): > 91202/91202: 0xffffc6fe45649728 period: 1 addr: 0 > ... sample_read: > .... group nr 2 > ..... id 000000000000002c, value 00000000000000dc, lost 0 > ..... id 0000000000000034, value 00000000000000ee, lost 0 > ... thread: perf:91202 > ...... dso: <not found> > ... thread: perf:91202 > ...... dso: <not found> > > > I also was able to get a bare metal arm machine with the same kernel > version: Linux ampere-mtsnow-altra-01.khw.eng.rdu2.dc.redhat.com > 6.12.0-145.el10.aarch64 #1 SMP PREEMPT_DYNAMIC Fri Oct 24 12:39:51 UTC > 2025 aarch64 GNU/Linux. On this machine I wasn't able to reproduce any > issue. I'm trying to get another vm to reproduce Thomas' issue again. > > Anubhav > ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH] Revert "perf test: Allow tolerance for leader sampling test"
  2025-10-30 14:19   ` James Clark
@ 2025-10-30 15:22   ` Anubhav Shelat
  0 siblings, 0 replies; 15+ messages in thread
From: Anubhav Shelat @ 2025-10-30 15:22 UTC (permalink / raw)
To: James Clark
Cc: Thomas Richter, mpetlan, acme, namhyung, irogers, linux-perf-users,
    peterz, mingo, mark.rutland, alexander.shishkin, jolsa,
    adrian.hunter, kan.liang, dapeng1.mi

On Thu, Oct 30, 2025 at 2:19 PM James Clark <james.clark@linaro.org> wrote:
> Isn't that the same throttling bug? A difference in cycle counts.

No, because nearly every sample has the difference. The tolerance
setting doesn't matter if every sample is mismatched. I don't think
this is the throttling bug. It doesn't match the bug description in
https://lore.kernel.org/lkml/20250520181644.2673067-2-kan.liang@linux.intel.com/.

> I'm not really sure what "v6.12 up to date with v6.16" means.

The RHEL kernel has a different naming convention. The -145 refers to
the 145th iteration of the RHEL kernel that started from 6.12, so
6.12.0-145 is relatively new, but I'll build from a new kernel and
test again.

Anubhav
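[Editor's note] The point that "the tolerance setting doesn't matter if every sample is mismatched" follows directly from the reverted check, which reduces to the ratio below. A sketch using awk instead of the original bc, with `tolerance_ok` as an invented name:

```shell
# Hypothetical sketch of the reverted tolerance logic: pass (print 1) only
# when the share of mismatched pairs stays below 1 - tolerance_rate.
tolerance_ok() {  # $1 = invalid pairs, $2 = valid pairs, $3 = tolerance rate
	awk -v i="$1" -v v="$2" -v t="$3" \
		'BEGIN { print ((i / (i + v) < 1 - t) ? 1 : 0) }'
}

# With the 0.8 rate the old test used:
#   tolerance_ok 1 9 0.8    -> 1 (10% mismatched: tolerated)
#   tolerance_ok 10 0 0.8   -> 0 (100% mismatched: fails at any rate below 1)
```

This shows why the tolerance could mask the throttling bug (a minority of mismatched pairs) but not the 18-cycle offset reported here, which affects essentially every pair.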
end of thread, other threads:[~2025-11-12 10:50 UTC | newest]

Thread overview: 15+ messages
2025-10-23 13:24 [PATCH] Revert "perf test: Allow tolerance for leader sampling test" Anubhav Shelat
2025-10-24  8:40 ` Thomas Richter
2025-10-24 17:21   ` Anubhav Shelat
2025-10-27 10:27     ` Thomas Richter
2025-10-28 11:30       ` James Clark
2025-10-28 12:55         ` Thomas Richter
2025-10-28 15:23           ` James Clark
2025-10-29  7:37             ` Thomas Richter
2025-10-29  9:25               ` James Clark
2025-11-11 11:22                 ` Thomas Richter
2025-11-11 14:03                   ` James Clark
2025-11-12 10:47                     ` Thomas Richter
2025-10-30 13:52         ` Anubhav Shelat
2025-10-30 14:19           ` James Clark
2025-10-30 15:22             ` Anubhav Shelat