linux-perf-users.vger.kernel.org archive mirror
* [PATCH] perf test stat_bpf_counter.sh: Remove comparison of separate runs
@ 2024-06-04 15:31 vmolnaro
  2024-06-05  0:31 ` Namhyung Kim
  0 siblings, 1 reply; 6+ messages in thread
From: vmolnaro @ 2024-06-04 15:31 UTC (permalink / raw)
  To: linux-perf-users, acme, acme; +Cc: mpetlan

From: Veronika Molnarova <vmolnaro@redhat.com>

The test has been failing for some time when two separate runs of a
perf benchmark are recorded and their counts are compared, where one
run is recorded with the --bpf-counters option and the other without
it. The two counts are expected to fall within a certain range of each
other: the allowed difference was initially 10% and was later raised
to 20%. However, the test case keeps failing on certain architectures
because recording the same benchmark can produce completely different
counts depending on the current load of the system.

Sampling two separate runs on intel-eaglestream-spr-13 of "perf stat
--no-big-num -e cycles -- perf bench sched messaging -g 1 -l 100 -t":

 Performance counter stats for 'perf bench sched messaging -g 1 -l 100 -t':

         396782898      cycles

       0.010051983 seconds time elapsed

       0.008664000 seconds user
       0.097058000 seconds sys

 Performance counter stats for 'perf bench sched messaging -g 1 -l 100 -t':

        1431133032      cycles

       0.021803714 seconds time elapsed

       0.023377000 seconds user
       0.349918000 seconds sys

The counts thus range from roughly 400 million to 1400 million cycles
between the two runs.

From the testing point of view, it does not make sense to compare two
separate runs against each other when the conditions may change
significantly between them. Remove the comparison of the two separate
runs and only check whether 'perf stat' works as expected with the
--bpf-counters option. Compare the counts only when they are recorded
simultaneously, which ensures the same conditions.

Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
---
 tools/perf/tests/shell/stat_bpf_counters.sh | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/tools/perf/tests/shell/stat_bpf_counters.sh b/tools/perf/tests/shell/stat_bpf_counters.sh
index 61f8149d854e..873b576836c6 100755
--- a/tools/perf/tests/shell/stat_bpf_counters.sh
+++ b/tools/perf/tests/shell/stat_bpf_counters.sh
@@ -6,19 +6,19 @@ set -e
 
 workload="perf bench sched messaging -g 1 -l 100 -t"
 
-# check whether $2 is within +/- 20% of $1
+# check whether $2 is within +/- 10% of $1
 compare_number()
 {
 	first_num=$1
 	second_num=$2
 
-	# upper bound is first_num * 120%
-	upper=$(expr $first_num + $first_num / 5 )
-	# lower bound is first_num * 80%
-	lower=$(expr $first_num - $first_num / 5 )
+	# upper bound is first_num * 110%
+	upper=$(expr $first_num + $first_num / 10 )
+	# lower bound is first_num * 90%
+	lower=$(expr $first_num - $first_num / 10 )
 
 	if [ $second_num -gt $upper ] || [ $second_num -lt $lower ]; then
-		echo "The difference between $first_num and $second_num are greater than 20%."
+		echo "The difference between $first_num and $second_num are greater than 10%."
 		exit 1
 	fi
 }
@@ -44,7 +44,6 @@ test_bpf_counters()
 	base_cycles=$(perf stat --no-big-num -e cycles -- $workload 2>&1 | awk '/cycles/ {print $1}')
 	bpf_cycles=$(perf stat --no-big-num --bpf-counters -e cycles -- $workload  2>&1 | awk '/cycles/ {print $1}')
 	check_counts $base_cycles $bpf_cycles
-	compare_number $base_cycles $bpf_cycles
 	echo "[Success]"
 }
 
-- 
2.43.0



* Re: [PATCH] perf test stat_bpf_counter.sh: Remove comparison of separate runs
  2024-06-04 15:31 [PATCH] perf test stat_bpf_counter.sh: Remove comparison of separate runs vmolnaro
@ 2024-06-05  0:31 ` Namhyung Kim
  2024-06-06 13:09   ` Veronika Molnarova
  2024-06-13 14:20   ` Michael Petlan
  0 siblings, 2 replies; 6+ messages in thread
From: Namhyung Kim @ 2024-06-05  0:31 UTC (permalink / raw)
  To: vmolnaro; +Cc: linux-perf-users, acme, acme, mpetlan

On Tue, Jun 04, 2024 at 05:31:11PM +0200, vmolnaro@redhat.com wrote:
> From: Veronika Molnarova <vmolnaro@redhat.com>
> 
> The test has been failing for some time when two separate runs of a
> perf benchmark are recorded and their counts are compared, where one
> run is recorded with the --bpf-counters option and the other without
> it. The two counts are expected to fall within a certain range of each
> other: the allowed difference was initially 10% and was later raised
> to 20%. However, the test case keeps failing on certain architectures
> because recording the same benchmark can produce completely different
> counts depending on the current load of the system.
> 
> Sampling two separate runs on intel-eaglestream-spr-13 of "perf stat
> --no-big-num -e cycles -- perf bench sched messaging -g 1 -l 100 -t":
> 
>  Performance counter stats for 'perf bench sched messaging -g 1 -l 100 -t':
> 
>          396782898      cycles
> 
>        0.010051983 seconds time elapsed
> 
>        0.008664000 seconds user
>        0.097058000 seconds sys
> 
>  Performance counter stats for 'perf bench sched messaging -g 1 -l 100 -t':
> 
>         1431133032      cycles
> 
>        0.021803714 seconds time elapsed
> 
>        0.023377000 seconds user
>        0.349918000 seconds sys
> 
> The counts thus range from roughly 400 million to 1400 million cycles
> between the two runs.
> 
> From the testing point of view, it does not make sense to compare two
> separate runs against each other when the conditions may change
> significantly between them. Remove the comparison of the two separate
> runs and only check whether 'perf stat' works as expected with the
> --bpf-counters option. Compare the counts only when they are recorded
> simultaneously, which ensures the same conditions.

Hmm.. but having a test which checks if the output is sane can be
useful.  If it's a problem of dynamic changes in cpu cycles, maybe
we can use 'instructions' event instead (probably with :u) to get
more stable values?
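
Maybe something along these lines (just a sketch of the idea, reusing
$workload and compare_number() from the existing script; the final
patch may look different):

  # user-space instructions should depend less on the system load than
  # cycles do
  base_instr=$(perf stat --no-big-num -e instructions:u -- $workload 2>&1 | awk '/instructions/ {print $1}')
  bpf_instr=$(perf stat --no-big-num --bpf-counters -e instructions:u -- $workload 2>&1 | awk '/instructions/ {print $1}')
  compare_number $base_instr $bpf_instr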

Thanks,
Namhyung

> 
> Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
> ---
>  tools/perf/tests/shell/stat_bpf_counters.sh | 13 ++++++-------
>  1 file changed, 6 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/perf/tests/shell/stat_bpf_counters.sh b/tools/perf/tests/shell/stat_bpf_counters.sh
> index 61f8149d854e..873b576836c6 100755
> --- a/tools/perf/tests/shell/stat_bpf_counters.sh
> +++ b/tools/perf/tests/shell/stat_bpf_counters.sh
> @@ -6,19 +6,19 @@ set -e
>  
>  workload="perf bench sched messaging -g 1 -l 100 -t"
>  
> -# check whether $2 is within +/- 20% of $1
> +# check whether $2 is within +/- 10% of $1
>  compare_number()
>  {
>  	first_num=$1
>  	second_num=$2
>  
> -	# upper bound is first_num * 120%
> -	upper=$(expr $first_num + $first_num / 5 )
> -	# lower bound is first_num * 80%
> -	lower=$(expr $first_num - $first_num / 5 )
> +	# upper bound is first_num * 110%
> +	upper=$(expr $first_num + $first_num / 10 )
> +	# lower bound is first_num * 90%
> +	lower=$(expr $first_num - $first_num / 10 )
>  
>  	if [ $second_num -gt $upper ] || [ $second_num -lt $lower ]; then
> -		echo "The difference between $first_num and $second_num are greater than 20%."
> +		echo "The difference between $first_num and $second_num are greater than 10%."
>  		exit 1
>  	fi
>  }
> @@ -44,7 +44,6 @@ test_bpf_counters()
>  	base_cycles=$(perf stat --no-big-num -e cycles -- $workload 2>&1 | awk '/cycles/ {print $1}')
>  	bpf_cycles=$(perf stat --no-big-num --bpf-counters -e cycles -- $workload  2>&1 | awk '/cycles/ {print $1}')
>  	check_counts $base_cycles $bpf_cycles
> -	compare_number $base_cycles $bpf_cycles
>  	echo "[Success]"
>  }
>  
> -- 
> 2.43.0
> 


* Re: [PATCH] perf test stat_bpf_counter.sh: Remove comparison of separate runs
  2024-06-05  0:31 ` Namhyung Kim
@ 2024-06-06 13:09   ` Veronika Molnarova
  2024-06-07 18:38     ` Namhyung Kim
  2024-06-13 14:20   ` Michael Petlan
  1 sibling, 1 reply; 6+ messages in thread
From: Veronika Molnarova @ 2024-06-06 13:09 UTC (permalink / raw)
  To: Namhyung Kim; +Cc: linux-perf-users, acme, acme, mpetlan



On 6/5/24 02:31, Namhyung Kim wrote:
> On Tue, Jun 04, 2024 at 05:31:11PM +0200, vmolnaro@redhat.com wrote:
>> From: Veronika Molnarova <vmolnaro@redhat.com>
>>
>> The test has been failing for some time when two separate runs of a
>> perf benchmark are recorded and their counts are compared, where one
>> run is recorded with the --bpf-counters option and the other without
>> it. The two counts are expected to fall within a certain range of each
>> other: the allowed difference was initially 10% and was later raised
>> to 20%. However, the test case keeps failing on certain architectures
>> because recording the same benchmark can produce completely different
>> counts depending on the current load of the system.
>>
>> Sampling two separate runs on intel-eaglestream-spr-13 of "perf stat
>> --no-big-num -e cycles -- perf bench sched messaging -g 1 -l 100 -t":
>>
>>  Performance counter stats for 'perf bench sched messaging -g 1 -l 100 -t':
>>
>>          396782898      cycles
>>
>>        0.010051983 seconds time elapsed
>>
>>        0.008664000 seconds user
>>        0.097058000 seconds sys
>>
>>  Performance counter stats for 'perf bench sched messaging -g 1 -l 100 -t':
>>
>>         1431133032      cycles
>>
>>        0.021803714 seconds time elapsed
>>
>>        0.023377000 seconds user
>>        0.349918000 seconds sys
>>
>> The counts thus range from roughly 400 million to 1400 million cycles
>> between the two runs.
>>
>> From the testing point of view, it does not make sense to compare two
>> separate runs against each other when the conditions may change
>> significantly between them. Remove the comparison of the two separate
>> runs and only check whether 'perf stat' works as expected with the
>> --bpf-counters option. Compare the counts only when they are recorded
>> simultaneously, which ensures the same conditions.
> 
> Hmm.. but having a test which checks if the output is sane can be
> useful.  If it's a problem of dynamic changes in cpu cycles, maybe
> we can use 'instructions' event instead (probably with :u) to get
> more stable values?
> 
> Thanks,
> Namhyung
> 

Well, but isn't it better to check the sanity of the output when the counters are recorded
for a workload under the same conditions, as was added in patch d9bd1d4
("perf test bpf-counters: Add test for BPF event modifier") utilizing the event modifiers:
perf stat --no-big-num -e cycles/name=base_cycles/,cycles/name=bpf_cycles/b -- $workload?
Comparing two separate runs depends more on the stability of the workload than on the
--bpf-counters option, which then leads to the problem of defining a threshold for what
values are still within a sane range.
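
That single-run check boils down to something like this (just a sketch,
the exact shell in test_bpf_modifier() may differ slightly):

  # both events are counted over the very same run, so they see identical
  # conditions; the 'b' modifier selects the BPF-based counter
  output=$(perf stat --no-big-num -e cycles/name=base_cycles/,cycles/name=bpf_cycles/b -- $workload 2>&1)
  base_cycles=$(echo "$output" | awk '/base_cycles/ {print $1}')
  bpf_cycles=$(echo "$output" | awk '/bpf_cycles/ {print $1}')
  compare_number $base_cycles $bpf_cycles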

But you are right that another possible fix for this issue is to change the workload or the
event to get more stable values that could be compared.

Thanks,
Veronika

>>
>> Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
>> ---
>>  tools/perf/tests/shell/stat_bpf_counters.sh | 13 ++++++-------
>>  1 file changed, 6 insertions(+), 7 deletions(-)
>>
>> diff --git a/tools/perf/tests/shell/stat_bpf_counters.sh b/tools/perf/tests/shell/stat_bpf_counters.sh
>> index 61f8149d854e..873b576836c6 100755
>> --- a/tools/perf/tests/shell/stat_bpf_counters.sh
>> +++ b/tools/perf/tests/shell/stat_bpf_counters.sh
>> @@ -6,19 +6,19 @@ set -e
>>  
>>  workload="perf bench sched messaging -g 1 -l 100 -t"
>>  
>> -# check whether $2 is within +/- 20% of $1
>> +# check whether $2 is within +/- 10% of $1
>>  compare_number()
>>  {
>>  	first_num=$1
>>  	second_num=$2
>>  
>> -	# upper bound is first_num * 120%
>> -	upper=$(expr $first_num + $first_num / 5 )
>> -	# lower bound is first_num * 80%
>> -	lower=$(expr $first_num - $first_num / 5 )
>> +	# upper bound is first_num * 110%
>> +	upper=$(expr $first_num + $first_num / 10 )
>> +	# lower bound is first_num * 90%
>> +	lower=$(expr $first_num - $first_num / 10 )
>>  
>>  	if [ $second_num -gt $upper ] || [ $second_num -lt $lower ]; then
>> -		echo "The difference between $first_num and $second_num are greater than 20%."
>> +		echo "The difference between $first_num and $second_num are greater than 10%."
>>  		exit 1
>>  	fi
>>  }
>> @@ -44,7 +44,6 @@ test_bpf_counters()
>>  	base_cycles=$(perf stat --no-big-num -e cycles -- $workload 2>&1 | awk '/cycles/ {print $1}')
>>  	bpf_cycles=$(perf stat --no-big-num --bpf-counters -e cycles -- $workload  2>&1 | awk '/cycles/ {print $1}')
>>  	check_counts $base_cycles $bpf_cycles
>> -	compare_number $base_cycles $bpf_cycles
>>  	echo "[Success]"
>>  }
>>  
>> -- 
>> 2.43.0
>>
> 



* Re: [PATCH] perf test stat_bpf_counter.sh: Remove comparison of separate runs
  2024-06-06 13:09   ` Veronika Molnarova
@ 2024-06-07 18:38     ` Namhyung Kim
  0 siblings, 0 replies; 6+ messages in thread
From: Namhyung Kim @ 2024-06-07 18:38 UTC (permalink / raw)
  To: Veronika Molnarova; +Cc: linux-perf-users, acme, acme, mpetlan

On Thu, Jun 06, 2024 at 03:09:43PM +0200, Veronika Molnarova wrote:
> 
> 
> On 6/5/24 02:31, Namhyung Kim wrote:
> > On Tue, Jun 04, 2024 at 05:31:11PM +0200, vmolnaro@redhat.com wrote:
> >> From: Veronika Molnarova <vmolnaro@redhat.com>
> >>
> >> The test has been failing for some time when two separate runs of a
> >> perf benchmark are recorded and their counts are compared, where one
> >> run is recorded with the --bpf-counters option and the other without
> >> it. The two counts are expected to fall within a certain range of each
> >> other: the allowed difference was initially 10% and was later raised
> >> to 20%. However, the test case keeps failing on certain architectures
> >> because recording the same benchmark can produce completely different
> >> counts depending on the current load of the system.
> >>
> >> Sampling two separate runs on intel-eaglestream-spr-13 of "perf stat
> >> --no-big-num -e cycles -- perf bench sched messaging -g 1 -l 100 -t":
> >>
> >>  Performance counter stats for 'perf bench sched messaging -g 1 -l 100 -t':
> >>
> >>          396782898      cycles
> >>
> >>        0.010051983 seconds time elapsed
> >>
> >>        0.008664000 seconds user
> >>        0.097058000 seconds sys
> >>
> >>  Performance counter stats for 'perf bench sched messaging -g 1 -l 100 -t':
> >>
> >>         1431133032      cycles
> >>
> >>        0.021803714 seconds time elapsed
> >>
> >>        0.023377000 seconds user
> >>        0.349918000 seconds sys
> >>
> >> The counts thus range from roughly 400 million to 1400 million cycles
> >> between the two runs.
> >>
> >> From the testing point of view, it does not make sense to compare two
> >> separate runs against each other when the conditions may change
> >> significantly between them. Remove the comparison of the two separate
> >> runs and only check whether 'perf stat' works as expected with the
> >> --bpf-counters option. Compare the counts only when they are recorded
> >> simultaneously, which ensures the same conditions.
> > 
> > Hmm.. but having a test which checks if the output is sane can be
> > useful.  If it's a problem of dynamic changes in cpu cycles, maybe
> > we can use 'instructions' event instead (probably with :u) to get
> > more stable values?
> > 
> > Thanks,
> > Namhyung
> > 
> 
> Well, but isn't it better to check the sanity of the output when the counters are recorded
> for a workload under the same conditions, as was added in patch d9bd1d4
> ("perf test bpf-counters: Add test for BPF event modifier") utilizing the event modifiers:
> perf stat --no-big-num -e cycles/name=base_cycles/,cycles/name=bpf_cycles/b -- $workload?
> Comparing two separate runs depends more on the stability of the workload than on the
> --bpf-counters option, which then leads to the problem of defining a threshold for what
> values are still within a sane range.

Right, it'd be better if we can run once and then compare the counters.
But I don't think --bpf-counters works only for a specific event.

> 
> But you are right that another possible fix for this issue is to change the workload or the
> event to get more stable values that could be compared.

Yep, let's try with 'instructions:u'.

Thanks,
Namhyung


* Re: [PATCH] perf test stat_bpf_counter.sh: Remove comparison of separate runs
  2024-06-05  0:31 ` Namhyung Kim
  2024-06-06 13:09   ` Veronika Molnarova
@ 2024-06-13 14:20   ` Michael Petlan
  2024-06-16  3:56     ` Namhyung Kim
  1 sibling, 1 reply; 6+ messages in thread
From: Michael Petlan @ 2024-06-13 14:20 UTC (permalink / raw)
  To: Namhyung Kim; +Cc: vmolnaro, linux-perf-users, acme, acme

On Tue, 4 Jun 2024, Namhyung Kim wrote:
> On Tue, Jun 04, 2024 at 05:31:11PM +0200, vmolnaro@redhat.com wrote:
> > From: Veronika Molnarova <vmolnaro@redhat.com>
> > 
> > The test has been failing for some time when two separate runs of a
> > perf benchmark are recorded and their counts are compared, where one
> > run is recorded with the --bpf-counters option and the other without
> > it. The two counts are expected to fall within a certain range of each
> > other: the allowed difference was initially 10% and was later raised
> > to 20%. However, the test case keeps failing on certain architectures
> > because recording the same benchmark can produce completely different
> > counts depending on the current load of the system.
> > 
> > Sampling two separate runs on intel-eaglestream-spr-13 of "perf stat
> > --no-big-num -e cycles -- perf bench sched messaging -g 1 -l 100 -t":
> > 
> >  Performance counter stats for 'perf bench sched messaging -g 1 -l 100 -t':
> > 
> >          396782898      cycles
> > 
> >        0.010051983 seconds time elapsed
> > 
> >        0.008664000 seconds user
> >        0.097058000 seconds sys
> > 
> >  Performance counter stats for 'perf bench sched messaging -g 1 -l 100 -t':
> > 
> >         1431133032      cycles
> > 
> >        0.021803714 seconds time elapsed
> > 
> >        0.023377000 seconds user
> >        0.349918000 seconds sys
> > 
> > The counts thus range from roughly 400 million to 1400 million cycles
> > between the two runs.
> > 
> > From the testing point of view, it does not make sense to compare two
> > separate runs against each other when the conditions may change
> > significantly between them. Remove the comparison of the two separate
> > runs and only check whether 'perf stat' works as expected with the
> > --bpf-counters option. Compare the counts only when they are recorded
> > simultaneously, which ensures the same conditions.
> 
> Hmm.. but having a test which checks if the output is sane can be
> useful.  If it's a problem of dynamic changes in cpu cycles, maybe
> we can use 'instructions' event instead (probably with :u) to get
> more stable values?

Hello.

As far as I understand it, nowadays, the test checks two things:

  test_bpf_counters()
    record $workload twice (with and without --bpf-counters)
    check that there are numeric results
    compare the results

  test_bpf_modifier()
    record $workload once, with and without the modifier (which should be
    what the --bpf-counters switch does to the events, right?)
    check that there are numeric results
    compare the results

The problem here is not only the "dynamic changes in cpu-cycles"; it
lies rather in the testcase design itself. A testcase that compares two
metrics should eliminate all possible variable effects that influence it.

The second function actually compares the values correctly, since they
are measured against the same identical workload.

So, in my opinion, a better test design would be (roughly sketched after
the list):

  (1) check that record without --bpf-counters works as a reference run
  (2) check that record with --bpf-counters works too
      (if not, we may compare to (1) to find out whether the whole `record`
      is broken or just the --bpf-counters option)
  (3) possibly run `perf evlist` to check that --bpf-counters has added the
      '/b' modifier
  (4) check with versus without "b", as the current test_bpf_modifier()
      function does
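
A rough shell sketch of (1), (2) and (4), reusing the check_counts() and
compare_number() helpers already in the script (variable names here are
only illustrative):

  # (1) reference run without --bpf-counters
  base_cycles=$(perf stat --no-big-num -e cycles -- $workload 2>&1 | awk '/cycles/ {print $1}')
  # (2) the same run with --bpf-counters; if only this one misbehaves,
  #     the option itself is the suspect
  bpf_cycles=$(perf stat --no-big-num --bpf-counters -e cycles -- $workload 2>&1 | awk '/cycles/ {print $1}')
  check_counts $base_cycles $bpf_cycles
  # (4) single-run comparison with and without the 'b' modifier
  output=$(perf stat --no-big-num -e cycles/name=base_cycles/,cycles/name=bpf_cycles/b -- $workload 2>&1)
  compare_number $(echo "$output" | awk '/base_cycles/ {print $1}') \
                 $(echo "$output" | awk '/bpf_cycles/ {print $1}')
  # (3) is omitted from this sketch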

I like what Veronika suggests; it is basically the above, except for (3).

...

In case we want to preserve two separate runs in test_bpf_counters() _and_
also check the numbers, then we should:
  - use a more predictable workload:
    - in the ideal case, a statically linked simple binary
    - in a less-than-ideal case, `perf test -w something`
  - use instructions instead of cycles (a short example follows)
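
For instance (only a sketch; 'noploop' is just one of the built-in
`perf test -w` workloads and serves purely as an example here):

  # one of perf's own small synthetic workloads instead of the
  # scheduler benchmark
  workload="perf test -w noploop"
  perf stat --no-big-num -e instructions:u -- $workload
  perf stat --no-big-num --bpf-counters -e instructions:u -- $workload
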
However, I don't like that idea very much, because of the design principles
mentioned above.

Regards,
Michael

> 
> Thanks,
> Namhyung
> 
> > 
> > Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
> > ---
> >  tools/perf/tests/shell/stat_bpf_counters.sh | 13 ++++++-------
> >  1 file changed, 6 insertions(+), 7 deletions(-)
> > 
> > diff --git a/tools/perf/tests/shell/stat_bpf_counters.sh b/tools/perf/tests/shell/stat_bpf_counters.sh
> > index 61f8149d854e..873b576836c6 100755
> > --- a/tools/perf/tests/shell/stat_bpf_counters.sh
> > +++ b/tools/perf/tests/shell/stat_bpf_counters.sh
> > @@ -6,19 +6,19 @@ set -e
> >  
> >  workload="perf bench sched messaging -g 1 -l 100 -t"
> >  
> > -# check whether $2 is within +/- 20% of $1
> > +# check whether $2 is within +/- 10% of $1
> >  compare_number()
> >  {
> >  	first_num=$1
> >  	second_num=$2
> >  
> > -	# upper bound is first_num * 120%
> > -	upper=$(expr $first_num + $first_num / 5 )
> > -	# lower bound is first_num * 80%
> > -	lower=$(expr $first_num - $first_num / 5 )
> > +	# upper bound is first_num * 110%
> > +	upper=$(expr $first_num + $first_num / 10 )
> > +	# lower bound is first_num * 90%
> > +	lower=$(expr $first_num - $first_num / 10 )
> >  
> >  	if [ $second_num -gt $upper ] || [ $second_num -lt $lower ]; then
> > -		echo "The difference between $first_num and $second_num are greater than 20%."
> > +		echo "The difference between $first_num and $second_num are greater than 10%."
> >  		exit 1
> >  	fi
> >  }
> > @@ -44,7 +44,6 @@ test_bpf_counters()
> >  	base_cycles=$(perf stat --no-big-num -e cycles -- $workload 2>&1 | awk '/cycles/ {print $1}')
> >  	bpf_cycles=$(perf stat --no-big-num --bpf-counters -e cycles -- $workload  2>&1 | awk '/cycles/ {print $1}')
> >  	check_counts $base_cycles $bpf_cycles
> > -	compare_number $base_cycles $bpf_cycles
> >  	echo "[Success]"
> >  }
> >  
> > -- 
> > 2.43.0
> > 
> 
> 



* Re: [PATCH] perf test stat_bpf_counter.sh: Remove comparison of separate runs
  2024-06-13 14:20   ` Michael Petlan
@ 2024-06-16  3:56     ` Namhyung Kim
  0 siblings, 0 replies; 6+ messages in thread
From: Namhyung Kim @ 2024-06-16  3:56 UTC (permalink / raw)
  To: Michael Petlan; +Cc: vmolnaro, linux-perf-users, acme, acme

Hello,

On Thu, Jun 13, 2024 at 04:20:58PM +0200, Michael Petlan wrote:
> On Tue, 4 Jun 2024, Namhyung Kim wrote:
> > On Tue, Jun 04, 2024 at 05:31:11PM +0200, vmolnaro@redhat.com wrote:
> > > From: Veronika Molnarova <vmolnaro@redhat.com>
> > > 
> > > The test has been failing for some time when two separate runs of a
> > > perf benchmark are recorded and their counts are compared, where one
> > > run is recorded with the --bpf-counters option and the other without
> > > it. The two counts are expected to fall within a certain range of each
> > > other: the allowed difference was initially 10% and was later raised
> > > to 20%. However, the test case keeps failing on certain architectures
> > > because recording the same benchmark can produce completely different
> > > counts depending on the current load of the system.
> > > 
> > > Sampling two separate runs on intel-eaglestream-spr-13 of "perf stat
> > > --no-big-num -e cycles -- perf bench sched messaging -g 1 -l 100 -t":
> > > 
> > >  Performance counter stats for 'perf bench sched messaging -g 1 -l 100 -t':
> > > 
> > >          396782898      cycles
> > > 
> > >        0.010051983 seconds time elapsed
> > > 
> > >        0.008664000 seconds user
> > >        0.097058000 seconds sys
> > > 
> > >  Performance counter stats for 'perf bench sched messaging -g 1 -l 100 -t':
> > > 
> > >         1431133032      cycles
> > > 
> > >        0.021803714 seconds time elapsed
> > > 
> > >        0.023377000 seconds user
> > >        0.349918000 seconds sys
> > > 
> > > The counts thus range from roughly 400 million to 1400 million cycles
> > > between the two runs.
> > > 
> > > From the testing point of view, it does not make sense to compare two
> > > separate runs against each other when the conditions may change
> > > significantly between them. Remove the comparison of the two separate
> > > runs and only check whether 'perf stat' works as expected with the
> > > --bpf-counters option. Compare the counts only when they are recorded
> > > simultaneously, which ensures the same conditions.
> > 
> > Hmm.. but having a test which checks if the output is sane can be
> > useful.  If it's a problem of dynamic changes in cpu cycles, maybe
> > we can use 'instructions' event instead (probably with :u) to get
> > more stable values?
> 
> Hello.
> 
> As far as I understand it, nowadays, the test checks two things:
> 
>   test_bpf_counters()
>     record $workload twice (with and without --bpf-counters)
>     check that there are numeric results
>     compare the results
> 
>   test_bpf_modifier()
>     record $workload once, with and without the modifier (which should be
>     what the --bpf-counters switch does to the events, right?)
>     check that there are numeric results
>     compare the results
> 
> The problem here is not only the "dynamic changes in cpu-cycles"; it
> lies rather in the testcase design itself. A testcase that compares two
> metrics should eliminate all possible variable effects that influence it.
> 
> The second function actually compares the values correctly, since they
> are measured against the same identical workload.
> 
> So, in my opinion, a better test design would be:
> 
>   (1) check that record without --bpf-counters works as a reference run

Do you mean `perf stat record`?  Or saving the output of `perf stat`
in a text file?


>   (2) check that record with --bpf-counters works too
>       (if not, we may compare to (1) to find out whether the whole `record`
>       is broken or just the --bpf-counters option)

Ditto.


>   (3) possibly run `perf evlist` to check that --bpf-counters has added the
>       '/b' modifier

The `perf evlist` would work only for `perf stat record`.  Then I think
it's a completely new test case.
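
Something like this, I mean (only a sketch, assuming 'perf stat record'
accepts the same options as plain 'perf stat'; whether the BPF-counter
setup is actually visible in the evlist output would still need checking):

  perf stat record --bpf-counters -e cycles -- $workload
  perf evlist -v    # inspects the events saved in perf.data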


>   (4) check with versus without "b", as the current test_bpf_modifier()
>       function does
> 
> I like what Veronika suggests; it is basically the above, except for (3).
> 
> ...
> 
> In case we want to preserve two separate runs in test_bpf_counters() _and_
> also check the numbers, then we should:
>   - use a more predictable workload:
>     - in the ideal case, a statically linked simple binary
>     - in a less-than-ideal case, `perf test -w something`
>   - use instructions instead of cycles
> However, I don't like that idea very much, because of the design principles
> mentioned above.

I understand it's not ideal.. maybe it's fine not to check the numbers
here because we do that in test_bpf_modifier.  But it'd be nice to
have that.  Let's try with the instructions event first and change it
later if it doesn't go well.

Thanks,
Namhyung


