linux-perf-users.vger.kernel.org archive mirror
From: Michael Petlan <mpetlan@redhat.com>
To: Namhyung Kim <namhyung@kernel.org>
Cc: vmolnaro@redhat.com, linux-perf-users@vger.kernel.org,
	acme@kernel.org,  acme@redhat.com
Subject: Re: [PATCH] perf test stat_bpf_counter.sh: Remove comparison of separate runs
Date: Thu, 13 Jun 2024 16:20:58 +0200 (CEST)
Message-ID: <alpine.LRH.2.20.2406131407370.4040@Diego>
In-Reply-To: <Zl-x8uob6NT3HlfT@google.com>

On Tue, 4 Jun 2024, Namhyung Kim wrote:
> On Tue, Jun 04, 2024 at 05:31:11PM +0200, vmolnaro@redhat.com wrote:
> > From: Veronika Molnarova <vmolnaro@redhat.com>
> > 
> > The test has been failing for some time when two separate runs of
> > perf benchmarks are recorded and the counts of the samples are compared,
> > while once the recording was done with option --bpf-counters and once
> > without it. It is expected that the count of the samples should be within
> > a certain range: at first the difference had to be within 10%,
> > which was later raised to 20%. However, the test case keeps failing
> > on certain architectures, as recording the same benchmark can produce
> > completely different sample counts depending on the current load of the
> > system.
> > 
> > Sampling two separate runs on intel-eaglestream-spr-13 of "perf stat
> > --no-big-num -e cycles -- perf bench sched messaging -g 1 -l 100 -t":
> > 
> >  Performance counter stats for 'perf bench sched messaging -g 1 -l 100 -t':
> > 
> >          396782898      cycles
> > 
> >        0.010051983 seconds time elapsed
> > 
> >        0.008664000 seconds user
> >        0.097058000 seconds sys
> > 
> >  Performance counter stats for 'perf bench sched messaging -g 1 -l 100 -t':
> > 
> >         1431133032      cycles
> > 
> >        0.021803714 seconds time elapsed
> > 
> >        0.023377000 seconds user
> >        0.349918000 seconds sys
> > 
> > The counts range from about 400 million to 1400 million.
> > 
> > From the testing point of view, it does not make sense to compare two
> > separate runs against each other when the conditions may change
> > significantly. Remove the comparison of two separate runs and check only
> > whether `perf stat` works as expected with the --bpf-counters option. Compare
> > the sample counts only when the samples are recorded simultaneously,
> > ensuring the same conditions.
> 
> Hmm.. but having a test which checks if the output is sane can be
> useful.  If it's a problem of dynamic changes in cpu cycles, maybe
> we can use 'instructions' event instead (probably with :u) to get
> more stable values?

Hello.

As far as I understand it, the test currently checks two things:

  test_bpf_counters()
    record $workload twice (with and without --bpf-counters)
    check that there are numeric results
    compare the results

  test_bpf_modifier()
    record $workload once, with and without the modifier (which should be
    what the --bpf-counters switch does to the events, right?)
    check that there are numeric results
    compare the results

The problem here is not only the "dynamic changes in cpu cycles"; it
lies in the testcase design itself. A testcase that compares two
measurements should eliminate every variable effect that can influence them.

The second function actually compares the values correctly, since both
are measured over the very same run of the workload.
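
To illustrate why the separate-run comparison is fragile, here is a minimal
sketch of a tolerance check applied to the two cycle counts quoted in the
patch description. The helper name within_tolerance and its third parameter
are hypothetical; the real script uses compare_number() with a fixed
percentage. It assumes the shell does 64-bit arithmetic, as bash and dash do
on common Linux systems.

```shell
#!/bin/sh
# Hypothetical helper (not part of the real test script):
# succeed when $2 is within +/- $3 percent of $1.
# Assumes 64-bit shell arithmetic so first_num * percent does not overflow.
within_tolerance()
{
	first_num=$1
	second_num=$2
	percent=$3

	delta=$((first_num * percent / 100))
	upper=$((first_num + delta))
	lower=$((first_num - delta))

	[ "$second_num" -le "$upper" ] && [ "$second_num" -ge "$lower" ]
}

# The two separate runs quoted in the patch description.
if within_tolerance 396782898 1431133032 20; then
	echo "within 20%"
else
	echo "outside 20%"
fi
```

With those two counts the upper bound is 476139477, so the second run falls
far outside the range although nothing is wrong with the counters themselves.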

So, in my opinion, a better test design would be:

  (1) check that record without --bpf-counters works, as a reference run
  (2) check that record with --bpf-counters works too
      (if not, we can compare to (1) to find out whether `record` as a
      whole is broken or just the --bpf-counters option)
  (3) possibly run `perf evlist` to check that --bpf-counters has added the
      '/b' modifier
  (4) check with versus without "b", as the current test_bpf_modifier()
      function does
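
The steps above could be sketched roughly as follows. This is only an
illustration: the function names get_count, is_numeric and
test_design_sketch are hypothetical, step (3) is left out, and the event
syntax in step (4) is my understanding of what the existing
test_bpf_modifier() uses, so treat it as an assumption.

```shell
#!/bin/sh
# Sketch only; the function names are hypothetical, not from the real test.

workload="perf bench sched messaging -g 1 -l 100 -t"

# Read `perf stat` output on stdin and print the count (first field) of the
# first line that mentions event $1.
get_count()
{
	awk -v ev="$1" '$0 ~ ev {print $1; exit}'
}

# Succeed only if $1 is a non-empty string of digits.
is_numeric()
{
	case "$1" in
	''|*[!0-9]*) return 1 ;;
	*) return 0 ;;
	esac
}

test_design_sketch()
{
	# (1) reference run without --bpf-counters
	base=$(perf stat --no-big-num -e cycles -- $workload 2>&1 | get_count cycles)
	is_numeric "$base" || { echo "plain perf stat failed"; return 1; }

	# (2) same event with --bpf-counters; since (1) passed, a failure
	#     here points at the option rather than at perf stat as a whole
	bpf=$(perf stat --no-big-num --bpf-counters -e cycles -- $workload 2>&1 | get_count cycles)
	is_numeric "$bpf" || { echo "only --bpf-counters failed"; return 1; }

	# (3) checking the modifier is intentionally omitted from this sketch

	# (4) a single run that counts with and without the 'b' modifier
	#     (event syntax assumed to match the existing test)
	out=$(perf stat --no-big-num -e cycles/name=base_cycles/,cycles/name=bpf_cycles/b -- $workload 2>&1)
	base=$(printf '%s\n' "$out" | get_count base_cycles)
	bpf=$(printf '%s\n' "$out" | get_count bpf_cycles)
	is_numeric "$base" && is_numeric "$bpf" || return 1

	# only here are the numbers compared: bpf within +/- 10% of base
	[ "$bpf" -le $((base + base / 10)) ] && [ "$bpf" -ge $((base - base / 10)) ]
}
```

The numbers are compared only in step (4), where both counters observe the
same run; steps (1) and (2) check nothing more than that a numeric result
comes back.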

I like what Veronika suggests; it is basically the above, except for (3).

...

In case we want to preserve two separate runs in test_bpf_counters() _and_
also check the numbers, then we should:
  - use a more predictable workload:
    - in the ideal case, a simple statically linked binary
    - in the less-than-ideal case, `perf test -w something`
  - use instructions instead of cycles
However, I don't like that idea very much, because of the design principles
mentioned above.
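
For completeness, a rough sketch of what that alternative might look like.
Everything here is an assumption: the helper names are hypothetical,
noploop is assumed to be available as a `perf test -w` workload in the perf
build under test, and I have not verified that --bpf-counters honours the
:u modifier.

```shell
#!/bin/sh
# Assumption: noploop is a workload offered by `perf test -w` in this build.
stable_workload="perf test -w noploop"

# Hypothetical helper: print the user-space instruction count of one run.
# Extra perf stat options (e.g. --bpf-counters) are passed through "$@".
count_instructions()
{
	perf stat --no-big-num "$@" -e instructions:u -- $stable_workload 2>&1 |
		awk '/instructions/ {print $1; exit}'
}

# Hypothetical helper: print how far $2 is from $1, as a truncated integer
# percentage of $1. Assumes 64-bit shell arithmetic.
percent_diff()
{
	diff=$(($2 - $1))
	if [ "$diff" -lt 0 ]; then
		diff=$((0 - diff))
	fi
	echo $((diff * 100 / $1))
}

# Possible usage (not executed here):
#   base=$(count_instructions)
#   bpf=$(count_instructions --bpf-counters)
#   [ "$(percent_diff "$base" "$bpf")" -le 10 ] || exit 1
```

Even with a steadier event and workload, this still compares two separate
runs, which is exactly the design weakness described above.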

Regards,
Michael

> 
> Thanks,
> Namhyung
> 
> > 
> > Signed-off-by: Veronika Molnarova <vmolnaro@redhat.com>
> > ---
> >  tools/perf/tests/shell/stat_bpf_counters.sh | 13 ++++++-------
> >  1 file changed, 6 insertions(+), 7 deletions(-)
> > 
> > diff --git a/tools/perf/tests/shell/stat_bpf_counters.sh b/tools/perf/tests/shell/stat_bpf_counters.sh
> > index 61f8149d854e..873b576836c6 100755
> > --- a/tools/perf/tests/shell/stat_bpf_counters.sh
> > +++ b/tools/perf/tests/shell/stat_bpf_counters.sh
> > @@ -6,19 +6,19 @@ set -e
> >  
> >  workload="perf bench sched messaging -g 1 -l 100 -t"
> >  
> > -# check whether $2 is within +/- 20% of $1
> > +# check whether $2 is within +/- 10% of $1
> >  compare_number()
> >  {
> >  	first_num=$1
> >  	second_num=$2
> >  
> > -	# upper bound is first_num * 120%
> > -	upper=$(expr $first_num + $first_num / 5 )
> > -	# lower bound is first_num * 80%
> > -	lower=$(expr $first_num - $first_num / 5 )
> > +	# upper bound is first_num * 110%
> > +	upper=$(expr $first_num + $first_num / 10 )
> > +	# lower bound is first_num * 90%
> > +	lower=$(expr $first_num - $first_num / 10 )
> >  
> >  	if [ $second_num -gt $upper ] || [ $second_num -lt $lower ]; then
> > -		echo "The difference between $first_num and $second_num are greater than 20%."
> > +		echo "The difference between $first_num and $second_num are greater than 10%."
> >  		exit 1
> >  	fi
> >  }
> > @@ -44,7 +44,6 @@ test_bpf_counters()
> >  	base_cycles=$(perf stat --no-big-num -e cycles -- $workload 2>&1 | awk '/cycles/ {print $1}')
> >  	bpf_cycles=$(perf stat --no-big-num --bpf-counters -e cycles -- $workload  2>&1 | awk '/cycles/ {print $1}')
> >  	check_counts $base_cycles $bpf_cycles
> > -	compare_number $base_cycles $bpf_cycles
> >  	echo "[Success]"
> >  }
> >  
> > -- 
> > 2.43.0
> > 
> 
> 


