Re: perf test fail :: "perf stat --bpf-counters --for-each-cgroup test"

From: Michael Petlan <mpetlan@redhat.com>
To: linux-perf-users@vger.kernel.org
Cc: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com>,
	 Namhyung Kim <namhyung@kernel.org>,
	Arnaldo de Melo <acme@redhat.com>,
	 vmolnaro@redhat.com
Subject: Re: perf test fail :: "perf stat --bpf-counters --for-each-cgroup test"
Date: Fri, 1 Nov 2024 11:15:39 +0100 (CET)
Message-ID: <536d5b91-f9ed-99b3-6c17-3d93bf451ffd@redhat.com>
In-Reply-To: <alpine.LRH.2.20.2407191303000.11376@Diego>

On Fri, 19 Jul 2024, Michael Petlan wrote:
> On Fri, 19 Jul 2024, Arnaldo Carvalho de Melo wrote:
> > On Fri, Jul 19, 2024, 6:50 AM Michael Petlan <mpetlan@redhat.com> wrote:
> >       Hello Namhyung,
> > 
> >       we were investigating some test failures of the testcase mentioned
> >       in $subj. We have narrowed it down to:
> > 
> >           # perf stat -C 0,1 --for-each-cgroup system.slice,user.slice -e cycles -- taskset -c 1 perf test -w thloop
> > 
> >           Performance counter stats for 'CPU(s) 0,1':
> >                <not counted>      cycles                           system.slice
> >                3,020,401,084      cycles                           user.slice                       
> > 
> >                1.009787097 seconds time elapsed
> > 
> >       As seen, the system.slice is not counted properly in our case. It
> >       happens even without bpf-counters being involved.
> > 
> >       There were rumours that it might be caused due to too small system
> >       load, but it apparently happens even when the load was replaced by
> >       "thloop" workload from perf-test's workload library. However, even
> >       so, if the load was insufficient, we'd see a value – 0 instead of
> >       "not counted". The "<not counted>" result is printed if the counter
> >       wasn't properly enabled and running.
> > 
> >       Have you encountered this problem? What could cause it?
> > 
> > 
> > What does running with -vvv says? Some inconclusive error coming from the kernel? 
> 
Hello!

We have been investigating this issue a bit more and have come to the
conclusion that everything is probably OK except for the testcase itself.
In short, it relies on the assumption that taskset can force any
system.slice workload to run on a particular CPU. In my opinion that does
not hold: where system.slice runs is rather random, and that is why the
test sometimes fails.

To summarize the problem a bit:

1) The $subj testcase sometimes fails.

2) It consists of two parts: the first counts system-wide and the second
limits the counting to CPUs 0 and 1. The second one sometimes fails, while
the first (system-wide) always passes.

3) The test fails because system.slice may get a <not counted> result.

4) There is another problem with this testcase on single-CPU boxes: there
is no CPU 1. So we decided to try "-C 0" and "taskset -c 0" on such boxes.
The "<not counted>" results disappeared!
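
The CPU choice from point 4 can be sketched as a small shell helper. This
is only an illustration: pick_cpus is a hypothetical name, not something
from the existing testcase, and the two-or-more-CPUs branch simply mirrors
the "-C 0,1" / "taskset -c 1" combination quoted earlier in this thread.

```sh
#!/bin/sh
# Hypothetical helper: given the number of online CPUs, print the CPU list
# for "perf stat -C" followed by the CPU for "taskset -c".
pick_cpus() {
    ncpus=$1
    if [ "$ncpus" -ge 2 ]; then
        # Count on CPUs 0 and 1, pin the workload to CPU 1.
        echo "0,1 1"
    else
        # Single-CPU box: there is no CPU 1, so use CPU 0 for both.
        echo "0 0"
    fi
}

# Usage (not executed here; needs root and cgroup support in perf):
#   set -- $(pick_cpus "$(nproc)")
#   perf stat -C "$1" --for-each-cgroup system.slice,user.slice -e cycles \
#        -- taskset -c "$2" true
```

Note that this only avoids the missing-CPU problem; it does not by itself
make system.slice run on the counted CPUs.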

---------------------------

So... The system-wide counting test works:

# perf stat --for-each-cgroup system.slice,user.slice -e cycles -a -- sleep 3
 Performance counter stats for 'system wide':

     8,884,593      cycles            system.slice
     5,645,624      cycles            user.slice

   3.004137451 seconds time elapsed

When we pin the workload AND the counting to a particular CPU, it might fail:

# perf stat -C 0 --for-each-cgroup system.slice,user.slice -e cycles -- taskset -c 0 true

 Performance counter stats for 'CPU(s) 0':

     <not counted>  cycles            system.slice
     2,722,263      cycles            user.slice

   0.004184686 seconds time elapsed

Namhyung said that there might not be enough load, which in the end appears
to be the problem. However, replacing `true` with something "heavier" would
not help; the point is that nothing from system.slice ran on CPU 0 at all
while `perf stat` was counting.
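
For triage, the cgroups that ended up "<not counted>" can be pulled out of
the output mechanically. A minimal sketch, assuming the human-readable
layout shown above (cgroup name as the last field on the line);
not_counted_cgroups is a hypothetical helper, not part of perf:

```sh
#!/bin/sh
# Hypothetical helper: read "perf stat --for-each-cgroup" output on stdin
# and print the name of every cgroup whose counter is "<not counted>".
# Assumes the cgroup name is the last whitespace-separated field.
not_counted_cgroups() {
    awk '/<not counted>/ { print $NF }'
}

# Usage (not executed here; perf stat prints its report to stderr):
#   perf stat -C 0 --for-each-cgroup system.slice,user.slice -e cycles \
#        -- taskset -c 0 true 2>&1 | not_counted_cgroups
```

An empty result means every listed cgroup had its counter running on the
counted CPUs at some point during the measurement.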

Taskset can pin the process to some CPU, but even without it (or when we
pin it to CPU 3, for example), _some_ user.slice content always runs on
CPU 0, so we get values.

However, there is no guarantee that system.slice will run there. We would
probably need to put more load on whatever systemd decides to place under
"system.slice" and hope that it gets a chance to run on CPU 0 or 1, or
whichever CPU the testcase uses.

Of course, the more CPUs the machine has, the higher the chance of getting
the <not counted> result for system.slice. That's why -a works, and also
why it works on a single-CPU machine.
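
To put rough numbers on this, here is a toy model, not a measurement. It
assumes (my simplification, the scheduler is certainly not this simple)
that during the run system.slice wakes up k times and each wakeup lands
independently and uniformly on one of n online CPUs, of which m are being
counted. The chance that no wakeup hits a counted CPU is then (1 - m/n)^k.
The function name and parameters are made up for the illustration.

```sh
#!/bin/sh
# Toy model only: probability that none of k independent, uniformly placed
# system.slice wakeups lands on one of the m counted CPUs (n CPUs online).
#   $1 = m (counted CPUs), $2 = n (online CPUs), $3 = k (wakeups)
miss_probability() {
    awk -v m="$1" -v n="$2" -v k="$3" \
        'BEGIN { printf "%.3f\n", (1 - m / n) ^ k }'
}

miss_probability 1 1 5    # single-CPU box with -C 0
miss_probability 8 8 5    # -a on an 8-CPU box
miss_probability 2 8 5    # -C 0,1 on an 8-CPU box
miss_probability 1 8 5    # -C 0 on an 8-CPU box
```

Under these assumptions the first two cases give 0.000, matching the
observation that -a and single-CPU machines never miss, while the last two
give 0.237 and 0.513.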

............

Thus, I think that we should simply remove the taskset part of the testcase
and leave only the systemwide part.

Thoughts?

Michael
