Re: broken cycle counts from perf record in frequency mode [Was: Re: deducing CPU clock rate over time from cycle samples]

Linux Perf Users
 help / color / mirror / Atom feed

From: Milian Wolff <milian.wolff@kdab.com>
To: Andi Kleen <andi@firstfloor.org>
Cc: linux-perf-users@vger.kernel.org,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	linux-kernel@vger.kernel.org, peterz@infradead.org
Subject: Re: broken cycle counts from perf record in frequency mode [Was: Re: deducing CPU clock rate over time from cycle samples]
Date: Tue, 05 Sep 2017 14:26:38 +0200	[thread overview]
Message-ID: <2027973.spD7XaVQdP@milian-kdab2> (raw)
In-Reply-To: <20170905034058.GV2482@two.firstfloor.org>

[-- Attachment #1: Type: text/plain, Size: 3385 bytes --]

On Tuesday, September 5, 2017 5:40:58 AM CEST Andi Kleen wrote:
> > The cycle value gets associated with a sample via it's period value, which
> > is used by `perf report` in the analysis. If I get a single "broken"
> > sample with
>
> I always thought it just used the number of samples?

No, that is not the case. It uses the cycle period as "weight" by default. 
Note also the recent patch set for `perf annotate` by Taeung Song which adds 
the functionality to switch between sample or period fractions. I'm actually 
not sure whether that exists for `perf report` yet.

In some situations where the cycle period values associated with the samples 
are correct, I actually also saw how this is useful: I have seen perf data 
files, where tons of samples got recorded around syscall entry/exit, but most 
samples only had tiny cycle values associated with them. If one only looks at 
number of samples, then it would look like the syscalls are expensive. While 
in reality, way more cycles are executed in userspace. This issue was/is 
apparent when looking at `perf report` values vs. the FlameGraph visualization 
created by the normal `stackcollapse-perf.pl` script. The latter does only 
look at the number of samples, the former takes the sample period value. 
Hotspot also used the former approach, but then I adapted perf's behavior.

> > a cycle count of, say 1E14 and then a million other samples, each with
> > "sane" cycle counts of let's say 1E5, then the one broken sample will
> > hold 50% of the total amount of measured cycles. So perf report will show
> > that the function where the broken sample points to will have a cost of
> > 50%.
> 
> I don't think I've seen such a situation. Did you?

I have seen this situation. This is what this current revival of this thread 
is all about. Without such issues, I wouldn't claim it's such a serious issue.

> BTW I'm not arguing against fixing it, but typically I just
> recommend to avoid frequency mode. The fast sampling at the beginning
> has caused a range of low level PMU bugs and it is hard to reason about
> because of its complex behavior. Also it has no protection against
> synchronizing with repeated patterns in the execution, which
> can cause bad shadowing effects.  If you use the Intel
> event aliases they have all sensible periods set by default.

I think we both agree that the frequency mode as-is should not be the default. 
But it is, and this is a serious issue in my opinion. We will need to find a 
sensible default for the event period and use that mode by default. I nowadays 
always add something like `-c 3000000` which gives me roughly 1k samples per 
second on a ~3GHz machine. It's just a ballpark figure and of course gets 
influenced by frequency scaling, but it's good enough for me. We could use a 
similar approach to find a period based on the max CPU clock rate 
automatically. But of course that would only work for cycles, and not for 
instructions or any of the other fancy event counters. But since the frequency 
mode is probably similarly broken there, it should not be the default. Better 
ask the user for explicit values rather than doing something automatically 
which can lead to broken results.

Cheers

-- 
Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH&Co KG, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt Experts

[-- Attachment #2: smime.p7s --]
[-- Type: application/pkcs7-signature, Size: 3826 bytes --]

     prev parent reply	other threads:[~2017-09-05 12:26 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-06-17 19:07 deducing CPU clock rate over time from cycle samples Milian Wolff
2017-06-18  4:22 ` Andi Kleen
2017-06-18 19:53   ` Milian Wolff
2017-08-28 14:08     ` broken cycle counts from perf record in frequency mode [Was: Re: deducing CPU clock rate over time from cycle samples] Milian Wolff
2017-08-28 14:40       ` Milian Wolff
2017-08-28 17:28         ` Andi Kleen
2017-09-01 10:34           ` Milian Wolff
2017-09-01 16:48             ` Andi Kleen
2017-09-04 14:35               ` Milian Wolff
2017-09-05  3:40                 ` Andi Kleen
2017-09-05 12:26                   ` Milian Wolff [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=2027973.spD7XaVQdP@milian-kdab2 \
    --to=milian.wolff@kdab.com \
    --cc=acme@kernel.org \
    --cc=andi@firstfloor.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-perf-users@vger.kernel.org \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox