From: Milian Wolff <milian.wolff@kdab.com>
To: Hadrien Grasland <grasland@lal.in2p3.fr>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: Trying to understand some strange samples in a perf profile
Date: Thu, 21 Dec 2017 10:54:29 +0100
Message-ID: <9831756.pFeoWVMjH9@milian-kdab2>
In-Reply-To: <db0e4795-de77-ac92-6bab-4b4d451daf26@lal.in2p3.fr>


On Wednesday, December 20, 2017 10:39:07 AM CET Hadrien Grasland wrote:
> Hi all,

Hey Hadrien!

> I am currently trying to get a reliable perf setup for a new codebase
> that I am working on. The software is rather big and complex (particle
> physics framework from the Belle 2 experiment), and architected in a way
> that makes it hard to break it down in smaller chunks, which brings some
> new issues that I do not usually encounter in my smaller software
> projects. I thought you might be able to help me understand some of the
> oddities that I observe in perf's output.
> 
> -----
> 
> I am running perf 4.14.3 on top of Linux 4.14.6 and a Haswell machine
> (Intel(R) Xeon(R) E5-1620 v3).

Is this a self-built perf, or the one you get from openSUSE? Is it using 
libunwind for unwinding, or libdw? Try running `ldd` on `perf` to see what it 
links against.
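
For instance, something along these lines should give a hint (the grep 
pattern is just my guess at the relevant library names, and note that libdw 
can also be linked in for other features such as perf probe):

  $ ldd $(which perf) | grep -iE 'unwind|libdw'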

> My Linux distribution, openSUSE
> Tumbleweed, has the nasty but sadly widespread habit of building
> everything using -fomit-frame-pointer, which makes FP-based call graphs
> impractical. And some parts of the code produce rather huge stack traces
> (> 100 frames, most likely originating from recursion) which preclude
> use of LBR-based profiling. So as far as I understand, my only option is
> to use DWARF with stack samples that are as large as possible ( 65528 ),
> and hope that the remaining truncated traces do not bias my profile too
> much. I estimated by analyzing the output of perf script (more about
> that below) that they are < 2%, so the impact shouldn't be too bad.
> 
> Since these big stack samples cause quite a lot of perf.data I/O, I keep
> the sampling rate reasonable (1 kHz) and store my perf.data on a ramdisk
> during sampling. With these final tweaks, my perf record command ends up
> looking like this:
> 
> $ perf record -a --call-graph=dwarf,65528 -F 1000 -o /dev/shm/perf.data
> basf2 02_fit.py -i simulated.root && mv /dev/shm/perf.data /tmp/perf.data

Why do you run record with `-a`? basf2 launches your simulation, right? So 
just recording it and its children should be enough for you, no?
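
Something like this should do, i.e. your command with `-a` dropped (untested 
on my side, so treat it as a sketch):

  $ perf record --call-graph=dwarf,65528 -F 1000 -o /dev/shm/perf.data \
        basf2 02_fit.py -i simulated.root && mv /dev/shm/perf.data /tmp/perf.data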

> Due to the aforementioned huge stack traces, I also had to increase perf
> report's stack limit to avoid systematic statistical bias against large
> stack traces. I did not find an "unlimited" option, so I just made it
> large enough to encompass any possible stack trace:
> 
> $ perf report --max-stack=65535 -i /tmp/perf.data
> 
> -----
> 
> Now, my problem is that the aforementioned perf report command only
> attributes ~84% of the samples to the top-level function of the program
> (here __libc_start_main), which seems a little low to me. In an attempt
> to understand this, I started analyzing the perf script output, and
> noticed a number of anomalous samples which I would like to discuss
> further with you.
> 
> ~0.7% of samples have no stack trace at all. I'm not sure if this is
> expected, my guess was that this could possibly happen if the PMU
> interrupt fires at a time where the system is in an idle state. You tell
> me. In that case, perf script only displays a header like this:
> 
> .   irqbalance  1497 [001]   275.010584:    1320760 cycles:uppp:

This is probably a side-effect of you using `-a`, try removing it.
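
As a quick sanity check in the meantime, you could also restrict the report 
to the basf2 comm (option name from memory, please double-check 
perf-report(1)):

  $ perf report --max-stack=65535 --comms=basf2 -i /tmp/perf.data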

> ~ 1.6% of stack traces end with the ffffffffffffffff pointer. I had a
> little chat about those with Jiri on IRC, who suggested that a common
> reason for this is when the stack sample is not large enough. It
> certainly seems true for some samples, but I am not sure if this is
> always the right explanation as I get it even for very short stacks like
> the following one:
> 
> .   basf2  3272 [007]   300.526376:    3571762 cycles:uppp:
> .   	           27be5 __ieee754_pow_sse2 (/lib64/libm-2.26.so)
> .   	ffffffffffffffff [unknown] ([unknown])

Do you have debug symbols installed for libm/libc? Do you see other samples 
where unwinding through __ieee754_pow_sse2 works properly? I have personally 
encountered similar issues and haven't found an explanation yet. There can be 
many causes; my current hunch is buggy DWARF interpretation by libunwind, or 
alternatively wrong/limited unwind information emitted by the compiler.
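
To see which unwind information is actually present, something like this 
might help (just a sketch using binutils' readelf; the section names are the 
usual suspects):

  $ readelf -S /lib64/libm-2.26.so | grep -E 'eh_frame|debug_'

If only .eh_frame/.eh_frame_hdr show up and no .debug_* sections, installing 
the matching debuginfo package from openSUSE would be worth a try.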

> However, it might be the case that some functions allocate very large
> stack objects, or that this invalid pointer is used in other cases. I
> welcome additional explanations here.
> 
> ~0.6% of samples look mostly normal, but end with a DSO like
> "/tmp/perf-<PID>.map". A web search suggests that this happens when perf
> detected JIT-compiled code, but did not get DWARF information from the
> JIT. And in most cases this explanation looks accurate to me as the next
> DSO in the stack is typically libmozjs or similar.
> 
> ~0.2% of samples are full of "[unknown]" symbols and end up somewhere in
> a DSO with "nvidia" in its name. I understand this as originating from
> the fact that DWARF call graphs cannot work well with proprietary code
> that does not have debug info, and that's fine by me considering the low
> amount of samples.

The proprietary nvidia libs have neither frame pointers nor DWARF unwind 
information, so unwinding often fails there and ends up in [unknown].

> Now, these are the anomalous samples which I think I understand. But as
> you can see, they are far from accounting for all of my 16% of "missing"
> samples. For the remaining ones, I can explain ~2.8% as originating from
> other system activity (other running programs, syscalls, etc), but that
> still leaves a good 10.2% of samples for which I have absolutely no idea
> what's going on.
> 
> In the majority of the cases (~9.7%), the stack traces just suddenly go
> someplace weird for no apparent reason:
> 
> .   basf2  3272 [007]   300.560379:    3571034 cycles:uppp:
> .   	          1a7036 G4PolyconeSide::DistanceAway (/home/hadrien/Software/Belle2/externals/development/Linux_x86_64/opt/lib/libG4geometry.so)
> .   	               1 [unknown] ([unknown])
> 
> While in the remaining cases (~0.6%), the stack traces start fine but
> end up rather abruptly:
> 
> .   basf2  3272 [007]   300.541377:    3571664 cycles:uppp:
> .   	            e4a9 Belle2::G4SafeNavigator::LocateGlobalPointAndSetup (/home/hadrien/Software/Belle2/tests-release/modules/Linux_x86_64/opt/libgenfitUtilities.so)

In both cases, do you have full DWARF information available for these 
libraries? Which compiler did you use to build them? How deep into the stack 
trace does this happen?
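
If you want to get a feeling for how deep into the chains this happens, a 
rough sketch like the following could count the frames per sample from the 
perf script output (assuming the usual layout: one header line per sample, 
tab-indented frame lines, and a blank line between samples):

  $ perf script -i /tmp/perf.data |
        awk 'NF == 0 { if (n) print n; n = 0; next } /^\t/ { n++ }
             END { if (n) print n }' |
        sort -n | uniq -c | tail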

> Can you help me check if my understanding of the other samples is
> correct, and give me some clues about what could happen for these last
> few weird ones?
> 
> I can provide perf.data files and archives on demand, but since the
> original is a rather big upload/download ( 2.7 GB ), I thought I would
> wait until you can give me some specific requests or a way to produce
> skimmed datasets with only the samples that you are looking for.

The perf.data file wouldn't be enough; we'd also need access to all of the 
.so/.dbg files. perf archive could be used for that purpose, but I've seen 
many situations where not all of the encountered DSOs had a build-id and thus 
were not put into the archive...
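
For reference, checking which DSOs actually got a build-id recorded is easy, 
and that also tells you what perf archive would be able to bundle:

  $ perf buildid-list -i /tmp/perf.data
  $ perf archive /tmp/perf.data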

Bye

-- 
Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts

