From: Milian Wolff <milian.wolff@kdab.com>
To: Hadrien Grasland <grasland@lal.in2p3.fr>
Cc: linux-perf-users@vger.kernel.org
Subject: Re: Trying to understand some strange samples in a perf profile
Date: Thu, 21 Dec 2017 13:47:21 +0100
Message-ID: <2337765.oK3dqqyzHo@milian-kdab2>
In-Reply-To: <6d2adf7a-b181-a8ed-2069-6063abadf6a1@lal.in2p3.fr>
On Thursday, December 21, 2017 11:46:15 AM CET Hadrien Grasland wrote:
> Hi Milian!
Hey, thanks for the answers. I'll snip some parts out so we don't get lost in
the long mail.
<snip>
> >> $ perf record -a --call-graph=dwarf,65528 -F 1000 -o /dev/shm/perf.data
> >> basf2 02_fit.py -i simulated.root && mv /dev/shm/perf.data /tmp/perf.data
> >
> > Why do you run record with `-a`? basf2 launches your simulation, right? So
> > just recording it and its children should be enough for you, no?
>
> I like to run with -a as a default, because it gives me a broader
> picture by telling me about the background system activity during a
> measurement: other CPU-hungry applications which I forgot to close (I
> unfortunately cannot afford the luxury of a dedicated node for
> performance measurements), the overhead of perf itself when I get too
> greedy with DWARF call graphs, etc.
OK, interesting.
> IIRC, running with -a is also necessary to get a representative picture
> of kernel-side overhead in applications that perform a lot of syscalls.
> I was recently experimenting with a system monitoring library that would
> put some heavy load on procfs, and from memory the perf record/report
> picture was totally different in "normal" mode and with -a on.
When you sample on cycles without limiting yourself to user space (`:u`), you
will also see kernel-space cost. But if the work is asynchronous and a context
switch happens, then of course you won't see it. I usually handle this by doing
sleep-time profiling, but that is currently not as easy as I'd like it to be.
It's something I'm working on for the hotspot GUI though; hopefully I'll have
something available in January.
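For completeness, one recipe that more or less works today is the sched_stat
approach (just a rough sketch, not a polished workflow; it needs schedstats
enabled, e.g. via `sysctl kernel.sched_schedstats=1`, plus permission to record
the tracepoints, and `<your workload>` is obviously a placeholder):

    perf record -e sched:sched_stat_sleep -e sched:sched_switch \
                -e sched:sched_process_exit -g -o perf.data.raw <your workload>
    perf inject -s -i perf.data.raw -o perf.data
    perf report -i perf.data

`perf inject -s` merges the sched_stat and sched_switch events so that the
report can attribute the time slept to the call stack that blocked.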
> However, you are correct that I do not need any of these capabilities
> here, and that it introduces unnecessary complexity in our discussion,
> so I re-did a run without it.
You raise valid points, I was merely interested in why you chose `-a` here.
> With that change...
>
> * I don't get any samples without a stack trace anymore
> * Obviously the nvidia samples are gone as well (the way it should be,
> since this code is not supposed to do graphics)
> * I still get 1.6% of stack traces ending with ffffffffffffffff
> * I still get a negligible amount of /tmp/perf-<PID>.map samples
> (expected, the ROOT toolkit from HEP does some C++ JIT compilation
> for persistence of arbitrary objects)
> * I still get 10.6% of stack traces ending in the "[unknown]" DSO for
> no clear reason.
> * I still get 0.6% of stack traces ending abruptly.
>
> Obviously, the behaviour which I am most interested in is the one that
> introduces the highest bias in my profile (~11% starts to be enough to
> bias the conclusions of a performance analysis if it is all grouped in a
> single place), so it is fine by me if we focus on that one :)
Yep.
<snip>
> >> ~ 1.6% of stack traces end with the ffffffffffffffff pointer. I had a
> >> little chat about those with Jiri on IRC, who suggested that a common
> >> reason for this is when the stack sample is not large enough. It
> >> certainly seems true for some samples, but I am not sure if this is
> >> always the right explanation as I get it even for very short stacks like
> >> the following one:
> >>
> >> . basf2 3272 [007] 300.526376: 3571762 cycles:uppp:
> >> . 27be5 __ieee754_pow_sse2 (/lib64/libm-2.26.so)
> >> . ffffffffffffffff [unknown] ([unknown])
> >
> > Do you have debug symbols installed for libm/libc? Do you see other frames
> > where unwinding through __ieee754_pow_sse2 works properly? I've personally
> > also encountered issues similar to this one, and haven't found an
> > explanation. I guess there can be many problems; my current hunch is
> > buggy DWARF interpretation by libunwind, or alternatively wrong/limited
> > information emitted by the compiler.
>
> I have debug symbols for libm/libc, and also for the other DSOs which
> produce similarly broken stacks (libstdc++, ROOT, libpython, Geant4...).
>
> If there is anything I can do to help you with investigating that
> anomaly, be my guest :)
I guess this is the same problem as you are seeing below then.
<snip>
> >> Now, these are the anomalous samples which I think I understand. But as
> >> you can see, they are far from accounting for all of my 16% of "missing"
> >> samples. For the remaining ones, I can explain ~2.8% as originating from
> >> other system activity (other running programs, syscalls, etc), but that
> >> still leaves a good 10.2% of samples for which I have absolutely no idea
> >> what's going on.
> >>
> >> In the majority of the cases (~9.7%), the stack traces just suddenly go
> >> someplace weird for no apparent reason:
> >>
> >> . basf2 3272 [007] 300.560379: 3571034 cycles:uppp:
> >> .           1a7036 G4PolyconeSide::DistanceAway (/home/hadrien/Software/Belle2/externals/development/Linux_x86_64/opt/lib/libG4geometry.so)
> >> .                1 [unknown] ([unknown])
> >>
> >> While in the remaining cases (~0.6%), the stack traces start fine but
> >> end up rather abruptly:
> >>
> >> . basf2 3272 [007] 300.541377: 3571664 cycles:uppp:
> >> .             e4a9 Belle2::G4SafeNavigator::LocateGlobalPointAndSetup (/home/hadrien/Software/Belle2/tests-release/modules/Linux_x86_64/opt/libgenfitUtilities.so)
> >
> > In both cases, do you have full dwarf information available for these
> > libraries? What compiler did you use for compilation? How deep into the
> > stack trace does this happen?
>
> I have the DWARF information for these libraries. Since I have had to
> build some of them myself, I thought it might have been a build system
> misconfiguration, but as far as I know ELF debugging sections never lie:
<snip>
> Everything was compiled using GCC 7.2.0, and as you can see in the perf
> script samples which I provided you in my initial mail, the problem
> tends to happen very early in the stack trace, typically at the first or
> second frame.
The sections are there; the question is whether the information for the given
address is there. See also below for more.
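(In the meantime, a quick way to check that — generic advice, adjust the paths
as needed — is to dump the frame information and see whether the sampled
offset, e.g. the 1a7036 from your G4PolyconeSide sample, actually falls inside
an FDE range:

    readelf --debug-dump=frames-interp libG4geometry.so | less
    (or: objdump --dwarf=frames libG4geometry.so)

If no FDE in .eh_frame/.debug_frame covers that address, the unwinder simply
has nothing to work with at that frame.)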
> >> Can you help me check if my understanding of the other samples is
> >> correct, and give me some clues about what could happen for these last
> >> few weird ones?
> >>
> >> I can provide perf.data files and archives on demand, but since the
> >> original is a rather big upload/download ( 2.7 GB ), I thought I would
> >> wait until you can give me some specific requests or a way to produce
> >> skimmed datasets with only the samples that you are looking for.
> >
> > The perf.data file wouldn't be enough; we'd need access to all of the
> > so/dbg files too. perf archive could be used for that purpose, but I've
> > seen many situations where not all DSOs encountered had a build-id, and
> > thus were not put into the archive...
>
> Yes, back when I discussed the ffffffffffffffff samples with Jiri on
> IRC, we had this problem that perf archive failed to include all
> necessary DSOs for analysis. So I'm not sure if it would work here.
>
> If there is anything else which I can do to help you here, just tell me!
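(Side note, and just a generic tip rather than anything specific to your setup:
`perf buildid-list -i perf.data` shows the build-ids of the DSOs that got
samples, and `readelf -n <some.so>` shows whether an individual library carries
a build-id note at all; whatever lacks one there is, I think, exactly what perf
archive ends up skipping.)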
I never took the time to debug this, but I personally consider it a big
problem with perf. These broken backtraces often make analysis much harder
than with other tools like VTune, which apparently does not suffer from this
issue. At least I have never seen it there; maybe it's just less of a problem
there for some reason.
So to debug this, I guess we need to sit down and actually find a way to
reproduce the issue with a smaller test case. I actually have something that
is much smaller than your code base and also shows this issue sometimes:
https://github.com/KDAB/hotspot/blob/master/tests/test-clients/cpp-inlining/main.cpp
perf record --call-graph dwarf -c 1000000 ./a.out
..
cpp-inlining 25448 258803.904661:     100000 cycles:uppp:
        ffffffff815fd197 [unknown] ([unknown])
..
cpp-inlining 25448 258803.907014:     100000 cycles:uppp:
                   23783 __hypot_finite (/usr/lib/libm-2.26.so)
        40e3491ebf75199b [unknown] ([unknown])
..
cpp-inlining 25448 258804.312048:     100000 cycles:uppp:
                   23783 __hypot_finite (/usr/lib/libm-2.26.so)
        40f204e9aa06ab79 [unknown] ([unknown])
Interestingly, that last broken pattern repeats, and the address in libm is
always the same (0x23783). That's why I believe the problem might be in
libunwind/libdw or in the DWARF emitter in GCC/Clang... We'd have to debug
this. I guess for that we'd have to do something along the following lines:
- build a minimal "unwinder" application around libunwind which would, e.g.:
  - notify libunwind about all mmap events
  - install the register and stack state from the perf.data file
  - try to unwind and see what happens (a rough sketch follows below)
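To make that concrete, here is a rough, untested skeleton of what I mean (the
names, the sample_state struct and the stubbed callbacks are purely
illustrative, not existing code; the interesting work — mapping the sampled ip
to the right DSO from the recorded mmap events and handing its unwind table to
libunwind in find_proc_info — is exactly what would still need to be filled in):

/* replay-unwind.c -- minimal skeleton for replaying one perf sample through
 * libunwind's remote-unwinding API.  Build roughly with:
 *   gcc replay-unwind.c -o replay-unwind -lunwind-generic -lunwind
 */
#include <libunwind.h>
#include <stdio.h>
#include <string.h>

/* Register values and the copied user-stack snapshot of one sample, as
 * recorded via PERF_SAMPLE_REGS_USER / PERF_SAMPLE_STACK_USER. */
struct sample_state {
    unw_word_t regs[UNW_REG_LAST + 1];
    unw_word_t stack_start;            /* user SP at sample time */
    const unsigned char *stack;        /* copied stack snapshot   */
    size_t stack_len;
};

static int access_mem(unw_addr_space_t as, unw_word_t addr, unw_word_t *val,
                      int write, void *arg)
{
    struct sample_state *s = arg;
    if (write)
        return -UNW_EINVAL;
    if (addr >= s->stack_start &&
        addr + sizeof(*val) <= s->stack_start + s->stack_len) {
        memcpy(val, s->stack + (addr - s->stack_start), sizeof(*val));
        return 0;
    }
    /* TODO: fall back to reading the mapped DSO contents here. */
    return -UNW_EINVAL;
}

static int access_reg(unw_addr_space_t as, unw_regnum_t reg, unw_word_t *val,
                      int write, void *arg)
{
    struct sample_state *s = arg;
    if (write || reg < 0 || reg > UNW_REG_LAST)
        return -UNW_EINVAL;
    *val = s->regs[reg];
    return 0;
}

static int find_proc_info(unw_addr_space_t as, unw_word_t ip,
                          unw_proc_info_t *pi, int need_info, void *arg)
{
    /* TODO: map ip to the DSO covering it (from the recorded mmap events)
     * and feed its .eh_frame/.debug_frame unwind table to libunwind,
     * the way perf's own unwind-libunwind code does. */
    return -UNW_ENOINFO;
}

/* The remaining accessors are not needed for a first experiment. */
static void put_unwind_info(unw_addr_space_t as, unw_proc_info_t *pi, void *arg) {}
static int get_dyn_info_list_addr(unw_addr_space_t as, unw_word_t *a, void *arg) { return -UNW_ENOINFO; }
static int access_fpreg(unw_addr_space_t as, unw_regnum_t r, unw_fpreg_t *v, int w, void *arg) { return -UNW_EINVAL; }
static int resume(unw_addr_space_t as, unw_cursor_t *c, void *arg) { return -UNW_EINVAL; }
static int get_proc_name(unw_addr_space_t as, unw_word_t a, char *b, size_t l, unw_word_t *o, void *arg) { return -UNW_ENOINFO; }

int main(void)
{
    struct sample_state s = { 0 };  /* TODO: fill from one perf.data sample */
    unw_accessors_t acc = {
        .find_proc_info         = find_proc_info,
        .put_unwind_info        = put_unwind_info,
        .get_dyn_info_list_addr = get_dyn_info_list_addr,
        .access_mem             = access_mem,
        .access_reg             = access_reg,
        .access_fpreg           = access_fpreg,
        .resume                 = resume,
        .get_proc_name          = get_proc_name,
    };
    unw_addr_space_t as = unw_create_addr_space(&acc, 0);
    unw_cursor_t cursor;

    if (unw_init_remote(&cursor, as, &s) < 0)
        return 1;
    do {
        /* Print the instruction pointer of each frame as we step outwards. */
        unw_word_t ip;
        unw_get_reg(&cursor, UNW_REG_IP, &ip);
        printf("0x%lx\n", (unsigned long) ip);
    } while (unw_step(&cursor) > 0);

    unw_destroy_addr_space(as);
    return 0;
}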
This should give us a MWE that we could also show to Dave Watson and others
from libunwind. Similarly, we can easily check what libdwfl does there and
could show it to Mark Wielaard and others from elfutils. I'm pretty sure that
together we can then figure out what to blame:
- are the register values correct?
- is the stack state correct?
- is the DWARF info correct?
- ...?
It sounds like a lot of work, but it's certainly doable. Fixing this issue
would make me very happy, and I bet a lot of other users would appreciate it
too. Right now, I often have to tell people who use perf to "deal with broken
backtraces and try to guesstimate from the surroundings what actually
happened". But that is of course far from optimal!
Bye
--
Milian Wolff | milian.wolff@kdab.com | Senior Software Engineer
KDAB (Deutschland) GmbH, a KDAB Group company
Tel: +49-30-521325470
KDAB - The Qt, C++ and OpenGL Experts